What is a Graph Database

Big data, semantic searches, and real-time responses are the reasons behind the growing demand for graph databases. This article explains what a graph database is, why graph databases are popular, and why and when we should use one.

Introduction 

 
Graph theory is the branch of mathematics that studies graphs. Graphs can be used to model many kinds of problems and their solutions. A graph consists of dots called vertices (V) and lines connecting the dots called edges (E). Two vertices connected by an edge are called adjacent.
 
By mathematical definition: 
A graph is an ordered pair G = (V, E) consisting of a nonempty set V (called vertices) and a set E (called edges) of two-element subsets of V.
 
Graphs are commonly drawn as diagrams, with vertices shown as dots (here labeled with letters) and edges shown as lines connecting them. Here is an example of a graph that has 6 vertices and 7 edges:
 
G = (V, E) = ({a, b, c, d, e, f}, {{a, b}, {b, c}, {a, d}, {c, d}, {d, e}, {c, f}, {e, f}})
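As a small illustration, the example graph G can be written down directly as Python sets. This is only a sketch: each edge is stored as a `frozenset` because edges are unordered two-element subsets of V, and the helper names `adjacent` and `neighbors` are our own, not part of any library.

```python
# The example graph G = (V, E) from the text, using Python sets.
# Each edge is an unordered pair, so it is stored as a frozenset.
V = {"a", "b", "c", "d", "e", "f"}
E = {
    frozenset(pair)
    for pair in [("a", "b"), ("b", "c"), ("a", "d"), ("c", "d"),
                 ("d", "e"), ("c", "f"), ("e", "f")]
}

# Sanity check: every edge is a two-element subset of V.
assert all(len(edge) == 2 and edge <= V for edge in E)


def adjacent(u, v):
    """Two vertices are adjacent if some edge connects them."""
    return frozenset((u, v)) in E


def neighbors(v):
    """All vertices adjacent to v."""
    return {u for edge in E if v in edge for u in edge if u != v}


print(len(V), len(E))          # 6 7
print(adjacent("a", "b"))      # True
print(adjacent("a", "c"))      # False
print(sorted(neighbors("c")))  # ['b', 'd', 'f']
```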
 

Graph databases

 
Graph databases are built on graph theory. In a graph database, a graph represents data entities, their attributes, and the relationships between them. The vertices of a graph database are called nodes, while edges keep their name (some products, such as Neo4j, call them relationships).
 
What is a graph database
 
A graph database is a set of nodes, edges, and properties. Nodes in a graph database represent entities. A group of nodes of the same type (for example, all Person nodes) is roughly analogous to a table in a relational database, and a single node is analogous to a record (a row). Edges represent relationships between two nodes and correspond to the relationships that a relational database expresses through foreign keys and joins. Edges can be directed or undirected: a directed edge points from a start node to an end node, so "Alice follows Bob" does not imply "Bob follows Alice", whereas an undirected edge applies equally in both directions. In some graph databases, edges can also carry a numerical value, known as a weight. Properties represent data attached to nodes and edges, so in a graph database both nodes and edges can store data.
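The difference between directed and undirected edges, and the idea that edges can carry properties such as a weight, can be sketched with two small Python dataclasses. The `Node` and `Edge` classes, the `FOLLOWS` and `FRIEND_OF` relationship types, and the `related` helper are hypothetical and do not mirror any particular product's API. Some databases, Neo4j among them, store every relationship with a direction and let queries ignore it, rather than using an explicit flag as this sketch does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # entity type, e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    start: str      # id of the start node
    end: str        # id of the end node
    rel_type: str   # relationship type, e.g. "FOLLOWS"
    directed: bool = True
    properties: dict = field(default_factory=dict)  # e.g. {"weight": 0.8}

    def connects(self, a: str, b: str) -> bool:
        """True if this edge can be followed from node a to node b."""
        if self.start == a and self.end == b:
            return True
        # An undirected edge can also be followed from its end node.
        return not self.directed and self.start == b and self.end == a


nodes = {
    "alice": Node("alice", "Person", {"name": "Alice"}),
    "bob": Node("bob", "Person", {"name": "Bob"}),
}
edges = [
    # Directed: Alice follows Bob, which says nothing about Bob following Alice.
    Edge("alice", "bob", "FOLLOWS"),
    # Undirected and weighted: the friendship holds in both directions.
    Edge("alice", "bob", "FRIEND_OF", directed=False, properties={"weight": 0.8}),
]


def related(a: str, b: str, rel_type: str) -> bool:
    """True if some edge of the given type leads from a to b."""
    return any(e.rel_type == rel_type and e.connects(a, b) for e in edges)


print(related("alice", "bob", "FOLLOWS"))    # True
print(related("bob", "alice", "FOLLOWS"))    # False: direction matters
print(related("bob", "alice", "FRIEND_OF"))  # True: undirected edge
```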
 
Graph Database 
 
The above diagram is a graphical representation of a graph database with three nodes: Bob, Alice, and James. The entity type of the nodes is Person, and the properties of the Person entity include Name, Sex, Born, College, and Employer. The connecting lines (edges) represent the relationships between pairs of nodes.
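A minimal in-memory sketch of the Person graph described above might look like this. Only the node names and the property keys come from the diagram; the property values, the `KNOWS` relationship type, its `since` property, and the `outgoing` helper are invented for illustration.

```python
# Hypothetical data: the property values and the KNOWS relationships are
# invented for illustration; only the node names and property keys come
# from the diagram.
people = {
    "Bob":   {"Name": "Bob",   "Sex": "M", "Born": 1990, "College": "College A", "Employer": "Company X"},
    "Alice": {"Name": "Alice", "Sex": "F", "Born": 1988, "College": "College B", "Employer": "Company X"},
    "James": {"Name": "James", "Sex": "M", "Born": 1992, "College": "College A", "Employer": "Company Y"},
}

# Edges as (start, relationship type, end, edge properties).
relationships = [
    ("Alice", "KNOWS", "Bob",   {"since": 2015}),
    ("Alice", "KNOWS", "James", {"since": 2019}),
    ("Bob",   "KNOWS", "James", {"since": 2020}),
]


def outgoing(name, rel_type):
    """Names of nodes reached from `name` by edges of type `rel_type`."""
    return [end for start, rel, end, _ in relationships
            if start == name and rel == rel_type]


# "Whom does Alice know, and where do they work?"
for friend in outgoing("Alice", "KNOWS"):
    print(friend, people[friend]["Employer"])
# Bob Company X
# James Company Y
```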
 

Graph database query languages

 
Graph databases are generally classed as NoSQL databases. Unlike relational databases, most graph databases do not use SQL as their primary query language. Their underlying storage varies by product: some use native graph storage, while others store the graph as tables, documents, or key-value pairs. Graph databases are designed for semantic queries and fast retrieval of connected data.
 
There is no single universal query language for all graph databases. Many graph databases provide their own query language, API, or library, while some languages, such as Gremlin, are supported across several products.
 
A few of the more widely accepted and used query languages are GraphQL, AQL, Gremlin, SPARQL, and Cypher.
  • GraphQL is a query language for APIs, originally developed at Facebook. Despite its name, it is not specific to graph databases; clients use it to query any backend service that exposes a GraphQL schema.
  • AQL (ArangoDB Query Language) is an SQL-like query language used in ArangoDB databases.
  • Cypher is a declarative graph query language, originally developed for Neo4j databases.
  • Gremlin is a graph traversal language that works across various graph database systems; it is part of the Apache TinkerPop open-source project.
  • SPARQL is a query language for RDF databases; it can retrieve and manipulate data stored in the Resource Description Framework (RDF) format.

Microsoft, Facebook, LinkedIn, Twitter, and several other large corporations expose their data through APIs and their own query languages. The following diagram represents the Microsoft Graph data model used in Office 365.
Microsoft Graph exposes this data through REST APIs and client libraries that third-party applications can use to consume it.
 
Microsoft Graph 
 
Learn more here about how Microsoft Graph works.
 
 

Advantages of graph databases

 
Graph databases have become popular in recent years. Facebook, Microsoft, Twitter, Google, Oracle, SAP, and several other large corporations have implemented and are actively using graph databases. The key reasons for their popularity are the semantic nature of their queries, their real-time responses, and their meaningful storage of entities across large amounts of data.
 
The key advantages of using graph databases over traditional relational databases are:
 
Real world representation of data
 
A graph database provides a real-world representation of data. It can easily be visualized as nodes, relationships, and the data associated with them. Each node in the database is an entity with its own properties, and it connects to other nodes through edges. Nodes can take part in many-to-many relationships; for example, a Person node can connect to a Home, Car, Job, Family, and Friends. This representation makes the data easier to conceptualize in real-world applications. Companies including Facebook, LinkedIn, Uber, Google, and Microsoft use graph databases to represent people, their life activities, connections, and work.
 
Semantic nature
 
Graph databases are semantic by design. The objects, their relationships, and the semantic data associated with both make graph databases natural and meaningful. In a relational database, relationships and their meaning are reconstructed at query time through costly operations such as joins and index lookups; a graph database stores relationships explicitly, in a form that can easily be interpreted in real-world terms.
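To make the contrast concrete, here is a simplified Python sketch that answers "whom is Alice friends with?" both ways. The tables, names, and helper functions are hypothetical, and the relational version is deliberately naive: a real relational engine would use indexes rather than scanning every row, but it still has to join the friendship table back to the persons table at query time, whereas the graph version reads the relationships stored with the node.

```python
# Relational style: relationships live in a separate join table and are
# reassembled at query time.
persons = [(1, "Alice"), (2, "Bob"), (3, "James")]  # (id, name)
friendships = [(1, 2), (1, 3), (2, 3)]              # (person_id, friend_id)


def friends_relational(name):
    # Equivalent in spirit to:
    #   SELECT f.name FROM persons p
    #   JOIN friendships fr ON fr.person_id = p.id
    #   JOIN persons f ON f.id = fr.friend_id
    #   WHERE p.name = ?
    ids = {pid for pid, pname in persons if pname == name}
    friend_ids = {fid for pid, fid in friendships if pid in ids}
    return sorted(pname for pid, pname in persons if pid in friend_ids)


# Graph style: each node keeps its relationships directly, so finding
# neighbors is a lookup on the node rather than a join.
graph = {
    "Alice": ["Bob", "James"],
    "Bob": ["James"],
    "James": [],
}


def friends_graph(name):
    return sorted(graph[name])


print(friends_relational("Alice"))  # ['Bob', 'James']
print(friends_graph("Alice"))       # ['Bob', 'James']
```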
 
Scalability
 
With the growth of big data, data analytics, and real-time requirements, relational databases can be hard to scale for highly connected data without a performance penalty. Graph databases are designed to scale to large datasets, because following a relationship does not rely on costly joins or table-wide index lookups.
 
Performance
 
Unlike a relational database, a graph database can perform deep traversals across nodes quickly, largely independent of the total amount of data. Instead of searching entire tables or irrelevant data, a query is localized to the portion of the graph it actually visits. This makes graph databases good candidates for real-time feeds, data analytics, and live updates.
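The locality of graph queries can be illustrated with a depth-limited breadth-first traversal over an adjacency list, a simplified stand-in for how a graph engine follows relationships. In this sketch (the graph, node names, and `traverse` function are hypothetical), a friends-of-friends query touches only nodes within two hops of the start node, even though the graph also contains a chain of 100,000 unrelated nodes.

```python
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal up to max_depth hops from start.

    Returns {node: depth}. Only nodes reachable within max_depth hops are
    ever visited; the rest of the graph is never read.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


# A small social graph plus a large region the query never visits.
graph = {
    "Alice": ["Bob", "James"],
    "Bob": ["Carol"],
    "James": ["Dave"],
    "Carol": ["Erin"],
    "Dave": [],
    "Erin": [],
}
graph.update({f"user{i}": [f"user{i + 1}"] for i in range(100_000)})

# Friends-of-friends: everything within two hops of Alice.
result = traverse(graph, "Alice", 2)
print(sorted(result.items(), key=lambda kv: (kv[1], kv[0])))
# [('Alice', 0), ('Bob', 1), ('James', 1), ('Carol', 2), ('Dave', 2)]
```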
 
Flexibility
 
Graph databases allow flexible schemas: properties and relationships can be added to or removed from a graph without redesigning a fixed table structure. Each node can have its own metadata without worrying about rigid database constraints. Graphs can also represent multiple dimensions.
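Schema flexibility can be sketched with nodes modeled as plain Python dictionaries: two nodes with the same label may carry different properties, and adding a property to one node does not touch any other. The property names and values are hypothetical.

```python
# Two nodes with the same label but different sets of properties.
alice = {"label": "Person", "name": "Alice", "employer": "Company X"}
bob = {"label": "Person", "name": "Bob", "nickname": "Bobby"}

# Adding a property to one node needs no schema change and leaves
# every other node untouched.
alice["email"] = "alice@example.com"

# Because properties vary per node, queries should tolerate missing keys.
for person in (alice, bob):
    print(person["name"], person.get("email", "<none>"))
# Alice alice@example.com
# Bob <none>
```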
 

Popular graph databases

 
The most popular graph databases are Neo4j, OrientDB, and ArangoDB. Microsoft Azure Cosmos DB also supports a graph model. Here is a list of the top 10 graph databases:
  1. Neo4j
  2. Microsoft Azure Cosmos DB
  3. OrientDB
  4. ArangoDB
  5. Virtuoso
  6. JanusGraph
  7. Amazon Neptune
  8. GraphDB
  9. Giraph
  10. AllegroGraph
To learn more about these popular graph databases, visit Most Popular Graph Databases.
 
