Most Popular Graph Databases

C# Curator

Popular Graph Data Query Languages

The growing volume of data, the need for real-time data analytics, and the demand for semantics are fueling the growth of graph database management systems. Large corporations, including LinkedIn, Facebook, Microsoft, Twitter, Google, Oracle, and SAP, use graph databases heavily in their social media networks and data analytics.

Graph databases are based on graph theory, where a graph is a set of nodes and edges. A node in a graph data model represents an entity. An edge connecting two nodes represents a relationship between them. Semantic data, called properties, can be attached to both nodes and edges.

Graph databases are NoSQL databases and provide high-performance node traversal and data retrieval. To query graph databases, there is no single language. The most popular graph data query languages are GraphQL, AQL, Gremlin, SPARQL, and Cypher.

If you're not familiar with Graph databases, read What is a Graph Database.

The following table lists the top graph databases based on the DB-Engines ranking.

Rank	DBMS	Database model
1	Neo4j	Graph
2	Microsoft Azure Cosmos DB	Multi-model
3	OrientDB	Multi-model
4	ArangoDB	Multi-model
5	Virtuoso	Multi-model
6	JanusGraph	Graph
7	Amazon Neptune	Multi-model
8	GraphDB	Multi-model
9	Giraph	Graph
10	AllegroGraph	Multi-model
11	Dgraph	Graph
12	TigerGraph	Graph
13	Stardog	Multi-model
14	Sqrrl	Multi-model
15	Blazegraph	Multi-model
16	Graph Engine	Multi-model
17	InfiniteGraph	Graph
18	FaunaDB	Multi-model
19	FlockDB	Graph
20	InfoGrid	Graph

1. Neo4j

Neo4j is the most popular graph database. It is open source and follows the labeled property graph model. The key elements of the Neo4j database are nodes, relationships, properties, and labels.

Nodes are the main data elements, for example, a Person node or a Car node. Nodes are connected to other nodes via relationships. Nodes can have one or more properties (i.e., attributes stored as key/value pairs). Nodes have one or more labels that describe their role in the graph.
Relationships connect two nodes. Relationships are directional. Nodes can have multiple, even recursive relationships. Relationships can have one or more properties (i.e., attributes stored as key/value pairs).
Properties are named values where the name (or key) is a string. Properties can be indexed and constrained. Composite indexes can be created from multiple properties.
Labels are used to group nodes into sets. A node may have multiple labels. Labels are indexed to accelerate finding nodes in the graph. Native label indexes are optimized for speed.
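To make the four elements above concrete, here is a minimal in-memory sketch of a labeled property graph in Python. It is not Neo4j's API or storage format; the class names (`Node`, `Relationship`, `PropertyGraph`) and the sample data are hypothetical and only illustrate how nodes, directional relationships, properties, and a label index fit together.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                       # one or more labels, e.g. {"Person"}
    properties: dict = field(default_factory=dict)    # key/value attributes


@dataclass
class Relationship:
    start: int                                        # id of the source node
    end: int                                          # id of the target node
    rel_type: str                                     # e.g. "OWNS"
    properties: dict = field(default_factory=dict)    # key/value attributes


class PropertyGraph:
    """Hypothetical container, used only to illustrate the model."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []
        self.label_index = defaultdict(set)           # label -> ids of nodes with that label

    def add_node(self, node_id, labels, **properties):
        node = Node(node_id, set(labels), properties)
        self.nodes[node_id] = node
        for label in node.labels:
            self.label_index[label].add(node_id)
        return node

    def add_relationship(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def find_by_label(self, label):
        # The label index avoids scanning every node in the graph.
        return [self.nodes[i] for i in sorted(self.label_index.get(label, ()))]

    def outgoing(self, node_id, rel_type=None):
        # Relationships are directional: only those starting at node_id are returned.
        return [
            r for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Alice")
graph.add_node(2, ["Car"], brand="Volvo")
graph.add_relationship(1, 2, "OWNS", since=2020)
graph.add_relationship(1, 1, "KNOWS")                 # a node may relate to itself

owned = [graph.nodes[r.end].properties["brand"] for r in graph.outgoing(1, "OWNS")]
print(owned)
```

A real graph database adds persistence, transactions, and a query language on top of this model; the sketch only shows the data shape.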

Neo4j is supported on Linux, OS X, Solaris, and Windows operating systems. Supported programming languages are .Net, Clojure, Elixir, Go, Groovy, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala.

Learn more about the Neo4j database here.

2. Microsoft Azure Cosmos DB

Microsoft Azure Cosmos DB is a NoSQL, globally distributed, horizontally scalable, multi-model database service. It supports the document store, graph DBMS, key-value store, and wide column store data models.

Azure Cosmos DB is hosted in the Azure cloud. It offers multi-model APIs and SDKs for various languages and platforms, including Java, .NET, Node.js, Python, Go, and Xamarin, and it supports the Gremlin graph traversal language.

3. OrientDB

OrientDB is a multi-model database that its vendor positions as one of the fastest graph databases available.

Key features of OrientDB are:

OrientDB is lightning-fast and can store up to 120,000 records each second. It traverses parts of or entire trees and graphs of records in milliseconds. The speed is not affected by the database size; large datasets are easily accommodated.
OrientDB is flexible enough to support graph, document, object-oriented, and reactive models, with key features such as schema-full, schema-less, and schema-mixed modes, database encryption, record-level security, SQL support, the TinkerPop Gremlin language, ACID transactions, relationship traversal, custom data types, embedded documents, and more.
OrientDB is written in Java. Supported programming languages are .Net, C, C#, C++, Clojure, Java, JavaScript, Node.js, PHP, Python, Ruby, and Scala.
OrientDB Community is free for commercial use. It comes with an Apache 2 Open Source License. OrientDB eliminates the need for multiple products and multiple licenses to manage your data.
OrientDB is an open-source project.

4. ArangoDB

ArangoDB, developed by ArangoDB GmbH, is a multi-model NoSQL database management system that supports graphs, documents, and key/value data models. ArangoDB has its own SQL-like language ArangoDB Query Language (AQL) to access and manipulate data. ArangoDB also supports GraphQL. ArangoDB Community Edition is free and under an open-source license (Apache 2).

The key features of ArangoDB are:

Native Multi-model: A native multi-model database from the ground up, supporting key/value, document, and graph models. You can model your data in a very flexible way.
Self-healing Cluster: ArangoDB can operate as a distributed and highly scalable database cluster. It runs on Kubernetes and DC/OS, including persistent primitives and easy cluster setup.
ArangoSearch: A natively integrated, cross-platform indexing, text-search, and ranking engine for information retrieval, optimized for speed and memory.
ArangoDB Query Language: AQL provides a powerful way to access and combine all data access strategies in ArangoDB.
Full GeoJSON Support: Enrich your graph, document, or search queries with geo-locational aspects.
Performance: High performance. ArangoDB has published a performance benchmark comparison with other graph databases.

5. Virtuoso

Virtuoso Universal Server is a secure, cross-platform, high-performance data server that delivers data access, data integration, multi-model data management, and HTTP application deployment services. Virtuoso is a multi-model DBMS and supports the graph DBMS, native XML DBMS, relational DBMS, and RDF store data models.

Virtuoso, initially launched in 1998 by OpenLink Software, is written in C and is an open-source database that supports .NET, C, C#, C++, Java, JavaScript, Perl, PHP, Python, Ruby, and Visual Basic.

Key features of Virtuoso are:

Policy-based security is enforced by the Virtuoso SQL compiler by inserting extra conditions into statements, depending on which user is preparing the statement. The tables or views can themselves be readable to a large group of users but compartmentalization is achieved by the database automatically adding extra conditions.
Two-Phase Commit (2PC) protocol may be used to guarantee ACID properties of Distributed Transactions that change data in more than one database.
Ability to derive relations (entity relationship types) in a variety of forms.
HTTP-Compliant Application Server
Industry Standard SPARQL Query Language Support
Virtuoso provides descriptor resources for every entity (data object) in the Native or Virtual Quad Stores and supports a broad array of output formats, including HTML+RDFa, RDF/XML, N3/Turtle, N-Triples, RDF-JSON, OData+Atom, and OData+JSON.
Virtuoso Meta Schema Language enables the construction of RDF-based Linked Data Views (or Semantic Covers) over SQL, XML, SOA, and REST data sources.
RDF Data Sets are managed by a dedicated module within the Virtuoso ORDBMS core. This functionality is exposed to client applications through implementations of the SPARQL Query Language and Protocol, plus a collection of Web Services and Virtuoso/PL-based APIs for Creating, Updating, and Deleting RDF Data Sets.
The Virtuoso Sponger is the Linked Data middleware component of Virtuoso. It generates Linked Data (in the form of RDF) from a variety of data sources and supports a wide range of data representation and serialization formats.
Linked Data Views over External Data Sources and Native SQL Data Sources
Virtuoso is a runtime hosting vehicle for web services application logic written in PHP, Java, .NET, Python, Perl, Ruby, and many other popular web scripting environments.
SPARQL Query Language Support complies with the W3C SPARQL 1.1 Standard, providing compatibility with other SPARQL-compliant tools, whether home-grown or third-party.
Supported Data Access Standards include ODBC, JDBC, ADO.NET, OLE DB, and XMLA.
WebDAV-Compliant Content Manager
Virtuoso can transform Web Services into RDF Linked Data on the fly.
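The Two-Phase Commit (2PC) protocol mentioned in the feature list above can be sketched in a few lines of Python. This is a simplified illustration of the general protocol, not Virtuoso's implementation: the `Participant` class and its `can_commit` flag are hypothetical stand-ins for the databases involved in a distributed transaction, and real systems also need durable logging and timeout handling.

```python
class Participant:
    """Hypothetical resource manager (one database) in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit      # simulates whether local work can be made durable
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if this participant is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): commit everywhere only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True

    # Any "no" vote rolls the transaction back on all participants.
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("orders"), Participant("billing")]
broken = [Participant("orders"), Participant("billing", can_commit=False)]

ok = two_phase_commit(healthy)
failed = two_phase_commit(broken)
print(ok, failed)
```

The point of the two phases is atomicity: either every database commits the change or none does.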

6. JanusGraph

JanusGraph is a graph database. It began as a fork of Titan, a graph database originally developed by Aurelius, and was launched in 2017. JanusGraph is free and fully open source under the Apache 2 license and is governed by the Linux Foundation.

JanusGraph is written in Java and supports Clojure, Java, and Python languages only.

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.

Key features of JanusGraph are:

JanusGraph is highly scalable. It supports elastic and linear scalability for a growing data and user base, data distribution and replication for performance and fault tolerance, and multi-datacenter high availability and hot backups.
JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. It supports both ACID and eventual consistency.
It supports various storage backends: graph data can be stored in Apache Cassandra, Apache HBase, Google Cloud Bigtable, and Oracle BerkeleyDB.
Advanced search capabilities such as full-text search can optionally be supported via Elasticsearch, Apache Solr, and Apache Lucene.
Integration with Apache Spark, Apache Hadoop, Apache Giraph, and TinkerPop.
JanusGraph supports a variety of visualization tools, such as Arcade Analytics, Cytoscape, the Gephi plugin for Apache TinkerPop, Graphexp, KeyLines by Cambridge Intelligence, Linkurious, and Tom Sawyer Perspectives.

7. Amazon Neptune

Amazon Neptune, developed and launched by Amazon in 2017, is a fast, reliable graph database built for the cloud. Amazon Neptune is a schema-free database and supports C#, Go, Java, JavaScript, PHP, Python, Ruby, and Scala.

The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune supports the popular Property Graph and W3C RDF graph models and their respective query languages, Apache TinkerPop Gremlin and SPARQL, allowing you to build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. Neptune is secure with support for HTTPS-encrypted client connections and encryption at rest. Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

8. GraphDB

GraphDB, developed and launched by Ontotext in 2002, is a highly efficient, robust, and scalable RDF database with support for efficient reasoning, clustering, and external index synchronization.

GraphDB, written in Java, uses SPARQL as its query language and supports .Net, C#, Clojure, Java, Node.js, PHP, Python, Ruby, and Scala languages.
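To illustrate what an RDF store holds and what a basic SPARQL query does, here is a small Python sketch. It is not GraphDB's API: the `match` function and the `ex:` identifiers are made up for illustration. RDF data is a set of subject-predicate-object triples, and a query is essentially a pattern in which some positions are left open.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Hypothetical sample data: each fact is one (subject, predicate, object) triple.
triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
}

# Roughly what the SPARQL pattern  ?s ex:worksFor ex:acme  asks for.
employees = [s for s, _, _ in match(triples, predicate="ex:worksFor", obj="ex:acme")]
print(employees)
```

A production RDF database keeps several indexes over the triples so that such patterns do not require a full scan, and adds reasoning on top.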

9. Giraph

Apache Giraph is an iterative graph processing system built for high scalability on big data. Giraph is an open-source project governed by the Apache Software Foundation. It uses Apache Hadoop's MapReduce implementation to process graphs. Giraph is based on a paper published by Google about Pregel, its own graph processing system.
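The Pregel model that Giraph follows is vertex-centric: computation proceeds in synchronous rounds called supersteps, and in each round a vertex reads the messages sent to it, updates its own value, and sends messages to its neighbours. The Python sketch below illustrates the idea with the classic "propagate the maximum value" example. It is not Giraph's Java API; the function name `pregel_max` and the sample graph are hypothetical, and the sketch assumes every edge target also appears as a key in `values`.

```python
def pregel_max(values, edges):
    """Spread the largest vertex value to all reachable vertices.

    values: dict mapping vertex -> initial value
    edges:  dict mapping vertex -> list of out-neighbours
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = {v: [] for v in values}
    for v, value in values.items():
        for neighbour in edges.get(v, []):
            inbox[neighbour].append(value)

    # Later supersteps run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                # The vertex learned a larger value: adopt it and notify neighbours.
                values[v] = max(messages)
                for neighbour in edges.get(v, []):
                    outbox[neighbour].append(values[v])
            # Otherwise the vertex stays silent, i.e. it votes to halt.
        inbox = outbox

    return values


# A small chain a - b - c - d with edges in both directions.
initial = {"a": 3, "b": 6, "c": 2, "d": 1}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

result = pregel_max(initial, links)
print(result)
```

In a real deployment the vertices are partitioned across many workers, and each superstep ends with a global synchronization barrier.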

10. AllegroGraph

AllegroGraph, developed by Franz Inc. in 2004, is a high-performance, persistent RDF store with additional support for graph DBMS features. It implements the document, graph, and RDF store data models. AllegroGraph supports C#, Clojure, Java, Lisp, Perl, Python, Ruby, and Scala.

AllegroGraph features include:

AllegroGraph is 100 percent ACID, supporting Transactions: Commit, Rollback, and Checkpointing.
Full and Fast Recoverability
100% Read Concurrency, Near Full Write Concurrency
Online Backups, Point-in-Time Recovery, Replication, Warm Standby
Dynamic and Automatic Indexing – All committed triples are always indexed (7 indices)
Advanced Text Indexing – Text indexing per predicate
SOLR and MongoDB Integration
SPIN support (SPARQL Inferencing Notation). The SPIN API allows you to define a function in terms of a SPARQL query and then call that function in other SPARQL queries. These SPIN functions can appear in FILTERs and can also be used to compute values in an assignment and select expressions.
All Clients based on REST Protocol – Java Sesame, Java Jena, Python, Clojure, Perl, Ruby, Scala, and Lisp clients
Completely multi-processing based (SMP) – Automatic Resource Management for all processors and disks and optimized memory use. See the performance tuning guide and the server configuration guide in the AllegroGraph documentation.
Column-based compression of indices – reduced paging, better performance
Triple Level Security with Security Filters
Cloud-Hosted AllegroGraph - Amazon EC2
The AllegroGraph RDF server can be scripted using the JavaScript API
JavaScript-based interface (JIG) for general graph traversal
Soundex support - Allows Free text indexing based on phonetic pronunciation
User-defined Indices - fully controllable by the system administrator
Client-Server GRUFF with Graphical Query Builder
Plug-in Interface for Text Indexers (use SOLR/Lucene, Native AG Full Text Indexer, Japanese Tokenizer)
Dedicated and Public Sessions – In dedicated sessions, users can work with their own rule sets against the same database
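The Soundex item in the list above refers to phonetic indexing, where names that sound alike map to the same short code. As an illustration, here is a sketch of the classic American Soundex algorithm in Python. Whether AllegroGraph uses exactly this variant is an assumption; the function below is a generic textbook version, not AllegroGraph code.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                    # the first letter is kept as-is
    previous = CODES.get(letters[0], "")   # its code still suppresses a duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        code = CODES.get(ch, "")           # vowels and Y have no code
        if code and code != previous:
            result += code
        previous = code                    # a vowel resets the duplicate check

    return (result + "000")[:4]            # pad with zeros, keep four characters


print(soundex("Robert"), soundex("Rupert"))
```

Because "Robert" and "Rupert" produce the same code, a text index built on Soundex codes would match a search for one against the other.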

References

https://db-engines.com/
Respective product websites
