Need Of Polyglot Persistence

Traditionally, data has been stored in the way that works best for reporting. As a result, an RDBMS data store, with indexed keys and neatly laid-out data whose relationships can be easily queried, has become the industry standard.

Polyglot persistence means choosing data storage technology based on how the data is used by individual applications, or by individual components of a single application. For example, apart from relational storage, data could be stored in a key-value store, in a columnar store, as documents, or as a graph when it consists of relationships between multiple items.

So why was polyglot persistence not heard of before? The different data persistence technologies have matured only in the last few years and now have some very successful large-scale implementations. In addition, data is no longer used only by applications within one's own enterprise; it needs to be shared between services that may or may not be contained within a single application. The licensing cost of products such as Teradata, Oracle and MS SQL Server is also a major factor driving these alternatives into the mainstream.

KV Store:

An example use of a key-value store is saving shopping cart data in this format.



Some of the main players for this kind of storage are Redis, Azure Table, Riak, Memcached and Azure Cache, among many others. Hadoop MapReduce also passes data around as key/value pairs, and many other NoSQL stores build on the key-value model.
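The shopping cart case can be sketched in a few lines of Python. This is only a minimal illustration using an in-memory dictionary: the `KeyValueStore` class, the `cart:user:1001` key naming scheme and the sample items are all hypothetical and do not correspond to the API of Redis, Riak or any other product named above.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats the value as an opaque blob; here it is a JSON string.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
cart_key = "cart:user:1001"  # hypothetical key naming scheme

# Save the whole cart under a single key.
store.put(cart_key, {"items": [{"sku": "BOOK-42", "qty": 1}]})

# Updating the cart means reading the value, changing it and writing it back.
cart = store.get(cart_key)
cart["items"].append({"sku": "PEN-7", "qty": 3})
store.put(cart_key, cart)
```

The whole cart is read and written by its key, which is why this model suits data that is always fetched as one unit and never queried by its contents.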

Columnar Store:

Apache Cassandra and Apache HBase are the major players in this space. Here data is stored in a columnar format, and there are systems that can store more than 2 billion columns of data. Some systems also support sparse columns, as shown in the figure. An example is the Bigtable system developed by Google. eBay uses HBase for its search and fires approximately 2 million queries per second.
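Sparse columns can be pictured as a map of maps: each row stores only the columns it actually has a value for. The following Python sketch is an assumption-laden illustration, not the storage format of Cassandra, HBase or Bigtable; the `put` and `get` helpers and the sample rows are made up.

```python
# row key -> {column name -> value}; absent columns take up no space at all
table = {}


def put(row_key, column, value):
    """Store a single cell, creating the row on first use."""
    table.setdefault(row_key, {})[column] = value


def get(row_key, column, default=None):
    """Read a single cell without creating anything for missing rows."""
    return table.get(row_key, {}).get(column, default)


# Two rows with different sets of columns (sample data)
put("user:1", "name", "Asha")
put("user:1", "email", "asha@example.com")
put("user:2", "name", "Ben")
put("user:2", "phone", "555-0100")

columns_of_user_1 = sorted(table["user:1"])  # only the columns this row has
```

Because a missing column is simply not stored, a table can define a very large number of possible columns without paying for them in every row.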

 

This kind of data store can be used for time series data. A two-dimensional layout of data, whether row store or column store, exists only in theory. In reality, data has to be serialized onto the storage hardware in one form or another. Since seeks are the most expensive operations on hard disks, related data should be stored together to minimize the number of seeks and so improve performance.
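The difference between the two serializations can be shown with a small Python sketch. It flattens the same made-up time series records into a one-dimensional sequence in two ways; the function names and sample readings are hypothetical, and a Python list only stands in for the bytes on disk.

```python
# Sample time series records: (timestamp, sensor, temperature)
rows = [
    ("2024-01-01T00:00", "sensor-a", 20.5),
    ("2024-01-01T00:01", "sensor-a", 20.7),
    ("2024-01-01T00:02", "sensor-a", 21.0),
]


def row_layout(records):
    """Serialize record by record: all fields of one row sit next to each other."""
    return [field for record in records for field in record]


def column_layout(records):
    """Serialize column by column: all values of one column sit next to each other."""
    return [value for column in zip(*records) for value in column]


row_major = row_layout(rows)
col_major = column_layout(rows)

n_rows, n_cols = len(rows), len(rows[0])

# Reading only the temperature column (index 2):
temps_from_rows = row_major[2::n_cols]                # one jump per row
temps_from_cols = col_major[2 * n_rows : 3 * n_rows]  # one contiguous run
```

Both reads return the same values, but the row layout has to skip over the other fields of every record, while the column layout reads one contiguous run. On a hard disk that contiguous run is what keeps the number of seeks low when a query touches only a few columns.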

Document Store:

Like a key-value store, this type of store maps a key to a value, but here the value is a document (a key-document store). This type of database does not require a schema. Documents can be heterogeneous and may be organised into collections or databases. MongoDB, Apache CouchDB and RavenDB are the most popular data stores of this type.
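To illustrate schema-less, heterogeneous documents, here is a minimal in-memory sketch in Python. The `Collection` class, its `insert` and `find` methods and the sample products are hypothetical; they mimic the general idea only and are not the driver API of MongoDB, CouchDB or RavenDB.

```python
import uuid


class Collection:
    """Toy in-memory document collection (hypothetical, not a real driver API)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # Use the caller's "_id" if present, otherwise generate one.
        doc_id = doc.get("_id") or uuid.uuid4().hex
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all the given criteria.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


products = Collection()

# Two documents in the same collection with different fields: no schema required.
products.insert({"_id": "p1", "type": "book", "title": "Sample Book", "pages": 412})
products.insert({"_id": "p2", "type": "shirt", "size": "M", "colors": ["red", "blue"]})

books = products.find(type="book")
```

Unlike the plain key-value case, the store can look inside the documents, so queries on fields such as `type` are possible even though no two documents need to share the same structure.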

Graph Store:

This type of store is applicable where multiple nodes have interconnections (edges), as shown below. One example is the relationship between products purchased and recommendations. Another is a person and their colleagues, friends, likes, books purchased and products purchased.

Neo4j is a popular graph store, and OrientDB is another. Titan is a graph database that runs on top of storage backends such as Apache Cassandra and Apache HBase (on Hadoop).
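The purchase and recommendation example can be sketched as a tiny graph traversal in Python. This is an illustrative sketch under stated assumptions: the adjacency sets, the helper functions and the people and products are all made up, and a real graph store such as Neo4j would express the same traversal in its own query language.

```python
from collections import Counter, defaultdict

# Edges of the graph, kept as adjacency sets
friends = defaultdict(set)    # person -> persons they are friends with
purchased = defaultdict(set)  # person -> products they bought


def add_friendship(a, b):
    """Friendship is an undirected edge, so store it in both directions."""
    friends[a].add(b)
    friends[b].add(a)


def add_purchase(person, product):
    purchased[person].add(product)


def recommend(person):
    """Products bought by the person's friends but not by the person,
    ranked by how many friends bought them (ties broken by name)."""
    counts = Counter()
    owned = purchased.get(person, set())
    for friend in friends.get(person, set()):
        for product in purchased.get(friend, set()) - owned:
            counts[product] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked]


add_friendship("alice", "bob")
add_friendship("alice", "carol")
add_purchase("alice", "laptop")
add_purchase("bob", "laptop")
add_purchase("bob", "mouse")
add_purchase("carol", "mouse")
add_purchase("carol", "keyboard")

suggestions = recommend("alice")
```

The recommendation is answered by walking two hops of edges (person to friends, friends to products), which is the kind of query a graph store is designed to make cheap.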

Comparison on scalability and complexity of Data Stores

The following graph compares the scalability of these data stores against their design complexity and can help in deciding when each of them should be used.


This does not mean that the RDBMS is out of the game. It will still be relevant wherever transactions need to be stored. Each of these data stores has its own limitations. Based on the application's needs, data size and data design, you can decide which data store to use in your application. The decision can also depend on where the application will be hosted: on premises or in the cloud.
