What Is A Column Store Database

Column Store Databases store the data in columns rather than rows. In this article, let's understand what column databases are and what are the most popular column store DBMS.

What Are Columnar Databases

 
A column data store is also known as a column-oriented DBMS or columnar database management system. Column store DBMS store data in columns rather than rows. Relational database management systems (RDBMS) store data in rows and data properties as column headers. Both row-based and column-based DBMS use SQL as their query language but column-oriented DBMS may offer better performance. Imagine you need to list all names from a table based on an ID; rather than going through all the rows, you could just access a single column of the table.
 
Here are some of the key characteristics of column data store DBMS.
  1. Column store DBMS use a keyspace that is like a database schema in RDBMS.
  2. Column store DBMS have a concept called a column family. A column family is like a table on RDBMS. The keyspace contains all the column families in a database.
  3. A column family contains multiple rows. Each row has a unique key called Row Key, which is a unique identifier for that row.
  4. Each column in column store databases has a Name, Value, and TimeStamp fields.
  5. Each row can contain a different number of columns. All rows don’t have to have the same columns.
  6. Each column can contain multiple rows. All rows don’t have to have the same data type or size.

Keyspace

 
Column store DBMS uses a keyspace that is like a database schema in RDBMS. The keyspace contains all the column families. A keyspace name can be a CMS database that stores user profiles, documents, and documents metadata.
 
Column Store Database Keyspace
 

Column Family

 
A column family is like a table in RDBMS. A keyspace can have multiple column families. For example, a keyspace can have column families AuthorProfile, MemberProfile, Article, Blog, and Question.
 

Row Key

 
A column family contains multiple rows. Here is an example of a row of a column family. The first item in a row is a single value row key that is a unique key to identify a row. 
 
Column Store Database Row Key
 

Column

 
Each row can have multiple columns. A column in a column data store contains actual values. The data in a column store database is stored in a key/value pair with a timestamp. Each row can have a different number of columns.
 

AuthorProfile

 
The following is an AuthorProfile column family with three rows. Each row has a different number and types of columns. As you can see from the data, each row has a unique Row key.
 
Column Store Database Column
 
Three rows of the column family are Mahesh, David, and Allen. The first row has three columns, Gender, Expertise, and Rank. The second row has two columns, Gender, and Book. The third row has three columns, City, Book, and Rank.
 

Key benefits of column store databases

 
Key benefits of column store databases include faster performance in load, search, and aggregate functions. Column store databases are scalable and can read billions of records in seconds. Column store databases are also efficient in data compression and partitioning.
 

Popular Column databases

 
Some of the popular column-oriented DBMS include Bigtable, Cassandra, HBase, Druid, Hypertable, MariaDB, Azure SQL Data Warehouse, Google BigQuery, IBM Db2, MemSQL, SQL Server, and SAP HANA.
 

Bigtable

 
Google Bigtable is a petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads.
 
Key features include:
  • Low latency, massively scalable NoSQL
  • Consistent sub-10ms latency
  • Replication provides higher availability, higher durability, and resilience in the face of zonal failures
  • Ideal for Ad Tech, Fintech, and IoT
  • Storage engine for machine learning applications
  • Easy integration with open source big data tools

Cassandra

 
Apache Cassandra NoSQL database is a highly scalable and highly available without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
 

HBase

 
Apache HBase is an open-source, distributed, scalable, NoSQL big data store that allows billions of rows of big data access in seconds.
 
Key features include:
  • Linear and modular scalability
  • Strictly consistent reads and writes
  • Automatic and configurable sharding of tables
  • Automatic failover support between RegionServers
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
  • Easy to use Java API for client access
  • Block cache and Bloom Filters for real-time queries
  • Query predicate push down via server-side Filters
  • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
  • Extensible jruby-based (JIRB) shell

References

 
https://en.wikipedia.org/wiki/Column-oriented_DBMS
https://database.guide/what-is-a-column-store-database/
 
Next Article >> What are NoSQL Databases
To learn more about these popular graph databases, visit Most Popular Graph Databases.