No ‘like’ And Range Queries In Cassandra

Sameer Shukla

Introduction

This article explains why Cassandra doesn’t support ‘like’ or range queries, even though both are extremely important from a data querying perspective. In my opinion there are two reasons behind it: the first is the architecture of Cassandra and the second is data partitioning. Let’s explore.

Architecture

Consider a production environment with 5 nodes, with the data distributed across all of them.

[Diagram: a cluster of 5 nodes]

A user comes in and fires a ‘like’ query. Behind the scenes, Cassandra has to try to fetch the data from all the nodes, and by default a query of this kind can run for up to 10 seconds before it times out. We are using Cassandra, which means we are dealing with huge data, so this query is not performant at all. It puts pressure on every node and can even bring a node down.

Data Partitioning

In Cassandra, the Murmur3 Partitioner is used by default for generating tokens; a token here means the hash value of the Partitioning key. The Murmur3 Partitioner ensures that the tokens are distributed uniformly across all the nodes.
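To make the idea of uniform token distribution concrete, here is a minimal sketch. It is only an illustration under some assumptions: MD5 from the Python standard library stands in for Murmur3, a simple modulo stands in for Cassandra's token ranges, and the names `token_for`, `node_for`, and the `emp-N` keys are made up for this example.

```python
import hashlib
from collections import Counter

NODE_COUNT = 5  # same cluster size as in the article


def token_for(partition_key: str) -> int:
    """Hash a key to a signed 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def node_for(partition_key: str) -> int:
    """Map a token to a node index (modulo is a simplification of token ranges)."""
    return token_for(partition_key) % NODE_COUNT


# Hypothetical keys, used only to show how evenly the hash spreads them.
keys = [f"emp-{i}" for i in range(10_000)]
rows_per_node = Counter(node_for(k) for k in keys)

for node in sorted(rows_per_node):
    print(f"node {node}: {rows_per_node[node]} rows")
```

Each node should end up with roughly a fifth of the rows, even though the keys themselves are sequential.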

These hash values are derived from the Partitioning key. Cassandra is different from SQL because in Cassandra the Partitioning key should be used in the WHERE clause, and we cannot perform ‘like’ on hash values: two similar keys produce completely unrelated hashes. That’s why ‘like’ is not supported in ordinary Cassandra queries.

Remember that CQL (Cassandra Query Language) is extremely fast because it queries by Partitioning key; it works like looking up keys in a HashMap.
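The HashMap comparison can be sketched in a few lines. This is a toy model, not how Cassandra is implemented: `ToyCluster`, `node_for`, and `like_prefix` are hypothetical names, and MD5 with a modulo is assumed in place of the real partitioner. It shows that a lookup by key needs one node, while a ‘like’ style prefix match has to visit every node.

```python
import hashlib

NODE_COUNT = 5  # same cluster size as in the article


def node_for(partition_key: str) -> int:
    """Pick a node from a hash of the key (MD5 and modulo are illustrative assumptions)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster: one dict per node."""

    def __init__(self) -> None:
        self.nodes = [{} for _ in range(NODE_COUNT)]

    def insert(self, key: str, value: str) -> None:
        self.nodes[node_for(key)][key] = value

    def get(self, key: str):
        """Key lookup: the hash tells us the single node to ask."""
        value = self.nodes[node_for(key)].get(key)
        return value, 1  # (result, nodes contacted)

    def like_prefix(self, prefix: str):
        """Prefix match: hashing the prefix tells us nothing, so every node is scanned."""
        matches = []
        for node in self.nodes:
            matches.extend(v for k, v in node.items() if k.startswith(prefix))
        return sorted(matches), len(self.nodes)


cluster = ToyCluster()
for name in ["alice", "alan", "albert", "bob", "carol"]:
    cluster.insert(name, name.upper())

print(cluster.get("bob"))         # ('BOB', 1)
print(cluster.like_prefix("al"))  # (['ALAN', 'ALBERT', 'ALICE'], 5)
```

The second number in each result is the count of nodes contacted: 1 for the key lookup and 5 for the prefix match.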

Another reason is that, if it were supported, ‘like’ would be expected to work on all the columns, but Cassandra is a NoSQL database and only the Partitioning and Clustering keys are candidates for the WHERE clause. That’s one more reason not to have ‘like’ in Cassandra.

No Range Queries in Cassandra

Another downside of using tokens is that we cannot perform range queries on the Partitioning key. Tokens are distributed in an unordered manner, so the order of the tokens does not follow the order of the original values, and we cannot run a query such as:

SELECT name FROM Employees WHERE empId > 30

empId here is the Partitioning key. For us it’s a number, but Cassandra locates it by its hashed value, the token.
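A short sketch shows why hashing breaks ranges. As before, this is an illustration under assumptions: MD5 stands in for Murmur3, modulo stands in for token ranges, and `token_for` is a made-up helper. The point is only that consecutive empId values get unrelated tokens, so the rows with empId > 30 are scattered over the cluster instead of sitting next to each other.

```python
import hashlib

NODE_COUNT = 5  # same cluster size as in the article


def token_for(emp_id: int) -> int:
    """Hash an empId to a signed 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(str(emp_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


emp_ids = list(range(25, 36))
by_token = sorted(emp_ids, key=token_for)

print("key order:  ", emp_ids)
print("token order:", by_token)

# Which nodes hold at least one row with empId > 30 (checking ids 31..100)?
nodes_for_range = {token_for(e) % NODE_COUNT for e in range(31, 101)}
print("nodes holding empId > 30:", sorted(nodes_for_range))
```

Because the token order has nothing to do with the key order, answering empId > 30 means asking several nodes and filtering, which is exactly the kind of query Cassandra tries to prevent.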

In my opinion, Cassandra is a fantastic database with extremely good performance, and writes are very cheap, but we should know our queries before designing the tables in order to get that performance.

I hope you like the article.