Understanding CAP Theorem

Introduction

This article explains the CAP theorem for distributed systems. The theorem states that a distributed system can guarantee at most two of the following three properties at the same time:

  1. Consistency
  2. Availability
  3. Partition tolerance

In this article, we will cover:

  • What the CAP theorem is
  • Each property in detail

Let's explore.

What is the CAP Theorem

The CAP theorem states that a distributed system cannot be consistent, available, and partition tolerant all at the same time. To improve performance, distributed applications rely on horizontal scaling, which means adding more machines to the pool.

Cassandra is a good example of horizontal scaling. If the architecture has 5 Cassandra nodes today and the request volume increases, we can grow the cluster from 5 to 10 nodes to support the load instead of running into a resource deficit.

The performance benefit of horizontal scaling comes at the cost of complexity: adding machines to a cluster is more expensive to set up and operate than adding resources to an existing machine.

Distributed systems come with a different set of problems and design considerations, as we can see in the Cassandra architecture diagram. Once we add nodes, the data is distributed across them, and data modeling changes because reads are served based on partition keys. The architecture stays available because there is no single point of failure. Scaling out further is easy, but achieving consistency in the same architecture can cause issues. That is the trade-off the CAP theorem describes: only 2 of the 3 properties can be guaranteed together. Let's explore these 3 properties in detail.
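To make the idea of partition keys more concrete, here is a minimal sketch of how a key could be routed to a node and how the load per node changes when the cluster grows from 5 to 10 nodes. It assumes a simple hash-modulo placement; Cassandra itself uses a token ring rather than this scheme, and the names node_for and keys_per_node are made up for illustration.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index using a stable hash.

    Simplified modulo placement, used here only for illustration.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def keys_per_node(keys, node_count: int) -> Counter:
    """Count how many keys land on each node."""
    return Counter(node_for(key, node_count) for key in keys)


# Hypothetical workload: 10,000 partition keys.
keys = [f"user-{i}" for i in range(10_000)]

load_5 = keys_per_node(keys, 5)    # cluster with 5 nodes
load_10 = keys_per_node(keys, 10)  # cluster after scaling out to 10 nodes

print("busiest node with 5 nodes: ", max(load_5.values()), "keys")
print("busiest node with 10 nodes:", max(load_10.values()), "keys")
```

With twice as many nodes, each node owns roughly half as many keys, which is where the performance benefit of scaling out comes from. A plain modulo scheme moves most keys when the node count changes, which is one reason real systems prefer consistent hashing.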

CAP

Consistency

Consistency means that all nodes return the same data at the same time. Whenever a read is performed on a consistent system, it returns the value of the most recent write. In the above picture, node 4 receives a write at 4:01 and node 5 receives one at 4:03; when node 2 receives a read after that, it returns the latest value.
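The example with nodes 4, 5, and 2 can be sketched in a few lines. This is a toy model, not how any particular database implements consistency: it assumes every write is copied to all replicas before it is acknowledged and that the newest timestamp wins. The class names Versioned and ConsistentCluster are hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, Optional


@dataclass(frozen=True)
class Versioned:
    timestamp: time  # when the write was received
    value: str


class ConsistentCluster:
    """Toy cluster that copies every write to all replicas before returning."""

    def __init__(self, node_ids):
        self.replicas: Dict[int, Optional[Versioned]] = {n: None for n in node_ids}

    def write(self, coordinator: int, timestamp: time, value: str) -> None:
        """Accept a write on `coordinator` and apply it on every replica."""
        if coordinator not in self.replicas:
            raise KeyError(f"unknown node {coordinator}")
        incoming = Versioned(timestamp, value)
        for node_id, current in self.replicas.items():
            # Keep the newer version (last write wins).
            if current is None or incoming.timestamp > current.timestamp:
                self.replicas[node_id] = incoming

    def read(self, node_id: int) -> Optional[str]:
        """Read from any node; all replicas hold the same latest value."""
        current = self.replicas[node_id]
        return None if current is None else current.value


cluster = ConsistentCluster(range(1, 6))  # nodes 1..5
cluster.write(4, time(4, 1), "v1")        # node 4 receives a write at 4:01
cluster.write(5, time(4, 3), "v2")        # node 5 receives a write at 4:03
print(cluster.read(2))                    # prints "v2"
```

The price of this guarantee is that a write cannot complete until every replica has been reached, which is exactly what becomes a problem when a node is down or the network is partitioned.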

Availability

In the diagram, Node 5 is down but the other nodes operate normally. In my opinion, availability comes by default with any purely distributed system. Availability in a distributed system means that the system stays up and running irrespective of the state of any individual node. The downside of this approach is that a response may not reflect the most recent write, but every request gets a response no matter what.
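A small sketch of that behavior: node 5 holds the newest value but is down, so the read is answered by another node with an older value. The Node class and the read_any helper are assumptions made for this illustration, not part of any real client library.

```python
class Node:
    """Toy node that holds a single value and can be marked as down."""

    def __init__(self, node_id: int, value: str):
        self.node_id = node_id
        self.value = value
        self.up = True


def read_any(nodes, preferred_order):
    """Return (node_id, value) from the first node that is up.

    The answer may be stale, but the request is always served
    as long as at least one node is alive.
    """
    for node_id in preferred_order:
        node = nodes[node_id]
        if node.up:
            return node.node_id, node.value
    raise RuntimeError("no live node available")


nodes = {i: Node(i, "v1") for i in range(1, 6)}
nodes[5].value = "v2"  # node 5 accepted the latest write...
nodes[5].up = False    # ...and went down before replicating it

answered_by, value = read_any(nodes, preferred_order=[5, 1, 2, 3, 4])
print(answered_by, value)  # prints "1 v1": a response, but not the latest write
```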

Partition Tolerance

Whenever a node receives a message or event, it takes some time for the data to sync with the other nodes in the system. Partition tolerance means that the system continues to function even when messages or events between nodes are delayed or dropped. Imagine a system handling terabytes of data: there, partition tolerance is a necessity. If a node goes down or syncing is delayed, the data is eventually corrected, and the system continues to function in the meantime.
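The following sketch shows what "eventually corrected" can look like. It assumes two replicas, a replication message that is dropped while the link is down, and a later sync pass that reconciles both sides with a last-write-wins rule on a version number. The names Replica, replicate, and anti_entropy are illustrative; real systems use more elaborate repair mechanisms.

```python
class Replica:
    """Toy replica storing key -> (version, value)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}

    def put(self, key: str, version: int, value: str) -> None:
        """Store the value unless a newer version is already present."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key: str):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def replicate(source: Replica, target: Replica, key: str, link_up: bool) -> bool:
    """Forward one key to the other replica; the message is dropped if the link is down."""
    if not link_up:
        return False
    version, value = source.data[key]
    target.put(key, version, value)
    return True


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange all entries in both directions so both replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.put(key, version, value)
    for key, (version, value) in list(b.data.items()):
        a.put(key, version, value)


a, b = Replica("a"), Replica("b")
a.put("cart", 1, "book")
replicate(a, b, "cart", link_up=True)

a.put("cart", 2, "book+pen")
replicate(a, b, "cart", link_up=False)  # network partition: message dropped

print(b.get("cart"))  # prints "book": b still answers, with older data
anti_entropy(a, b)    # the partition heals and the replicas sync
print(b.get("cart"))  # prints "book+pen"
```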

Summary

Distributed systems naturally give us higher performance, lower latency, and almost 100% up-time, which follow from availability and partition tolerance. When I say higher performance, I mean partition-based querying: it is fast to locate the partition and query it. The CAP trade-off shapes the design of many NoSQL databases such as Cassandra and MongoDB, because it is difficult to provide scalability and ACID consistency at the same time. In my opinion, partitioning complements availability.