Kafka Fundamentals And Architecture

In this article, we will look at what Kafka is and at the different components that make up the overall Kafka ecosystem.
 
In simple terms, Kafka is a fast, fault-tolerant publish-subscribe messaging system.
 
Kafka works with streams of events and is, at its core, an event streaming platform. Since the term event comes up so often, it is worth being clear about what it means.
 
An event is an atomic piece of data that records that something happened. For example, when we register or sign up on a website, that signup is an event, and it carries the data that was needed to sign up.
 
An event can also be thought of as a message with some data attached, and Kafka is a platform that works with streams of such events.
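As a rough illustration, the signup event described above could be modeled as a small record. The Event class, its field names (key, value, timestamp_ms) and the sample data are assumptions made for this sketch; they are not part of any Kafka API.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A hypothetical event: an immutable fact plus the data that describes it."""
    key: str      # identifies the entity the event is about
    value: dict   # the payload, e.g. the fields of the signup form
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))

    def to_bytes(self) -> bytes:
        # Kafka stores keys and values as bytes; JSON is one common encoding.
        return json.dumps(self.value, sort_keys=True).encode("utf-8")


signup = Event(key="user-42", value={"email": "jane@example.com", "plan": "free"})
print(signup.key, signup.to_bytes())
```

The record is frozen because an event describes something that has already happened and should not change afterwards.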
 
With that in place, let's go through some of the core concepts.
 

Fundamental Concepts

 
There are some fundamental concepts that are important to understand before we discuss the Kafka cluster architecture.
 
Producers
 
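Producers write events to Kafka by pushing data to the brokers. Whether a producer waits for an acknowledgment from the broker depends on its acks setting: with acks=0 it does not wait at all, with acks=1 it waits for the partition leader, and with acks=all it waits until the in-sync replicas have the record.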
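How long a producer waits after pushing a record is governed by the producer's acks setting. The sketch below only models the meaning of its values; the function name replicas_to_wait_for and the in_sync_replicas parameter are made up for illustration and are not a client API.

```python
def replicas_to_wait_for(acks: str, in_sync_replicas: int) -> int:
    """Number of replicas that must have the record before the send is acknowledged."""
    if acks == "0":
        return 0                    # fire and forget: no acknowledgment at all
    if acks == "1":
        return 1                    # only the partition leader has to confirm
    if acks in ("all", "-1"):
        return in_sync_replicas     # every in-sync replica has to confirm
    raise ValueError(f"unsupported acks value: {acks!r}")


for setting in ("0", "1", "all"):
    print(setting, replicas_to_wait_for(setting, in_sync_replicas=3))
```

Lower values favor throughput and latency, while acks=all favors durability.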
 
Consumers
 
Consumers, as the name implies, consume the events that producers write. The same application can act as both a producer and a consumer, but it will typically produce to topics other than the ones it consumes from.
 
Generally, though, systems such as databases or data analytics applications act as consumers, since they need to store or process data generated by other, external systems.
 
Kafka is the middle layer that sits between producers and consumers.
 
Consumer Group
 
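Each consumer group has a unique ID. When a group contains multiple consumers, each partition of a topic is assigned to only one consumer in the group, so the same message is not read by more than one consumer within that group. Different consumer groups, however, each receive their own copy of the messages.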
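The reason a message is not read twice within a group is that each partition is owned by a single group member. Here is a minimal sketch of that idea using a simplified round-robin strategy; the function assign_partitions and the consumer names are hypothetical, and Kafka's real assignors handle more cases (rebalancing, multiple topics, stickiness).

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer of the group, round-robin."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


group = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

If a group has more consumers than there are partitions, the extra consumers in this sketch simply receive an empty list and sit idle.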
 
Topics
 
A topic is a named channel in Kafka to which producers write messages and from which consumers read them. Topics keep messages organized: a given type of message is normally produced to its own dedicated topic.
 
So in the overall flow, a producer writes a message to a topic and a consumer then reads the message from that topic. Within a Kafka cluster, every topic name has to be unique.
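The produce-then-consume flow through a uniquely named topic can be sketched with a toy in-memory registry. ToyCluster and its methods are hypothetical stand-ins written for this article, not Kafka client calls.

```python
from typing import Dict, List


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster's set of topics."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = {}

    def create_topic(self, name: str) -> None:
        # Topic names must be unique within the cluster.
        if name in self.topics:
            raise ValueError(f"topic {name!r} already exists")
        self.topics[name] = []

    def produce(self, topic: str, message: str) -> None:
        self.topics[topic].append(message)

    def consume(self, topic: str, start: int = 0) -> List[str]:
        # Reading does not remove messages; other consumers can still read them.
        return self.topics[topic][start:]


cluster = ToyCluster()
cluster.create_topic("signups")
cluster.produce("signups", "user-42 signed up")
print(cluster.consume("signups"))
```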
 
 
Partitions
 
Topics are split into partitions, and these partitions are replicated across different brokers.
 
Partitions let Kafka spread a topic's data and load across brokers. Each topic is divided into one or more partitions, and each partition can be placed on a separate node. Within a partition, each message is assigned a sequential ID called an offset, so ordering is guaranteed within a partition but not across the partitions of a topic.
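To make partitions and offsets concrete, here is a small sketch in which the message key decides the partition and the offset is just the position in that partition's log. ToyTopic is hypothetical, and crc32 is used only as a stand-in hash (Kafka's default partitioner for keyed records uses murmur2).

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Hypothetical topic made of independent, append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        # The same key always hashes to the same partition.
        partition = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        # The offset is the message's position within its own partition.
        return partition, len(log) - 1


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.append(b"user-42", b"signed up")
p2, o2 = topic.append(b"user-42", b"upgraded plan")
print((p1, o1), (p2, o2))
```

Because both messages share a key, they land in the same partition and receive consecutive offsets, which is what preserves their relative order.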
 
 
The figure above highlights a few key points:
  • Replication happens at the partition level.
  • The replication factor of a topic cannot be greater than the number of brokers available in the cluster.
  • For any given partition of a topic, there can be only one leader at a time.
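The three points above can be sketched as a toy placement function. The name place_replicas, the broker IDs and the simple rotation scheme are assumptions for illustration; Kafka's actual replica assignment is more elaborate (for example, it can take racks into account).

```python
from typing import Dict, List


def place_replicas(num_partitions: int, replication_factor: int,
                   brokers: List[int]) -> Dict[int, Dict[str, object]]:
    """Spread each partition's replicas over distinct brokers, one leader each."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement: Dict[int, Dict[str, object]] = {}
    for partition in range(num_partitions):
        # Rotate the starting broker so leaders are spread across the cluster.
        replicas = [brokers[(partition + i) % len(brokers)]
                    for i in range(replication_factor)]
        placement[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


layout = place_replicas(num_partitions=3, replication_factor=2, brokers=[1, 2, 3])
for partition, roles in layout.items():
    print(partition, roles)
```

Asking for a replication factor of 4 with only three brokers raises an error, since two copies of the same partition on one broker would add no fault tolerance.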
 
Replicas
 
As the name implies, replicas are the multiple copies of the same data kept in the Kafka cluster. This is what makes Kafka reliable and fault tolerant: if one broker goes down, the same data can be served by a different broker.
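A minimal sketch of that failover idea: when the broker holding a partition's leader goes down, another broker that holds a replica takes over. The function elect_leader and the broker IDs are hypothetical, and the sketch ignores details such as Kafka only electing from in-sync replicas by default.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], live_brokers: Set[int]) -> Optional[int]:
    """Return the first replica whose broker is alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


replicas = [1, 2, 3]                                  # broker 1 is the preferred leader
print(elect_leader(replicas, live_brokers={1, 2, 3}))  # 1
print(elect_leader(replicas, live_brokers={2, 3}))     # 2, broker 1 is down
```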
 
Brokers and Nodes
 
A Kafka system is also called a Kafka cluster because it is made up of multiple machines; each machine in the cluster is called a node.
 
Brokers are the software components (the Kafka server processes) that run on the nodes, and the data inside Kafka is distributed among several brokers.
 
Kafka Architecture
 
Now that we have covered the fundamental concepts, let's discuss the architecture behind Kafka.
 
 
In the diagram above, we can see ZooKeeper, which we have not discussed yet.
 
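ZooKeeper plays a vital role in the Kafka ecosystem: Kafka uses it to manage and coordinate the brokers. Brokers do not keep the cluster-wide state themselves, so ZooKeeper maintains the cluster metadata, such as which brokers are alive and which broker acts as the controller. Note that newer Kafka releases can run without ZooKeeper by using the built-in KRaft mode, and ZooKeeper support was removed in Kafka 4.0.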
 
Here we can see that different producers write messages to different topics in Kafka, and consumers pull those messages from the topics as they become available.
 
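In older Kafka versions, clients could look up broker information in ZooKeeper; modern clients instead connect to a bootstrap broker and fetch the cluster metadata, including partition leaders, from the brokers themselves. Once a consumer has processed messages, it commits its offset, which records that everything before that offset has been consumed.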
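The offset update can be sketched as follows: the consumer pulls a batch, processes it, and then records the offset of the next message it should read. The names committed_offsets, poll and commit, and the group ID "analytics", are illustrative only; real clients track offsets per topic partition and store them in Kafka itself.

```python
from typing import Dict, List

# Hypothetical offset store: consumer group ID -> offset of the next message to read.
committed_offsets: Dict[str, int] = {}


def poll(log: List[str], group_id: str, max_records: int = 2) -> List[str]:
    """Pull up to max_records messages, starting at the group's committed offset."""
    start = committed_offsets.get(group_id, 0)
    return log[start:start + max_records]


def commit(group_id: str, processed: int) -> None:
    """Advance the committed offset past the messages that were just processed."""
    committed_offsets[group_id] = committed_offsets.get(group_id, 0) + processed


partition_log = ["signup-1", "signup-2", "signup-3"]
first_batch = poll(partition_log, "analytics")
commit("analytics", len(first_batch))
second_batch = poll(partition_log, "analytics")
print(first_batch, second_batch, committed_offsets)
```

If the consumer restarted after the first commit, it would resume from offset 2 rather than reading the first two messages again.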
 
Kafka runs multiple brokers to share the load. If a broker fails or goes down, the requests it was handling are served by other brokers in the cluster.
 

Summary

 
In this article, we discussed Kafka's fundamental concepts and its architecture. In the next article, we will discuss its benefits and use cases.
 
I hope you find this article helpful. Stay tuned for more … Cheers!!