Kafka for Mere Mortals: Cluster and Brokers With Distribution


In production, a Kafka cluster usually consists of multiple brokers.

A broker is simply a Kafka server, and its purpose is to serve data.

It doesn't matter whether a broker runs on physical or virtual hardware; in the end, it functions as a server. While it is technically possible to have only one broker in a cluster, this is usually done only for testing or self-learning purposes.

The purposes of a cluster include:

  • Handling multiple requests in parallel using multiple brokers.
  • Providing high throughput.
  • Ensuring scalability.

The brokers within a cluster are not just servers; they are also bootstrap servers. So, what exactly is a bootstrap server?

A bootstrap server is a server that holds all the necessary information about the other servers in the "network": the other brokers, their IP addresses, the topics they contain, and the partitions those topics encompass. This means that when you connect to one broker, you are effectively connected to the entire cluster. In most cases, three brokers is a good starting point; however, in heavily loaded systems, you might end up with hundreds of brokers.
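This is why the connection property in Kafka clients is conventionally named bootstrap.servers. A minimal sketch of a client configuration (the broker addresses and client id below are made-up placeholders):

```python
# Illustrative client configuration; the broker addresses are hypothetical.
producer_config = {
    # Listing one or two brokers is enough even for a large cluster:
    # the client fetches full cluster metadata from whichever broker
    # answers first and discovers all the other brokers from it.
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "client.id": "tracking-accounts-producer",
}

print(producer_config["bootstrap.servers"].split(","))
```

Even if the cluster later grows to dozens of brokers, this configuration keeps working, because the client never relies on the bootstrap list alone.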

A broker also acts as storage and, depending on topic configuration, holds multiple partitions. When we create a topic, we define the number of partitions under that topic. Kafka, as a distributed system, uses an internal algorithm to distribute these partitions among the brokers. Here are some examples:

Let's consider a Kafka cluster with three brokers. When creating a topic named "tracking_accounts" with three partitions, Kafka will attempt to distribute the partitions among the three brokers. In the best scenario, this will result in one partition per broker. Of course, this depends on various factors, including load balancing. You don't need to intervene; Kafka, as a "distributed framework," automatically manages all these internal operations.
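Assuming a broker is reachable at localhost:9092 (a placeholder address), such a topic could be created with Kafka's bundled CLI tool:

```shell
# Create "tracking_accounts" with three partitions;
# replication-factor 1 keeps the example minimal.
kafka-topics.sh --create \
  --topic tracking_accounts \
  --partitions 3 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092
```

Note that you only specify the partition count; which broker each partition lands on is decided by Kafka itself.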

Now, let's explore scenarios involving different numbers of partitions and brokers:

If you have three partitions and four brokers, Kafka will assign one partition to each of three brokers, leaving one broker without a partition. But why create more brokers than partitions? The value becomes apparent when a broker goes down. As we know, one of Kafka's most important attributes is fault tolerance: when a broker fails, Kafka automatically recovers using the other brokers.

Note: For this to work, the fourth broker should hold at least one replica of a partition. We will delve into how replicas ensure data availability during broker failures in future articles.


Consider another scenario with eight partitions and five brokers. Again, Kafka will distribute the partitions as evenly as possible: every broker gets at least one partition, and three brokers end up with two partitions each. In general, the distribution might look like this:

  • Broker 1 with 1 partition
  • Broker 2 with 1 partition
  • Broker 3 with 2 partitions
  • Broker 4 with 2 partitions
  • Broker 5 with 2 partitions

The order might differ, but generally, the internal distribution mechanism strives to distribute partitions effectively.
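As a rough sketch, the placement behaves like round-robin assignment. (Kafka's real assignor also picks a random starting broker and can take rack awareness into account, so the exact partition-to-broker mapping will differ; only the balanced counts matter here.)

```python
def assign_round_robin(num_partitions, num_brokers):
    """Illustrative round-robin placement: partition i goes to
    broker (i % num_brokers). Returns partitions-per-broker counts."""
    counts = [0] * num_brokers
    for p in range(num_partitions):
        counts[p % num_brokers] += 1
    return counts

# 3 partitions over 3 brokers -> one partition each
print(assign_round_robin(3, 3))  # [1, 1, 1]
# 3 partitions over 4 brokers -> one broker left without a partition
print(assign_round_robin(3, 4))  # [1, 1, 1, 0]
# 8 partitions over 5 brokers -> three brokers carry two partitions
print(assign_round_robin(8, 5))  # [2, 2, 2, 1, 1]
```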

Another important question is how producers and consumers know which broker to communicate with when reading or writing data.

The answer is simple. Since Kafka brokers also act as bootstrap servers, they possess all the essential information about other servers.

For example, consider any producer. Before producing data, the producer sends a background request to any broker (it doesn't matter which one; even the nearest broker will do) to retrieve metadata. This metadata contains all the relevant details about the other brokers, their topics, their partitions, and the partition leaders (which we will cover in future discussions).


Using this metadata, the producer knows which broker to send data to. We call this process "Kafka broker discovery".
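A toy model of that metadata lookup may make the flow clearer. The broker ids, addresses, and leader mapping below are invented for illustration; real metadata arrives over Kafka's wire protocol, not as a Python dict:

```python
# Hypothetical snapshot of the metadata a client receives on first contact.
cluster_metadata = {
    "brokers": {1: "10.0.0.1:9092", 2: "10.0.0.2:9092", 3: "10.0.0.3:9092"},
    "topics": {
        # partition number -> id of the broker that leads it
        "tracking_accounts": {0: 1, 1: 2, 2: 3},
    },
}

def broker_for(metadata, topic, partition):
    """Resolve the address of the broker leading the given partition."""
    leader_id = metadata["topics"][topic][partition]
    return metadata["brokers"][leader_id]

# The producer now knows exactly where partition 1 of the topic lives.
print(broker_for(cluster_metadata, "tracking_accounts", 1))  # 10.0.0.2:9092
```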
