Kafka For Mere Mortals: Running Multiple Brokers

Introduction

We have already learned about Kafka's essential architecture: Producers, Consumers, the Cluster, Brokers, Topics, Partitions, and their configurations.

In production, we configure Apache Kafka with multiple brokers in the cluster. It is possible to run a single-broker cluster, but it adds little value; we mostly use that kind of configuration for testing and learning purposes.

Depending on multiple factors, we may have 3 or 30 brokers in our cluster. Here are the main reasons to consider multiple brokers.

1. High Availability and Fault Tolerance

With a single broker, if it goes down, our entire Kafka cluster becomes unavailable. This can be a huge risk for critical applications.

By having multiple brokers, we can replicate data across them. This ensures that if one broker fails, the data is still available on other brokers, and our applications can continue to operate with minimal disruption.

2. Scalability and Performance

As our data volume and processing needs grow, a single broker can become overwhelmed.

Adding more brokers distributes the load across multiple machines, which can significantly improve throughput and reduce latency.

This allows us to handle larger data volumes and higher message rates without impacting performance.

3. Redundancy and Disaster Recovery

Having multiple brokers in different geographic locations can help protect our data from disasters.

If one data center goes down, the data is still available on brokers in other locations.

This can be essential for mission-critical applications that require high uptime and data durability.

4. Flexibility and Manageability

With multiple brokers, we can easily add new brokers or remove old ones as needed.

This allows us to adapt our Kafka cluster to changing requirements without downtime.

Additionally, we can distribute different topics or partitions across different brokers based on their needs, which further improves performance and manageability.

5. Increased Concurrency

With multiple brokers, we can have more producers and consumers concurrently accessing our data.

This can be beneficial for applications that require high levels of parallelism.

The optimal number of brokers for a Kafka cluster depends on our specific needs and requirements.

We need to carefully consider the replication factor when setting up the cluster. The replication factor determines how many copies of each partition are stored on different brokers. For example, with a replication factor of 2, every partition exists on two brokers, so the cluster can tolerate the loss of one of them.

Adding too many brokers can also have negative impacts, such as increased complexity and management overhead.

All of the above sounds interesting, but how do we actually get a cluster with multiple (at least more than one) brokers?

There are several ways to configure it, but the easiest one, I think, is a docker-compose file. What do you need?

  1. A running Docker Desktop application (or any other Docker engine)
  2. The proper docker-compose.yml file to run
version: "3"

services:
  zookeeper:
    image: bitnami/zookeeper:3.8
    ports:
      - "2181:2181"
    volumes:
      - zookeeper_data:/bitnami
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes"

  kafka1:
    image: bitnami/kafka:3.6
    ports:
      - "9092:9092"
    volumes:
      - kafka_data1:/bitnami
    environment:
      KAFKA_ENABLE_KRAFT: "no"  # Use ZooKeeper mode instead of KRaft
      KAFKA_CFG_BROKER_ID: "1"  # Explicit broker ID, easier to recognize in the UI
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_LISTENERS: "PLAINTEXT://:9092"  # Use only one listener
      KAFKA_CFG_ADVERTISED_LISTENERS: "PLAINTEXT://kafka1:9092"
      ALLOW_PLAINTEXT_LISTENER: "yes"  # Required by the bitnami image for PLAINTEXT listeners
    depends_on:
      - zookeeper

  kafka2:
    image: bitnami/kafka:3.6
    ports:
      - "9093:9093"
    volumes:
      - kafka_data2:/bitnami
    environment:
      KAFKA_ENABLE_KRAFT: "no"
      KAFKA_CFG_BROKER_ID: "2"
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CFG_LISTENERS: "PLAINTEXT://:9093"  # Use only one listener
      KAFKA_CFG_ADVERTISED_LISTENERS: "PLAINTEXT://kafka2:9093"
      ALLOW_PLAINTEXT_LISTENER: "yes"
    depends_on:
      - zookeeper

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "9100:8080"
    environment:
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: "kafka1:9092,kafka2:9093"  # Bootstrap via both brokers
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_JMXPORT: 9997
    depends_on:
      - kafka1
      - kafka2

volumes:
  zookeeper_data:
    driver: local

  kafka_data1:
    driver: local

  kafka_data2:
    driver: local

Copy the above YAML content and save it as a docker-compose.yml file anywhere you want.

After that, just run the following command, and that is all.

docker-compose up -d
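
Before opening any UI, you can verify that all four containers (zookeeper, kafka1, kafka2, and kafka-ui) are up:

docker-compose ps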

Open Kafka UI at http://localhost:9100, and you should see the following result.

[Screenshot: Kafka UI]

Now we have 2 brokers.

[Screenshot: Kafka UI Brokers view]

Let's dive into the details of the docker-compose file we have.

In general, a docker-compose file is a composition of multiple services. Every service is an application. We can configure and run multiple services together so that they act as one larger, more complex service.

Our docker-compose file consists of the following applications.

  1. First Kafka broker (kafka1)
  2. Second Kafka broker (kafka2)
  3. Kafka UI – a friendlier interface than the Apache Kafka CLI
  4. ZooKeeper – for managing the brokers

Each Kafka instance gets its own port: kafka1 uses localhost:9092, and kafka2 uses localhost:9093. Naming the services is up to you; you can use any names you want. Both brokers depend on ZooKeeper.
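
To double-check from the command line that both brokers have registered themselves, you can ask ZooKeeper for the list of broker IDs. This is just a sketch, assuming the zkCli.sh client bundled in the bitnami/zookeeper image and the service names from the file above:

docker-compose exec zookeeper zkCli.sh -server localhost:2181 ls /brokers/ids

With both brokers up, the returned list should contain the two IDs we configured via KAFKA_CFG_BROKER_ID (1 and 2).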

The general point here is to let everyone (not just DevOps engineers and developers) get a feel for Kafka, so instead of the Apache Kafka CLI we're using Kafka-UI, reachable at http://localhost:9100 (host port 9100 maps to the container's port 8080). You can use any UI for Apache Kafka; my favorites are Kafka-UI and Offset Explorer.

Let's create a topic with 6 partitions using Kafka UI and see how Kafka distributes the partitions across the brokers. Don't forget to set the replication factor to 2.

[Screenshot: creating a topic in Kafka UI]
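
If you prefer the command line over the UI, the same topic can be created with the kafka-topics.sh tool that ships inside the broker containers. A minimal sketch, assuming the service names from our docker-compose.yml and a hypothetical topic name "orders":

docker-compose exec kafka1 kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders --partitions 6 --replication-factor 2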

The final view of partition distribution across the brokers.

[Screenshot: partition distribution across the brokers]
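
You can inspect the same distribution from the CLI as well, again assuming the hypothetical "orders" topic from the sketch above:

docker-compose exec kafka1 kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders

Each partition line in the output shows its leader and its replica set, so you can see how the 6 partitions and their replicas are spread across brokers 1 and 2.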

Conclusion

Having multiple brokers in an Apache Kafka cluster provides significant benefits in terms of high availability, scalability, performance, and manageability. By carefully weighing your requirements, you can design a Kafka cluster that fits your specific workload.

