Getting Started With Kafka Producers, Consumers, Consumer Groups - Essential Concepts

Introduction

Kafka is a platform for distributed event storage and stream processing. Producers in Kafka are utilized to publish events or data to topics, while Consumers retrieve the data from these topics. This article emphasizes the fundamental concepts for efficiently working with Kafka Producers and Consumers.

Basic Producer Functionality

To ground the concepts, let's start by creating the "com.products" topic in Kafka using the 'kafka-topics.sh' command.

./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic com.products

The topic has 3 partitions.

Publishing messages with a Key

When messages are sent with a key, Kafka chooses the partition by applying a hash function to the key. As data with different keys is continuously published, the records become evenly distributed across all partitions.

Let's first produce a message and then examine the details. To produce the message, 'kafka-console-producer.sh' is used:

./kafka-console-producer.sh --bootstrap-server localhost:9092 --property "parse.key=true" --property "key.separator=:" --topic com.products 

The "parse.key=true" property instructs the producer to parse a key from the input, and "key.separator" specifies the character separating the key from its value; in this example the separator is ":".

Let's publish a message with the key 10 and the value "IPhone", entered as "10:IPhone". After the message has been produced, we will examine the Kafka directories for the "com.products" topic. This example is executed on a Mac, and the directory location is "/tmp/kafka-logs/com.products*".

ls /tmp/kafka-logs/com.products*


Three directories are created, one for each partition ID (com.products-0, com.products-1, com.products-2) each containing multiple files. Let's investigate these files within each partition.

To view the data, inspect the '.log' segment file inside each partition directory (com.products-0, com.products-1, com.products-2). On the local system, the record was stored in partition 1, which can be seen by running

cat /tmp/kafka-logs/com.products-1/00000000000000000000.log


Let's add more messages using the key (10) and examine the '.log' file for partition 1. We will see that the data consistently goes to the same partition.


All the data was routed to partition 1. Kafka applies a hash function to the key to choose the partition, so every record with key 10 lands in partition 1 here. As more data is published with different keys, the records will spread evenly across all three partitions.
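The keyed routing above can be sketched in a few lines of Python. This is a simplified stand-in, not Kafka's actual partitioner (the Java client hashes the key bytes with murmur2); MD5 is used here only to illustrate that the partition is a deterministic function of the key:

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    # Hash the key bytes and reduce modulo the partition count.
    # Kafka's default partitioner uses murmur2; MD5 is a stand-in
    # chosen purely to show that the mapping is deterministic.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every record keyed "10" maps to one and the same partition.
target = pick_partition("10", 3)
assert all(pick_partition("10", 3) == target for _ in range(100))
```

Because the partition is a pure function of the key, per-key ordering is preserved: all records keyed 10 are read back in the order they were written.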

Publishing Messages without a Key

When a message is published without a key (i.e., key=null), it is distributed across the partitions in a round-robin manner. When a message is published with a key, all messages with that same key will consistently go to the same partition.
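The two routing modes can be simulated together. The helper below is hypothetical (it is not the client's real implementation; CRC32 stands in for the real hash), but it shows keyed records sticking to one partition while null-keyed records rotate:

```python
import zlib
from itertools import cycle

def route(messages, num_partitions=3):
    """Sketch of producer routing: keyed records hash to a fixed
    partition; null-keyed records are assigned round-robin."""
    rr = cycle(range(num_partitions))
    partitions = []
    for key, value in messages:
        if key is None:
            partitions.append(next(rr))  # no key: rotate partitions
        else:
            # keyed: deterministic hash of the key
            partitions.append(zlib.crc32(key.encode()) % num_partitions)
    return partitions

batch = [("10", "IPhone"), (None, "a"), (None, "b"), (None, "c"), ("10", "IPad")]
parts = route(batch)
# The two records keyed "10" share a partition; the three
# null-keyed records rotate through partitions 0, 1, 2.
```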

Consumer / Consumer Group Basics

A Kafka Consumer is a software component that reads data from a Kafka cluster. It subscribes to one or more topics and retrieves the messages or events produced to those topics.

A Consumer Group is a group of consumers that work together to consume messages from a specific topic in a Kafka cluster.


  • In a consumer group, each message is delivered to a single consumer for processing and the consumer then acknowledges receipt back to Kafka.
  • Kafka keeps track of the active consumers in a group for a given topic and distributes the topic's partitions, and therefore its messages, evenly among them
  • The number of partitions should be equal to or greater than the number of consumers in a group. If there are more consumers than partitions, the extra consumers will sit idle.
  • Multiple consumer groups can be created, each with its own set of consumers. Each group receives a complete copy of all messages, but each message is delivered to only one consumer within each group.

     
  • Kafka handles rebalancing the load by reassigning partitions among active consumers when new consumers join, or existing ones leave.
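The assignment and rebalancing behavior in the bullets above can be sketched as a simple round-robin-style mapping of partitions to consumers. This is an illustration only, not Kafka's actual assignor logic:

```python
def assign(partitions, consumers):
    """Spread partitions over a group's consumers round-robin.
    Consumers beyond the partition count receive nothing (idle)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

group = ["c1", "c2"]
print(assign([0, 1, 2], group))           # {'c1': [0, 2], 'c2': [1]}

# A new consumer joining triggers a "rebalance": the mapping is
# simply recomputed over the enlarged group.
print(assign([0, 1, 2], group + ["c3"]))  # {'c1': [0], 'c2': [1], 'c3': [2]}

# More consumers than partitions: c4 is assigned nothing and idles.
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
```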

Consuming Messages

To consume the messages, the "kafka-console-consumer.sh" command is used:

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic com.products --group consumer-group-one --property print.key=true --property key.separator=":" --from-beginning 

Here a consumer group named 'consumer-group-one' consumes the messages:


Since only three messages have been published so far, all with the same key (10), all of them are consumed by 'consumer-group-one'.

Let's repeat the same command, but with a different consumer group, "consumer-group-two".

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic com.products --group consumer-group-two --property print.key=true --property key.separator=":" --from-beginning 

This shows that multiple consumer groups can each independently consume the same messages.

Consumer Offset Management

  • The consumer offset tracks, per consumer group and partition, how far message consumption has progressed.
  • Kafka assigns each record an offset within its partition as it is received, and it tracks these offsets per consumer and partition to monitor consumption.
  • Kafka brokers maintain a record of what has been sent to the consumer and what has been acknowledged, using two offset values: the Current Offset and the Committed Offset. The Current Offset tracks the last message sent to a particular consumer, and the Committed Offset tracks the last message the consumer has acknowledged.
  • When the brokers do not receive an acknowledgment from the consumer within a specified time limit, the message is redelivered to the consumer group. A message may therefore be delivered more than once, but it is guaranteed to be delivered at least once (at-least-once semantics).
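The interplay between the two offsets can be sketched with a tiny model. The class below is purely illustrative (not a real client API); it shows why records between the committed and current offsets may be seen twice after a restart:

```python
class PartitionProgress:
    """Toy model of the two offsets tracked per consumer/partition:
    'current' = last record handed to the consumer,
    'committed' = last record the consumer acknowledged."""

    def __init__(self):
        self.current = -1
        self.committed = -1

    def poll(self):
        # Delivering a record advances the current offset.
        self.current += 1
        return self.current

    def commit(self):
        # Acknowledging moves the committed offset up to current.
        self.committed = self.current

    def resume_from(self):
        # After a failure, consumption restarts after the committed
        # offset, so uncommitted records are redelivered
        # (at-least-once delivery).
        return self.committed + 1

pp = PartitionProgress()
pp.poll(); pp.poll()   # records 0 and 1 delivered
pp.commit()            # records 0 and 1 acknowledged
pp.poll()              # record 2 delivered but not yet committed
# committed=1, current=2: record 2 is redelivered after a restart
```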

Conclusion

The article presented an overview of the fundamental concepts of Producers, Consumers, and Consumer Groups in Kafka. It aimed to provide a clear understanding of the basics of Kafka through the use of examples. I hope it was helpful to you.

