Stream Data From Console To Console Using Apache Kafka

This article walks through the steps required to stream data from a source (console window 1) to a sink (console window 2) using Apache Kafka. We will not cover what Apache Kafka is or how it works internally, since plenty of material already exists on those basics. We will keep this discussion to the implementation only.

The term ETL is quite common these days. ETL stands for Extract, Transform, and Load: the process of retrieving data from a data source, applying some transformation, and then saving it in a destination. For example, you might have data in MS SQL Server that is added from a UI, and you want every insert or update in any table to be streamed or copied into another database, say Postgres or MySQL. Kafka-based streaming fits this scenario.

In this article, we will use a console-based source and sink, i.e., any data typed into one console window will be streamed to another console window through Apache Kafka. So let's start.

Download the Apache Kafka binary package (kafka_2.13-2.8.0.tgz (asc, sha512)) from the link and extract the files to a location of your choice. The archive includes template configuration files for different source and sink systems such as console, database, and files, in the config folder. Refer to the screenshot below.


The package also contains the script files used to start ZooKeeper, the Kafka broker (server), and our source/sink console windows. The Windows .bat files are in the bin\windows folder, while the bin folder itself holds the equivalent .sh scripts. Refer to the screenshot below.


Since we are going to use console windows as source and sink, we keep their configuration files, connect-console-source.properties and connect-console-sink.properties in the config folder, at their default settings. Optionally, we can change the topic to any meaningful name. Note that the property is called topic in the source file and topics in the sink file. If we change the topic name, it must be changed in both files, because the topic acts as the central store while data streams from console window 1 to window 2. The commands in the rest of this article use the topic name connect-test, so that is the value to keep (or set) in both configuration files.
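The rule that both files must name the same topic can be checked mechanically. The following is a minimal Python sketch, not part of Kafka: the helper names parse_properties and topics_match are our own, the parser only handles plain key=value lines, and the two sample texts are assumptions about what the console source and sink files contain by default, so compare them with your own copies.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def topics_match(source_text, sink_text):
    """Return True when the source 'topic' appears in the sink 'topics' list."""
    source_topic = parse_properties(source_text).get("topic")
    sink_topics = parse_properties(sink_text).get("topics", "")
    sink_list = [t.strip() for t in sink_topics.split(",") if t.strip()]
    return source_topic is not None and source_topic in sink_list


# Assumed contents of config/connect-console-source.properties (illustrative).
SOURCE_EXAMPLE = """\
name=local-console-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
topic=connect-test
"""

# Assumed contents of config/connect-console-sink.properties (illustrative).
SINK_EXAMPLE = """\
name=local-console-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=connect-test
"""

if __name__ == "__main__":
    print(topics_match(SOURCE_EXAMPLE, SINK_EXAMPLE))
```

To check real files, read both with open(path).read() and pass the two strings to topics_match.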


Next, we open a command prompt and execute the commands below to navigate to our Kafka directory and start ZooKeeper on the local Windows machine.

cd E:\kafka_streaming
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

This will start ZooKeeper. Refer to the screenshot below.


Next, open another command prompt window and execute the commands below to navigate to the Kafka directory and start the Kafka broker (server).

cd E:\kafka_streaming
.\bin\windows\kafka-server-start.bat .\config\server.properties

This will start the Kafka broker. Refer to the screenshot below.


Next, we open a third command prompt window and execute the following commands to create the Kafka topic. The topic receives the data from the source console window and makes it available to the sink console window. The topic name should be the same as the one specified in the source/sink configuration files discussed above.

cd E:\kafka_streaming\bin\windows
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic connect-test

This will create a new topic named connect-test with one partition and a replication factor of 1. Refer to the screenshot below.
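If a different topic name is chosen, keep in mind that Kafka restricts which names are legal. The sketch below encodes the rules as we understand them: only ASCII letters, digits, '.', '_' and '-', at most 249 characters, and not the names '.' or '..'. The function name is_valid_topic_name is our own and is not part of any Kafka tool, so treat it as an illustration rather than an authoritative check.

```python
import re

# Assumed limits, based on our understanding of Kafka's topic naming rules.
MAX_TOPIC_LENGTH = 249
LEGAL_TOPIC_CHARS = re.compile(r"[a-zA-Z0-9._-]+")


def is_valid_topic_name(name):
    """Return True when the name satisfies the assumed topic naming rules."""
    if name in ("", ".", ".."):
        return False
    if len(name) > MAX_TOPIC_LENGTH:
        return False
    return LEGAL_TOPIC_CHARS.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("connect-test", "connect-sample", "connect sample", ".."):
        print(candidate, is_valid_topic_name(candidate))
```

Names containing spaces or other punctuation fail the check, which is one reason hyphenated names such as connect-test are a safe choice.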


Now our setup is complete, and we just need to start the source (producer) and sink (consumer) console windows. To start the source console window, we execute the commands below:

cd E:\kafka_streaming\bin\windows
kafka-console-producer.bat --broker-list localhost:9092 --topic connect-test

This will start our source console window, pointing to the topic connect-test as specified in the command above.


Similarly, for the sink console window, we execute the commands below. The sink will start reading from the topic named connect-test, beginning with the oldest message stored in it rather than only the messages that arrive after it starts.

cd E:\kafka_streaming\bin\windows
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic connect-test --from-beginning
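To see what reading from the beginning changes, it helps to picture the topic as an append-only log in which every message gets an increasing offset. The following toy Python model is only an illustration under that assumption: ToyTopic and ToyConsumer are hypothetical names, nothing here talks to a real broker, and real Kafka adds partitions, consumer groups, and committed offsets on top of this idea.

```python
class ToyTopic:
    """In-memory stand-in for a single-partition topic (an append-only log)."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Store a message and return the offset it was written at."""
        self._log.append(message)
        return len(self._log) - 1

    def end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self._log)

    def read_from(self, offset):
        """Return a copy of all messages at or after the given offset."""
        return list(self._log[offset:])


class ToyConsumer:
    """Reads a ToyTopic starting at offset 0 or at the current end."""

    def __init__(self, topic, from_beginning=False):
        self._topic = topic
        self._position = 0 if from_beginning else topic.end_offset()

    def poll(self):
        """Return messages not yet seen and advance the read position."""
        messages = self._topic.read_from(self._position)
        self._position += len(messages)
        return messages


if __name__ == "__main__":
    topic = ToyTopic()
    topic.append("hello")
    topic.append("world")
    replaying = ToyConsumer(topic, from_beginning=True)
    latest_only = ToyConsumer(topic)
    topic.append("again")
    print(replaying.poll())    # ['hello', 'world', 'again']
    print(latest_only.poll())  # ['again']
```

In this model, a consumer started with from_beginning=True sees lines typed into the source window before the sink window was opened, while one started without it sees only what arrives afterwards.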


Now, type anything in the source console window. The data will be displayed in the sink console window as below.


So this is the most basic way of streaming data from a source to a sink. The source and sink could just as well be a file or a database; depending on the type of source and sink, the configuration files have to be changed accordingly.