Apache Kafka Confluent cluster Dockerization (Part-2)

You can find the earlier part of this article here:

Now, Kafka Broker

The most common definition of Apache Kafka says it is a distributed streaming platform maintained by the Apache Software Foundation. It was originally developed at LinkedIn and subsequently open-sourced in early 2011.
It is a publish-subscribe messaging system and a robust queue that can handle a high volume of data, enabling you to pass messages from one endpoint to another.

A streaming platform has three key capabilities:

- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant durable way.
- Process streams of records as they occur.

"Distributed" means the data can be spread over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Because each partition is replicated across a configurable number of servers, the system is fault-tolerant, highly available, and horizontally scalable.

Let’s set up the Kafka broker Docker cluster.
As with ZooKeeper, we are setting up a 3-node Kafka cluster.
Let the three servers be:
server1 (10.0.0.101)
server2 (10.0.0.102)
server3 (10.0.0.103)

The Server1 (10.0.0.101) docker-compose file should look like this:
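The original embedded gist is not reproduced here, so below is a minimal sketch of what such a file could look like, assuming the Confluent cp-kafka image (the version tag is an assumption), host networking, and the ZooKeeper ensemble from Part 1 listening on port 2181:

```yaml
version: '3'
services:
  kafka:
    image: confluentinc/cp-kafka:5.5.0   # image version is an assumption
    network_mode: host                   # container shares the host's network stack
    restart: always
    volumes:
      - /var/lib/kafka/data:/var/lib/kafka/data   # persist log segments on the host
    environment:
      KAFKA_BROKER_ID: 1                 # must be unique per broker
      KAFKA_ZOOKEEPER_CONNECT: "10.0.0.101:2181,10.0.0.102:2181,10.0.0.103:2181"
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://10.0.0.101:9092   # address clients connect to
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_DEFAULT_REPLICATION_FACTOR: 3
```

The cp-kafka image translates each `KAFKA_*` environment variable into the corresponding `server.properties` entry, so any broker setting can be passed the same way.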

Explanation of the properties used:

Server2 (10.0.0.102) Docker compose file:

Server3 (10.0.0.103) Docker compose file:
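The server2 and server3 files follow the same layout as the server1 file; only the broker id and the advertised listener change. A sketch of the lines that differ, under the same assumptions:

```yaml
# server2 (10.0.0.102) — values that differ from server1:
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://10.0.0.102:9092

# server3 (10.0.0.103) — values that differ from server1:
      KAFKA_BROKER_ID: 3
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://10.0.0.103:9092
```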

Next is the Schema Registry.

Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving Avro schemas. It stores a versioned history of all schemas, provides multiple compatibility settings, and allows the evolution of schemas according to the configured compatibility setting. It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format.

Let’s set up the Schema Registry in Docker on server1 (10.0.0.101).
The docker-compose.yml looks like:
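Again, since the original gist is missing, here is a minimal sketch assuming the Confluent cp-schema-registry image (version tag assumed) pointed at the three brokers set up above:

```yaml
version: '3'
services:
  schema-registry:
    image: confluentinc/cp-schema-registry:5.5.0   # image version is an assumption
    network_mode: host
    restart: always
    environment:
      SCHEMA_REGISTRY_HOST_NAME: 10.0.0.101        # advertised host of this instance
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
      # Schemas themselves are stored in a Kafka topic, so the registry
      # only needs the broker bootstrap list:
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://10.0.0.101:9092,PLAINTEXT://10.0.0.102:9092,PLAINTEXT://10.0.0.103:9092
```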

Properties used:

Next, we have the ksqlDB cluster:

KSQL is a powerful SQL streaming engine for Apache Kafka: using SQL statements, you can build stream processing applications. It is an event streaming database purpose-built to help developers create stream processing applications on top of Apache Kafka. KSQL is a distributed service; we can run multiple nodes in a cluster, and each node processes a portion of the input data from the input topic(s) and generates a portion of the output data to the output topics. KSQL does not use a quorum-based approach, so in this case we can also run an even number of nodes, depending on traffic.
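To make this concrete, once the cluster is running a stream processing application is just a couple of SQL statements. A hypothetical example (the topic and column names are illustrative, not from this setup):

```sql
-- Register an existing Kafka topic as a stream; AVRO values are
-- deserialized using the Schema Registry configured above.
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='AVRO');

-- A persistent query: continuously count views per page.
-- Its results are written to an output topic backing the table.
CREATE TABLE page_counts AS
  SELECT page, COUNT(*) AS views
  FROM pageviews
  GROUP BY page
  EMIT CHANGES;
```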

Let’s set up ksqlDB in Docker on server1 (10.0.0.101) and server2 (10.0.0.102).

The docker-compose file on server1 (10.0.0.101) looks like:
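A minimal sketch of such a file, assuming the Confluent ksqldb-server image (version tag assumed), the brokers, and the Schema Registry set up above:

```yaml
version: '3'
services:
  ksqldb-server:
    image: confluentinc/ksqldb-server:0.9.0   # image version is an assumption
    network_mode: host
    restart: always
    environment:
      KSQL_LISTENERS: http://10.0.0.101:8088
      KSQL_BOOTSTRAP_SERVERS: 10.0.0.101:9092,10.0.0.102:9092,10.0.0.103:9092
      # Nodes sharing the same service id form one KSQL cluster and
      # split the processing work between them:
      KSQL_KSQL_SERVICE_ID: ksql_cluster_1
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
```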

Properties used are:

The docker-compose file on server2 (10.0.0.102):
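The server2 file is identical to server1’s apart from the listener host; as long as both nodes share the same `KSQL_KSQL_SERVICE_ID`, they coordinate through Kafka and join the same KSQL cluster. A sketch of the line that changes:

```yaml
# server2 (10.0.0.102) — the only value that differs from server1:
      KSQL_LISTENERS: http://10.0.0.102:8088
```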

In the next part of this article, we will set up the Kafka REST Proxy and the Kafka Connect worker cluster in Docker.

You can find the remaining parts of this article here:

References:

Until next time…
