Apache Kafka Confluent cluster Dockerization (Part-3)

Ankur Garg
4 min read · Aug 1, 2020

You can find Part-1 and Part-2 of this article here:

Now, the Kafka REST Proxy:

The official Confluent documentation says: "The Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients."

Let's set up the Kafka REST Proxy in Docker on server1 (10.0.0.101).
The docker-compose.yml looks like this:

```yaml
---
version: '3.2'
services:
  kafka-rest:
    image: <image kafka-rest>
    container_name: kafka-rest
    ports:
      - 8082:8082
    environment:
      KAFKAREST_HEAP_OPTS: "-Xmx256M"
      KAFKA_REST_ID: kafka-rest-server
      KAFKA_REST_LISTENERS: http://0.0.0.0:8082
      KAFKA_REST_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
      KAFKA_REST_BOOTSTRAP_SERVERS: PLAINTEXT://10.0.0.101:9092,PLAINTEXT://10.0.0.102:9092,PLAINTEXT://10.0.0.103:9092
```

Properties used:

KAFKA_REST_ID: A unique ID for this Confluent REST Proxy server instance.
KAFKA_REST_LISTENERS: Comma-separated list of listeners that accept API requests over either HTTP or HTTPS.
KAFKA_REST_SCHEMA_REGISTRY_URL: The base URL of the Schema Registry that should be used.
KAFKA_REST_BOOTSTRAP_SERVERS: A list of Kafka brokers to connect to. This configuration is particularly important when Kafka security is enabled, because Kafka may expose multiple endpoints that are all stored in ZooKeeper, but the REST Proxy may need to be configured with just one of those endpoints.
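Once the container is up, clients can produce messages with plain HTTP. As a minimal sketch, the snippet below builds the request body for the REST Proxy's embedded JSON format (the topic name "test-topic" and the sample record are assumptions for illustration):

```python
import json

# Base URL of the REST Proxy started above (assumed reachable).
REST_PROXY = "http://10.0.0.101:8082"

def produce_request(records):
    """Build the body for POST /topics/<name> using the embedded JSON format."""
    return json.dumps({"records": [{"value": r} for r in records]})

body = produce_request([{"user": "alice", "action": "login"}])

# Equivalent curl (hypothetical topic "test-topic"):
#   curl -X POST -H "Content-Type: application/vnd.kafka.json.v2+json" \
#        --data "$body" http://10.0.0.101:8082/topics/test-topic
```

The `Content-Type` header selects the embedded format (`json`, `binary`, or `avro`); with the Schema Registry URL configured above, the `avro` variant can also carry schemas.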

Next, we have Kafka connect:

Kafka Connect, an open-source component of Apache Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems. It uses meaningful data abstractions to pull or push data to Kafka. Kafka Connect can be deployed either as a standalone process that runs jobs on a single machine or as a distributed, scalable, fault-tolerant service.
Standalone mode is useful for development and testing Kafka Connect on a local machine. It can also be used for environments that typically use single agents (for example, sending web server logs to Kafka).
Distributed mode runs Connect workers on multiple machines (nodes). These form a Connect cluster. Kafka Connect distributes running connectors across the cluster. You can add more nodes or remove nodes as your needs evolve.
Here, we are going to set up a two-node distributed Kafka Connect cluster. Nodes using the same group ID are part of the same cluster.

Let’s set up a Kafka Connect docker cluster:
Similar to KSQL, Kafka Connect does not use a quorum-based approach, so in this case we can also have an even number of nodes. Here we are setting up a two-node Kafka Connect cluster.
Let the two servers be:
server1 (10.0.0.101)
server2 (10.0.0.102)

The docker-compose file on server1 (10.0.0.101) should look like this:

```yaml
---
version: '3.2'
services:
  kafka-connect:
    image: <kafka connect image>
    container_name: kafka-connect
    ports:
      - 8083:8083
    environment:
      KAFKA_HEAP_OPTS: "-Xms256M -Xmx1024M"
      CONNECT_BOOTSTRAP_SERVERS: 10.0.0.101:9092,10.0.0.102:9092,10.0.0.103:9092
      CONNECT_GROUP_ID: connect-cluster
      CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
      CONNECT_CONFIG_STORAGE_TOPIC: connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: connect-statuses
      CONNECT_PLUGIN_PATH: /usr/share/java
```

Properties:

CONNECT_BOOTSTRAP_SERVERS: A list of Kafka brokers used to establish the initial connection to the Kafka cluster. The client uses all servers, regardless of which ones are specified for bootstrapping; the list only affects the initial hosts used to discover the full set of servers.
CONNECT_GROUP_ID: A unique string that identifies the Connect cluster group this worker belongs to. This property is important when setting up a distributed cluster: all nodes of a cluster must use the same group ID.
CONNECT_KEY_CONVERTER: Converter class for key Connect data. This controls the format of the data written to Kafka for source connectors or read from Kafka for sink connectors.
CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: The Schema Registry URL; needed when the key converter is schema-aware, as the AvroConverter is.
CONNECT_VALUE_CONVERTER: Converter class for value Connect data. Like the key converter, this controls the format of the data written to or read from Kafka.
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: The Schema Registry URL; needed when the value converter is schema-aware.
CONNECT_CONFIG_STORAGE_TOPIC: The name of the topic where connector and task configuration data are stored. This must be the same for all workers with the same group ID.
CONNECT_OFFSET_STORAGE_TOPIC: The name of the topic where source connector offsets are stored. This must be the same for all workers with the same group ID.
CONNECT_STATUS_STORAGE_TOPIC: The name of the topic where connector and task status updates are stored. This must be the same for all workers with the same group ID.
CONNECT_PLUGIN_PATH: A comma-separated list of paths to directories that contain Kafka Connect plugins.

docker-compose file on server2 (10.0.0.102):

```yaml
---
version: '3.2'
services:
  kafka-connect:
    image: <kafka connect image>
    container_name: kafka-connect
    ports:
      - 8083:8083
    environment:
      KAFKA_HEAP_OPTS: "-Xms256M -Xmx1024M"
      CONNECT_BOOTSTRAP_SERVERS: 10.0.0.101:9092,10.0.0.102:9092,10.0.0.103:9092
      CONNECT_SEND_BUFFER_BYTES: 52428800
      CONNECT_RECEIVE_BUFFER_BYTES: 52428800
      CONNECT_GROUP_ID: connect-cluster
      CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://10.0.0.101:8081
      CONNECT_CONFIG_STORAGE_TOPIC: connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: connect-statuses
      CONNECT_PLUGIN_PATH: /usr/share/java
```
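To confirm that both workers joined the same cluster, each one can be probed over its REST API. This is a minimal sketch assuming the two servers above are reachable; the actual network calls are only illustrated, not executed:

```python
import urllib.request

# The two Connect workers set up above.
WORKERS = ["10.0.0.101", "10.0.0.102"]

def worker_urls(hosts, port=8083):
    """Return the REST API base URL for each Connect worker."""
    return [f"http://{h}:{port}" for h in hosts]

def list_connectors(base_url):
    """GET /connectors — returns the JSON list of registered connector names."""
    with urllib.request.urlopen(f"{base_url}/connectors", timeout=5) as resp:
        return resp.read().decode()

urls = worker_urls(WORKERS)
# With the cluster running, both workers should report the same connectors:
#   for u in urls:
#       print(u, list_connectors(u))
```

If the two responses differ, the workers are likely in different groups; double-check that CONNECT_GROUP_ID and the three storage topics match on both nodes.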

You can find the remaining parts of this article here:

References:

https://docs.confluent.io/current/kafka-rest/config.html
https://docs.confluent.io/current/connect/references/allconfigs.html

Until next time…
