Kafka ETL Target
The Kafka emitter writes data to a Kafka cluster.
Target Configuration
Each configuration property available in the Kafka emitter is explained below.
Connection Name
Connections are service identifiers. Select a connection name from the list if you have previously created and saved connection details for Kafka, or create one as explained in the topic - Kafka Connection →
Topic Name Type
The topic name type can be either Static or Dynamic.
Static: When you select Static, the Topic Name field is presented as a text box. Provide a unique topic name to emit data.
Dynamic: When you select Dynamic, the Topic Name field is populated with a drop-down of the field names of the incoming dataset. On selecting a field name, topics are created dynamically at runtime using the values of the selected field.
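For illustration, below is a minimal producer sketch of the dynamic behavior, assuming a hypothetical incoming record with a field named region selected as the topic name source. The broker address and record structure are placeholders; Gathr performs this resolution internally.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DynamicTopicSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical incoming record; "region" is the field selected as the dynamic topic name.
            Map<String, String> record = Map.of("region", "emea", "amount", "42.50");

            // The value of the selected field becomes the topic name at runtime.
            String topic = record.get("region");
            producer.send(new ProducerRecord<>(topic, record.toString()));
        }
    }
}
```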
Topic Name
Specify the name of the topic to which you want to emit or send your data.
Each piece of data sent to Kafka will be associated with a specific topic, making it easier to organize and manage data streams within your Kafka cluster.
In case of the Dynamic topic name type, at runtime the Kafka emitter uses the value of the selected field as the topic name for each incoming record.
Partitions
Enter how many partitions you want the Kafka topic to be divided into.
Each partition stores messages in order, like chapters in a book, making it easier to manage and process large amounts of data.
By specifying the number of partitions for a topic, you determine how Kafka distributes and manages the data within that topic.
Replication Factor
Set the replication factor for the topic. It determines how many copies of each partition are stored across different Kafka brokers.
If set to 1, each partition will have only one copy. Increasing the replication factor improves fault tolerance, ensuring that data remains available even if some brokers fail.
For a topic with replication factor N, Kafka will tolerate up to N-1 broker failures without losing any messages committed to the log.
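As a rough equivalent outside Gathr, partitions and replication factor are the same parameters passed when creating a topic with Kafka's AdminClient. The topic name, counts, and broker address below are placeholders:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 2: tolerates one broker failure (N-1 = 1).
            NewTopic topic = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get();  // block until creation completes
        }
    }
}
```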
Producer Type
Determine how messages are sent to Kafka:
Async (Asynchronous): The producer sends messages in the background without waiting for a response from the Kafka broker. This means that the producer can continue to send messages without being blocked by network latency or broker responsiveness. However, there is a risk of message loss if the producer does not receive confirmation of successful delivery from the broker.
Sync (Synchronous): The producer waits for acknowledgment from the Kafka broker after sending each message. This ensures that messages are successfully delivered to Kafka before proceeding to send the next message. While synchronous sending provides stronger guarantees of message delivery, it can introduce latency and reduce overall throughput, especially when network conditions are poor or brokers are under heavy load.
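In Kafka's own producer API, the async/sync distinction looks roughly like the sketch below; the topic, key, and broker address are placeholders and error handling is abbreviated:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendModesSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "key-1", "value-1");

            // Async: returns immediately; the callback reports success or failure later.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();  // message may be lost if not handled
                }
            });

            // Sync: blocking on the returned future waits for the broker's acknowledgment.
            RecordMetadata metadata = producer.send(record).get();
            System.out.println("Acked at offset " + metadata.offset());
        }
    }
}
```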
Output Format
Choose the data type format in which the emitted data will be structured.
Options are:
JSON
Avro
Delimited
Select the format that best suits your use case.
Delimiter
Select the message field separator. This option appears when the output format is set to DELIMITED.
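As a small illustration of what DELIMITED output produces, the sketch below joins hypothetical output field values with a pipe separator:

```java
import java.util.List;

public class DelimitedFormatSketch {
    public static void main(String[] args) {
        // Hypothetical output fields of one record, in the configured order.
        List<String> fields = List.of("ord-123", "Ann", "42.50");

        // DELIMITED output: field values joined by the selected separator.
        String message = String.join("|", fields);
        System.out.println(message);  // ord-123|Ann|42.50
    }
}
```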
Avro Schema Source
Select the Source of Avro Schema:
Input Text Schema: Manually enter the text schema in the field, Input Schema.
Schema Registry: Enter a schema registry subject name for the Avro schema in the Subject Name field.
Auto Generate Schema: The schema is generated automatically based on the incoming dataset.
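For the Input Text Schema option, the text entered is a standard Avro schema definition. Below is a minimal example with hypothetical field names, parsed here with the Avro library to show it is well-formed:

```java
import org.apache.avro.Schema;

public class AvroSchemaSketch {
    public static void main(String[] args) {
        // A minimal Avro record schema, as it could be pasted into the Input Schema field.
        String schemaText = """
            {
              "type": "record",
              "name": "Order",
              "fields": [
                {"name": "orderId", "type": "string"},
                {"name": "amount",  "type": "double"}
              ]
            }
            """;
        Schema schema = new Schema.Parser().parse(schemaText);
        System.out.println(schema.getFields());
    }
}
```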
Header Fields
Select the columns to be included as header fields in the Kafka message.
Output Fields
Specify the fields in the message that should be part of the output data.
Message Key
Select the type of key to store in Kafka along with the data. The following key types are available:
Field Value: This option uses the values of specific fields present in the dataset as the message key. The values of these fields are concatenated together using a delimiter (such as “#”) to form the message key. This option is useful when you want to use existing data fields as keys in Kafka.
Field Value Hash: Similar to the “Field Value” option, this option also uses the values of specific fields present in the dataset as the basis for the message key. However, instead of using the raw field values, it calculates a hash value (such as MD5 or SHA-256) of these values concatenated together using a delimiter (such as “#”). This can be useful for hiding or disguising sensitive data while still ensuring a consistent key for each message.
UUID: This option generates a universally unique identifier (UUID) to be used as the message key. UUIDs are unique across time and space, making them suitable for ensuring message uniqueness in Kafka topics. This option is often used when no natural key exists in the dataset or when strict message uniqueness is required.
Static Value: This option allows you to specify a fixed/static value to be used as the message key for all messages. This can be useful for scenarios where all messages should have the same key, such as when partitioning data by a constant value. Provide a static value to be used as a key.
None: No specific key is used for messages. In this case, Kafka will automatically assign a partition for each message based on its own partitioning strategy.
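A hedged sketch of how the first three key types could be derived follows; the field values and the "#" delimiter follow the description above, and MD5 is one of the hash algorithms mentioned (Gathr's exact implementation may differ):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.UUID;

public class MessageKeySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical key fields selected from the dataset.
        List<String> keyFields = List.of("ord-123", "emea");

        // Field Value: raw field values concatenated with "#".
        String fieldValueKey = String.join("#", keyFields);

        // Field Value Hash: MD5 of the concatenated values, hex-encoded.
        byte[] digest = MessageDigest.getInstance("MD5")
                                     .digest(fieldValueKey.getBytes(StandardCharsets.UTF_8));
        String hashedKey = new BigInteger(1, digest).toString(16);

        // UUID: a key that is unique regardless of record content.
        String uuidKey = UUID.randomUUID().toString();

        System.out.println(fieldValueKey + " / " + hashedKey + " / " + uuidKey);
    }
}
```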
Kafka Partitioner
Determine which partition within a Kafka topic a message should be written to. Partitioning mechanisms are:
Round Robin (Default): Evenly distributes messages across partitions in a round-robin fashion.
It does not consider message content but rather assigns partitions sequentially, cycling through each partition in turn.
This option is simple and ensures a balanced distribution of messages across partitions.
Key Based: In key-based partitioning, the hash of a message’s key is used to map messages to partitions.
Example:
| Order ID (Key) | Hash Value | Partition Assigned |
|----------------|------------|--------------------|
| 123            | 1          | Partition 1        |
| 456            | 0          | Partition 0        |
| 123 (repeated) | 1          | Partition 1        |

Here:
Order ID (Key) is the unique identifier for each order.
Hash Value is a hash function applied to the order ID by Kafka.
Partition Assigned is the partition of the “orders” topic that the message will be placed into based on the hash value.
Therefore, in the given example:
When an order with ID “123” is received, its hash value is calculated to be 1. So, it is assigned to Partition 1.
Similarly, when an order with ID “456” is received, its hash value is calculated to be 0. Therefore, it is assigned to Partition 0.
If another order with ID “123” is sent later, its hash value is again computed as 1, so it is consistently assigned to Partition 1, ensuring that all orders with the same ID are stored together.
Select the key(s) for key based partitioner and proceed with emitter configuration.
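Note that the hash values in the table above are illustrative simplifications. Kafka's default partitioner actually derives the partition from a murmur2 hash of the serialized key modulo the partition count; the sketch below reproduces that calculation with Kafka's own utility class:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyPartitionSketch {
    public static void main(String[] args) {
        int numPartitions = 2;  // assumed partition count of the topic

        for (String orderId : new String[] {"123", "456", "123"}) {
            byte[] keyBytes = orderId.getBytes(StandardCharsets.UTF_8);
            // Same formula Kafka's default partitioner applies to keyed messages:
            // repeated keys always map to the same partition.
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println("Order " + orderId + " -> Partition " + partition);
        }
    }
}
```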
Enable TTL
Check this box to activate Time to Live (TTL) for records. When enabled, records will only persist in the system for a specified duration.
TTL Value
Specify the duration for which records should persist, measured in seconds.
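In Kafka itself, record lifetime is typically governed by the topic-level retention.ms configuration; the mapping of Gathr's TTL value to this setting is an assumption. A sketch of setting it at topic creation:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TtlTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            long ttlSeconds = 3600;  // hypothetical TTL value from the emitter config
            // retention.ms is milliseconds, so convert the TTL from seconds.
            NewTopic topic = new NewTopic("orders", 3, (short) 2)
                .configs(Map.of("retention.ms", String.valueOf(ttlSeconds * 1000)));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```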
Output Mode
Choose the output mode for writing streaming data to the target:
Append: Only new rows in the streaming data will be written to the target.
Complete: All rows in the streaming data will be written to the target every time there are updates.
Update: Only rows that were updated in the streaming data will be written to the target every time there are updates.
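These modes match the output modes of Spark Structured Streaming; assuming Spark underneath, a minimal sketch of setting a mode on a Kafka sink looks like this (the dataset df, topic, paths, and broker address are hypothetical):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

public class OutputModeSketch {
    static StreamingQuery start(Dataset<Row> df) throws Exception {
        return df.writeStream()
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "localhost:9092")  // assumed
                 .option("topic", "orders")
                 .option("checkpointLocation", "/tmp/ckpt")            // required by Spark
                 .outputMode("append")  // or "complete" / "update"
                 .start();
    }
}
```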
Enable Trigger
Define how frequently the streaming query should be executed.
Trigger Type
Choose from the following supported trigger types:
One-Time Micro-Batch: Processes only one batch of data in a streaming query before terminating the query.
Fixed Interval Micro-Batches: Runs a query periodically based on an interval in processing time.
Processing Time
This option appears when the Enable Trigger checkbox is selected. Specify the trigger time interval in minutes or seconds.
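In Spark Structured Streaming terms, these trigger types correspond to Trigger.Once() and Trigger.ProcessingTime(); assuming Spark underneath, a minimal sketch with placeholder options:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class TriggerSketch {
    static StreamingQuery start(Dataset<Row> df, boolean oneTime) throws Exception {
        return df.writeStream()
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "localhost:9092")  // assumed
                 .option("topic", "orders")
                 .option("checkpointLocation", "/tmp/ckpt")
                 // One-Time Micro-Batch vs Fixed Interval Micro-Batches.
                 .trigger(oneTime ? Trigger.Once() : Trigger.ProcessingTime("30 seconds"))
                 .start();
    }
}
```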
Post Action
Post actions are available with batch data sources.
To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Post-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.