Kafka Emitter
The Kafka emitter writes pipeline data to a Kafka cluster. The supported output formats are JSON, Delimited, and Avro.
Kafka Emitter Configuration
To add a Kafka emitter to your pipeline, drag the Kafka emitter onto the canvas and connect it to a Data Source or Processor.
The configuration settings of the Kafka emitter are described below.
Field | Description |
---|---|
Connection Name | All available Kafka connections are listed here. Select a connection for connecting to Kafka. |
Topic Name Type | Topic Name Type can be either Static or Dynamic. Static: the "Topic Name" field is presented as a text box; provide a unique topic name to emit data to. Dynamic: the "Topic Name" field is populated with a drop-down of the field names of the incoming dataset; selecting a field name creates topics at runtime from the values of that field (i.e., topics are created dynamically using the selected field's values). |
Topic Name | The Kafka topic to which data is emitted. With the Dynamic Topic Name Type, at runtime the Kafka emitter uses the value of the selected field as the topic name for each incoming record. |
Partitions | Number of partitions to be created for the topic. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log. |
Replication Factor | For a topic with replication factor N, Kafka will tolerate up to N-1 failures without losing any messages committed to the log. |
Producer Type | Specifies whether messages are sent asynchronously in a background thread or synchronously. Valid values are async for asynchronous send and sync for synchronous send. |
Output Format | Data format of the output message. Three output formats are available: JSON, Delimited, and Avro. |
Avro Schema Source | Select the source of the Avro schema: Input Text Schema: manually enter the schema text in the Input Schema field. Schema Registry: enter a schema registry subject name for the Avro schema in the Subject Name field. Auto Generate Schema: the schema is generated automatically from the incoming dataset. |
Header Fields | Fields in the message that need to be part of the output data. |
Output Fields | Message fields that will be part of the output data. |
Message Key | The type of key stored in Kafka along with the data. It can be one of four types: Field Value: the values of the selected dataset fields, concatenated by "#". Field Value Hash: the hash of the values of the selected dataset fields, concatenated by "#". Static: a static value used as the key in Kafka along with the data. UUID: a generated UUID used as the key along with the data in Kafka. (A sketch of the Field Value and Field Value Hash keys appears after this table.) |
Kafka Partitioner | Kafka uses the default partitioning mechanism to distribute the data. Key Based: partitioning is done on the basis of keys. Custom: lets you write a custom partitioner class by implementing Kafka's Partitioner interface; the partition method contains the logic to calculate the destination partition and returns the target partition number (see the sketch after this table). |
Enable TTL | Select the checkbox to enable TTL (Time to Live), so that records persist for that duration; also provide the TTL value in seconds. Note that if a streaming data source is used in the pipeline along with an aggregation without a watermark, it is recommended not to use Append as the output mode. Finally, provide the Priority value, which defines the execution order for emitters. |
Checkpoint Directory | Location where the checkpoint data is stored. |
Output Mode | Output mode to be used while writing the data to the streaming emitter. Select one of the three options: Append: only the new rows in the streaming data are written to the sink. Complete: all the rows in the streaming data are written to the sink every time there are updates. Update: only the rows that were updated in the streaming data are written to the sink every time there are updates. (A Spark-equivalent sketch appears after this table.) |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger time interval, in minutes or seconds. |
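Examples

For illustration, the Field Value and Field Value Hash message keys described above behave roughly like the sketch below. The sample field values, and the choice of MD5 as the hash, are assumptions made for this example; the section does not document which hash function is actually used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

public class MessageKeyExample {
    // Field Value: the selected field values joined by "#".
    static String fieldValueKey(List<String> fieldValues) {
        return String.join("#", fieldValues);
    }

    // Field Value Hash: a hash of the same "#"-joined string.
    // MD5 is used only for illustration; the actual hash function
    // is not documented in this section.
    static String fieldValueHashKey(List<String> fieldValues) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(fieldValueKey(fieldValues).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // e.g. a record with fields customerId=42 and region=EU (hypothetical)
        List<String> values = List.of("42", "EU");
        System.out.println(fieldValueKey(values));      // 42#EU
        System.out.println(fieldValueHashKey(values));  // md5 of "42#EU"
    }
}
```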
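When the Custom option of Kafka Partitioner is selected, the class you supply implements Kafka's org.apache.kafka.clients.producer.Partitioner interface. The following is a minimal sketch; the hash-modulo routing strategy is only an example, not a documented default.

```java
import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class ExampleKeyModPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }

    // Contains the logic to calculate the destination partition
    // and returns the target partition number.
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // no key: route to partition 0 in this sketch
        }
        // Example strategy: spread records across partitions by key hash.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    @Override
    public void close() {
        // Nothing to release.
    }
}
```

In plain Kafka, such a class is registered through the producer's partitioner.class property; with the Kafka emitter it is supplied through the Custom option of the Kafka Partitioner setting.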
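The Checkpoint Directory, Output Mode, and Processing Time settings match the options of Spark Structured Streaming. For reference, here is a minimal raw-Spark sketch in Java, assuming a streaming Dataset<Row> named df that already carries key and value columns; the broker address, topic, and checkpoint path are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class KafkaSinkSketch {
    // df is assumed to be a streaming Dataset<Row> with "key" and
    // "value" columns, as the Kafka sink expects.
    static StreamingQuery writeToKafka(Dataset<Row> df) throws Exception {
        return df.writeStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")              // placeholder
                .option("topic", "example-topic")                               // placeholder
                .option("checkpointLocation", "/tmp/checkpoints/kafka-emitter") // Checkpoint Directory
                .outputMode("append")                          // Append / Complete / Update
                .trigger(Trigger.ProcessingTime("30 seconds")) // Processing Time
                .start();
    }
}
```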