Advanced Kafka Emitter

The Advanced Kafka emitter writes data to an Advanced Kafka cluster.

The supported output data formats are JSON and DELIMITED; the sketch below shows the same record in both.
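A minimal Java sketch of the two formats (the field names, values, and the comma delimiter are illustrative, not taken from the product):

```java
public class OutputFormatDemo {
    public static void main(String[] args) {
        String name = "alice";
        int age = 30;

        // JSON output format: field names travel with every record.
        String json = String.format("{\"name\": \"%s\", \"age\": %d}", name, age);

        // DELIMITED output format: values only, joined by a delimiter (a comma here).
        String delimited = String.join(",", name, String.valueOf(age));

        System.out.println(json);      // {"name": "alice", "age": 30}
        System.out.println(delimited); // alice,30
    }
}
```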

Advanced Kafka Emitter Configuration

To add an Advanced Kafka Emitter to your pipeline, drag the Advanced Kafka emitter onto the canvas, connect it to a Data Source or processor, and right-click on it to configure.

Connection Name: All Kafka connections are listed here. Select a connection for connecting to Advanced Kafka.
Topic Name: The Advanced Kafka topic to which the data is emitted.
Partitions: The number of partitions to create for the topic. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.
Replication Factor: For a topic with replication factor N, Advanced Kafka tolerates up to N-1 failures without losing any messages committed to the log. (Both settings are illustrated in the sketch after this list.)
Output Format: The data format of the output message.
Output Fields: The fields of the message that should be part of the output data.
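As a point of reference for the Partitions and Replication Factor settings, here is a minimal sketch of how the same two values appear at topic-creation time with Kafka's standard AdminClient (Gathr handles topic creation itself; the broker address and topic name below are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; in Gathr this comes from the selected connection.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism; replication factor 3 tolerates
            // up to 2 broker failures without losing committed messages.
            NewTopic topic = new NewTopic("demo-topic", 6, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```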
Advanced Kafka Partitioner

Round Robin (Default): Advanced Kafka uses the default partitioning mechanism to distribute the data.

Key Based: Partitioning is done on the basis of message keys.

Custom: Enables writing a custom partitioner class by implementing the com.Gathr.framework.api.spark.partitioner.SAXAdvanceKafkaPartitioner interface. The partition method contains the logic to calculate the destination partition and returns the target partition number. (An analogous sketch follows below.)
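The exact method signatures of the SAXAdvanceKafkaPartitioner interface are not shown in this document, so the sketch below uses Kafka's standard org.apache.kafka.clients.producer.Partitioner interface as an analogue; the core idea, computing and returning a target partition number, is the same:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical key-hash partitioner; Gathr's own interface may differ in names and arguments.
public class KeyHashPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        // Records with the same key always land on the same partition;
        // keyless records fall back to partition 0 in this sketch.
        return key == null ? 0 : Math.floorMod(key.hashCode(), numPartitions);
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```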

Enable TTL: Select the checkbox to enable TTL (Time to Live), so that records persist for that time duration.
Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. Connections are listed corresponding to the selected storage location.
Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like S3://BucketName/checkpointingDir.
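As an illustration, this is roughly how a checkpoint directory is consumed by a Spark Structured Streaming query writing to Kafka (a sketch only; the source, broker address, topic, and path are placeholders, and Gathr assembles the real query from the pipeline configuration):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class CheckpointDemo {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("CheckpointDemo").getOrCreate();

        // Placeholder source; in a real pipeline this is the upstream data source or processor.
        Dataset<Row> events = spark.readStream().format("rate").load();

        StreamingQuery query = events
            .selectExpr("CAST(value AS STRING) AS value") // Kafka sink expects a 'value' column
            .writeStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
            .option("topic", "demo-topic")                       // placeholder topic
            // Relative path as for HDFS/EFS; the platform prefixes it appropriately.
            .option("checkpointLocation", "/user/hadoop/checkpointingDir")
            .start();

        query.awaitTermination();
    }
}
```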

Time-Based Checkpoint: Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds.
Output Mode

The output mode to be used while writing the data to the streaming emitter. Select one of the three options:

Append: Only the new rows in the streaming data are written to the sink.

Complete: All the rows in the streaming data are written to the sink every time there are updates.

Update: Only the rows that were updated in the streaming data are written to the sink every time there are updates.
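In Spark Structured Streaming terms, the three modes behave as in the following sketch (the rate source and console sink stand in for the pipeline's real source and the Kafka sink):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OutputModeDemo {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("OutputModeDemo").getOrCreate();

        // A running aggregation over a toy source: one row per second, counted by value.
        Dataset<Row> counts = spark.readStream().format("rate").load()
                .groupBy("value").count();

        counts.writeStream()
                // "complete": the whole result table is rewritten on every trigger.
                // "update" would emit only the rows whose counts changed;
                // "append" is for queries that only ever add new rows to the result.
                .outputMode("complete")
                .format("console")
                .start()
                .awaitTermination();
    }
}
```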

Enable Trigger: A trigger defines how frequently a streaming query should be executed.
Processing Time: Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger time interval, in minutes or seconds.
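A processing-time trigger corresponds to Spark's Trigger.ProcessingTime; a minimal sketch (the interval and source are illustrative):

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;

public class TriggerDemo {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("TriggerDemo").getOrCreate();

        spark.readStream().format("rate").load()
                .writeStream()
                .format("console")
                // Run one micro-batch every 30 seconds instead of as fast as possible.
                .trigger(Trigger.ProcessingTime("30 seconds"))
                .start()
                .awaitTermination();
    }
}
```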

Click on the Next button and enter notes in the space provided.

Click on the DONE button to save the configuration.
