Kafka ETL Source

Schema Type

See the topic Provide Schema for ETL Source → to learn how to provide schema details for data sources.

After providing schema type details, the next step is to configure the data source.

Data Source Configuration

Each configuration property available in the Kafka data source is explained below.

Connection Name

Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Kafka. Otherwise, create one as explained in the topic Kafka Connection →


Batch

Option to enable batch processing.


Topic Type

Select one of the options below to fetch records from the Kafka topic(s).

Topic name: Subscribes to a single topic.

Topic list: Subscribes to a comma-separated list of topics.

Pattern: Subscribes to all topics whose names match a Java regular expression.

With Partitions: Consumes only specific partitions of the given topic(s). The partitions are given as a JSON string, for example {"topicA":[0,1],"topicB":[2,4]}
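The four topic types resemble the subscription options of Spark's Structured Streaming Kafka source (`subscribe`, `subscribePattern`, `assign`). The sketch below assumes that mapping, which this page does not confirm. The helper `topic_options` is a hypothetical name used only for illustration.

```python
import json


def topic_options(topic_type, value):
    """Return the (assumed) Kafka source option for a given topic type.

    The mapping to Spark option names is an assumption, not taken
    from this page.
    """
    if topic_type == "Topic name":
        # A single topic to subscribe to.
        return {"subscribe": value}
    if topic_type == "Topic list":
        # Comma-separated topics; trim stray whitespace around each name.
        topics = [t.strip() for t in value.split(",") if t.strip()]
        return {"subscribe": ",".join(topics)}
    if topic_type == "Pattern":
        # A Java regular expression matched against topic names.
        return {"subscribePattern": value}
    if topic_type == "With Partitions":
        # A dict of topic -> partition list, serialized as compact JSON.
        return {"assign": json.dumps(value, separators=(",", ":"))}
    raise ValueError(f"Unknown topic type: {topic_type}")


partitions = topic_options("With Partitions", {"topicA": [0, 1], "topicB": [2, 4]})
print(partitions["assign"])
```

The printed value has the same shape as the JSON string shown for the With Partitions option.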

The additional configuration fields that appear when Topic Name is selected are described below:

Topic Name

Kafka topic from which messages are read.

Partitions

Number of partitions. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.

Replication Factor

Number of replications. Replication provides stronger durability and higher availability. For example, a topic with replication factor N can tolerate up to N-1 server failures without losing any messages committed to the log.


Record Has Header

Option to read record headers along with data from the Kafka topics.


Replace Nulls with Blanks

Check this option to replace all null values in the incoming data with blanks (empty values).
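As a minimal sketch of the effect of this option, assume a record is represented as a Python dictionary (the function name `replace_nulls_with_blanks` is hypothetical and not part of the product):

```python
def replace_nulls_with_blanks(record):
    """Return a copy of the record with every None value replaced by ''."""
    return {key: "" if value is None else value for key, value in record.items()}


record = {"id": 7, "name": None, "city": "Pune"}
print(replace_nulls_with_blanks(record))
```

Only null values change. Other values, including zeros and existing empty strings, pass through untouched.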


Preserve Quotes

Check this option to preserve quotes in the delimited dataset. For example, ‘a,b’,c is emitted as two field values: ‘a,b’ and c. If the option is unchecked, the same input is emitted as three field values: a, b, and c.
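The behavior described above can be imitated with a small splitter. This is an illustration only, not the parser the component uses. It assumes a comma delimiter and a straight single quote as the quote character, and `split_fields` is a hypothetical name.

```python
def split_fields(line, preserve_quotes, delimiter=",", quote="'"):
    """Split a delimited line, mimicking the documented option behavior."""
    if not preserve_quotes:
        # Quotes are dropped and every delimiter separates fields.
        return line.replace(quote, "").split(delimiter)

    fields, current, in_quotes = [], [], False
    for ch in line:
        if ch == quote:
            # Keep the quote character and toggle the quoted state.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            # A delimiter outside quotes ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_fields("'a,b',c", preserve_quotes=True))
print(split_fields("'a,b',c", preserve_quotes=False))
```

With the option on, the quoted text stays together as one field with its quotes intact. With it off, the line yields three separate values.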


Specify Consumer Group

Specify the consumer group ID type. The default type is auto, which means the ID is auto-generated by Kafka.

Other options are: Group Id and Group Id Prefix.


Define Offset

The following options set the Kafka offset from which reading starts.

Earliest: The query starts from the earliest (first) available offset.

Latest: The query starts from the latest offset.


Connection Retries

Number of times the component retries the connection. Possible values are -1, 0, or any positive number. A value of -1 means the connection is retried indefinitely.


Max Offset Per Trigger

Rate limit on the maximum number of offsets processed per trigger interval. The specified total number of offsets is split proportionally across topic partitions of different volumes.
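The proportional split can be sketched as follows. This is a simplified illustration under stated assumptions, not the component's actual algorithm: `pending` is a hypothetical mapping from partition to the number of unread offsets, and shares are rounded down.

```python
def split_offsets(max_offsets, pending):
    """Split max_offsets across partitions in proportion to pending volume."""
    total = sum(pending.values())
    if total <= max_offsets:
        # Everything fits within the limit, so read all pending offsets.
        return dict(pending)
    # Each partition gets a share proportional to its pending offsets.
    return {p: max_offsets * n // total for p, n in pending.items()}


pending = {"topicA-0": 100, "topicA-1": 300}
print(split_offsets(200, pending))
```

Here the partition with three times the pending volume receives three times the share of the 200-offset limit. Because shares are rounded down, a few offsets of the limit may go unused in a given trigger.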


Fail on Data Loss

Option to fail the query when data may have been lost (for example, topics are deleted or offsets are out of range). This may be a false alarm, so you can disable the option if it does not behave as expected. Batch queries always fail if any data cannot be read from the provided offsets because of data loss.


Delay Between Connection Retries

Retry delay interval for component connection (in milliseconds).
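The interplay of Connection Retries and this delay can be sketched as a retry loop. This is a minimal illustration, assuming that 0 means a single attempt with no retries and that failures surface as `ConnectionError`. The names `connect_with_retries` and `connect` are hypothetical.

```python
import time


def connect_with_retries(connect, retries, delay_ms):
    """Call connect(), retrying on ConnectionError.

    retries: -1 retries forever, 0 never retries, N retries up to N times.
    delay_ms: pause between retries, in milliseconds.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if retries != -1 and attempt >= retries:
                # Retry budget exhausted: surface the last error.
                raise
            attempt += 1
            time.sleep(delay_ms / 1000.0)
```

For example, `connect_with_retries(open_kafka_connection, retries=3, delay_ms=500)` would make up to four attempts half a second apart, where `open_kafka_connection` stands for whatever function opens the connection.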


Log Parsing Errors

Check this option to log parsing errors in the pipeline logs.


ADD CONFIGURATION: Use this option to add custom Kafka properties as key-value pairs.

Detect Schema

Check the populated schema details. For more details, see Schema Preview →

Notes

Optionally, enter notes in the Notes → tab and save the configuration.
