Kafka Data Source

Under the Schema Type tab, select Fetch From Source or Upload Data File.

When Fetch From Source is chosen, the Schema tab comes after the Configuration tab; when Upload Data File is chosen, the Schema tab comes before it.

Configuring Kafka Data Source

Field: Description
Connection Name

Connections are the service identifiers.

From the list of available connections, select the connection from which you would like to read the data.

Batch: Check this option to enable batch processing.
Topic Type

Select one of the options below to fetch records from the Kafka topic(s):

- Topic name: Subscribes to a single topic.

- Topic list: Subscribes to a comma-separated list of topics.

- Pattern: Subscribes to all topics whose names match a Java regex.

- With Partitions: Consumes only specific partitions of the given topic(s), specified as a JSON string, for example {"topicA":[0,1],"topicB":[2,4]}

The schema must be the same across all topics when Topic List, Pattern, or With Partitions is used.
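
The four topic types differ mainly in the shape of the value they carry. The sketch below builds each value in Python. The function name `topic_subscription` is hypothetical, and the mapping to the `subscribe`, `subscribePattern`, and `assign` option names of Spark's Kafka source is an assumption about how the setting is passed along, not something this page states.

```python
import json
import re


def topic_subscription(topic_type, value):
    """Return a hypothetical (option, value) pair for a Topic Type choice."""
    if topic_type == "Topic name":
        return ("subscribe", value)  # a single topic
    if topic_type == "Topic list":
        # Comma-separated list of topics, with stray whitespace removed.
        return ("subscribe", ",".join(t.strip() for t in value))
    if topic_type == "Pattern":
        # Python's `re` only approximates Java regex syntax; this is just
        # an early sanity check for simple patterns.
        re.compile(value)
        return ("subscribePattern", value)
    if topic_type == "With Partitions":
        # JSON string mapping each topic to the partitions to consume.
        return ("assign", json.dumps(value, separators=(",", ":")))
    raise ValueError(f"unknown topic type: {topic_type}")


option, value = topic_subscription(
    "With Partitions", {"topicA": [0, 1], "topicB": [2, 4]}
)
print(option, value)
```

The With Partitions case produces the same compact JSON string shown in the option description above.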

Topic Name: The Kafka topic from which messages will be read.
Topic List/Pattern/With Partitions: A topic is a category or feed name to which messages are published.
Partitions: Number of partitions. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.
Replication Factor: Number of replications. Replication provides stronger durability and higher availability. For example, a topic with replication factor N can tolerate up to N-1 server failures without losing any messages committed to the log.
Record Has Header?: Check this option to read record headers along with data from the Kafka topic.
Replace Nulls with Blanks: Enable this flag to replace all null values with blanks.
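
As a rough illustration of what Replace Nulls with Blanks does to a record, the sketch below treats a record as a Python dict and a "blank" as an empty string. Both choices, and the function name `replace_nulls_with_blanks`, are assumptions made for the example.

```python
def replace_nulls_with_blanks(record):
    """Return a copy of the record with every None value replaced by ''."""
    return {key: "" if value is None else value for key, value in record.items()}


print(replace_nulls_with_blanks({"id": 7, "name": None, "city": "Pune"}))
```
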
Specify Consumer Group: Specify the consumer group ID type. The default value is Auto, meaning that the group ID is auto-generated by the Kafka client. The other available options are:

- Group Id: In the Consumer Group ID field, specify the group ID used for reading data. Use this option cautiously: concurrently running queries (both batch and streaming) or sources with the same group ID are likely to interfere with each other, causing each query to read only part of the data. When this is set, the option ‘groupIdPrefix’ is ignored.

- Group Id Prefix: Specify the prefix of the consumer group identifiers (group.id) that are generated by structured streaming queries. If ‘kafka.group.id’ is set, this option is ignored.
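
The precedence between the two settings can be sketched as follows. The option keys `kafka.group.id` and `groupIdPrefix` come from the descriptions above; the function name `effective_group_setting` and the dict representation of the options are hypothetical.

```python
def effective_group_setting(options):
    """Return the (key, value) pair that takes effect, or None for Auto."""
    if "kafka.group.id" in options:
        # An explicit group ID wins; any prefix is ignored.
        return ("kafka.group.id", options["kafka.group.id"])
    if "groupIdPrefix" in options:
        return ("groupIdPrefix", options["groupIdPrefix"])
    return None  # Auto: the group ID is generated by the client


print(effective_group_setting({"kafka.group.id": "etl", "groupIdPrefix": "tmp"}))
```
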
Define Offset

The following options are used to define the Kafka offset:

- Latest: The query starts from the latest offset.

- Earliest: The query starts from the first offset.

- Custom: A JSON string specifying a starting and an ending offset for each partition.

  - startingOffsets: A JSON string specifying a starting offset for each partition, for example {"topicA":{"0":23,"1":-1},"topicB":{"0":-1}}

  - endingOffsets: A JSON string specifying an ending offset for each partition. This is an optional property with the default value "latest". For example, {"topicA":{"0":23,"1":-1},"topicB":{"0":-1}}
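
A custom offset string can be generated rather than typed by hand. The sketch below is a minimal helper, assuming the conventions of Spark's Kafka source apply here: -1 stands for the latest offset, -2 for the earliest, and -2 is not accepted in ending offsets. The function name `offsets_json` is hypothetical.

```python
import json

LATEST = -1    # assumed convention: -1 means the latest offset
EARLIEST = -2  # assumed convention: -2 means the earliest offset


def offsets_json(offsets, ending=False):
    """Serialize {topic: {partition: offset}} into a compact JSON string."""
    for partitions in offsets.values():
        for offset in partitions.values():
            if offset < EARLIEST:
                raise ValueError(f"invalid offset: {offset}")
            if ending and offset == EARLIEST:
                raise ValueError("earliest (-2) is not valid as an ending offset")
    # JSON object keys must be strings, so partition numbers are stringified.
    as_strings = {
        topic: {str(partition): offset for partition, offset in partitions.items()}
        for topic, partitions in offsets.items()
    }
    return json.dumps(as_strings, separators=(",", ":"))


print(offsets_json({"topicA": {0: 23, 1: LATEST}, "topicB": {0: LATEST}}))
```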

Connection Retries: The number of retries for the component connection. Possible values are -1, 0, or any positive number. A value of -1 means the connection is retried indefinitely.
Max Offset Per Trigger: Rate limit on the maximum number of offsets processed per trigger interval. The specified total number of offsets is split proportionally across topic partitions of different volume.
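
To make the proportional split concrete, the sketch below divides an offset budget across partitions according to how many offsets each has pending. It is a simplified illustration, not the engine's exact algorithm: the function name `split_offset_budget` is hypothetical, and rounding down may leave a few offsets of the budget unused.

```python
def split_offset_budget(backlog, max_offsets):
    """Split max_offsets across partitions in proportion to their backlog."""
    total = sum(backlog.values())
    if total <= max_offsets:
        return dict(backlog)  # everything pending fits in one trigger
    # Integer division keeps each share a whole number of offsets.
    return {
        partition: max_offsets * pending // total
        for partition, pending in backlog.items()
    }


# Partition A-1 has three times the backlog of A-0, so it gets three times the share.
print(split_offset_budget({"A-0": 100, "A-1": 300}, 40))
```
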
Fail on Data Loss: Whether to fail the query when data may have been lost (for example, topics are deleted or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as you expected. Batch queries will always fail if they cannot read any data from the provided offsets due to data loss.
Delay Between Connection Retries: Retry delay interval for the component connection (in milliseconds).
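
Connection Retries and Delay Between Connection Retries work together. The sketch below shows one plausible reading of them: -1 retries forever, 0 makes a single attempt, and a positive n allows n retries after the first attempt, with the delay applied between attempts. The function name `connect_with_retries` and the use of `ConnectionError` are assumptions for the example.

```python
import time


def connect_with_retries(connect, retries, delay_ms, sleep=time.sleep):
    """Call connect() until it succeeds or the retry budget is exhausted."""
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            # retries == -1 never gives up; otherwise stop after `retries` retries.
            if retries != -1 and attempt >= retries:
                raise
            attempt += 1
            sleep(delay_ms / 1000.0)  # the delay is configured in milliseconds
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting.
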
ADD CONFIGURATION: Add custom Kafka properties as key-value pairs.

Click on the Add Notes tab. Enter the notes in the space provided.

Click Done to save the configuration.
