Confluent Cloud ETL Source

Using a Confluent Cloud ETL source, you can read data from the specified Confluent Cloud topic(s).

Schema Type

See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.

After providing schema type details, the next step is to configure the data source.


Data Source Configuration

Configure the data source parameters that are explained below.

Connection Name

Connections are service identifiers. Select a connection name from the list if you have already created and saved connection details for Confluent Cloud, or create a new connection as explained in the topic Confluent Cloud Connection →


Batch

Enable this option to read data in batch mode.


Topic Type

Select one of these options:

Topic List: Subscribe to one or more specific topics by name.

Pattern: Subscribe to all topics whose names match a Java regular expression.


Topic List

A Topic List is a collection of Kafka topic names that you want to perform operations on.

Example: You might have a list of topics that you want to consume data from or produce data to.

In Confluent Cloud, you can specify a topic or a list of topics to configure consumers, producers, or other Kafka clients to work with specific data streams.
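
As a minimal sketch, assuming this source is backed by Spark Structured Streaming's Kafka connector (the document refers to structured streaming queries), a topic list is passed as a comma-separated value of the connector's subscribe option. The bootstrap endpoint and topic names below are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("confluent-topic-list").getOrCreate()

df = (
    spark.readStream.format("kafka")
    # Placeholder Confluent Cloud bootstrap endpoint.
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    # The topic list: a comma-separated set of topic names to consume from.
    .option("subscribe", "orders,payments,shipments")
    .load()
)
```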


Pattern

A Pattern is a way to subscribe to or match multiple topics based on a regular expression pattern rather than specifying individual topic names.

This is particularly useful when you have a large number of topics that follow a naming convention, and you want to consume or produce data from/to all topics that match the pattern.

Common patterns include subscriptions like “topic-.*” to match all topics with names starting with “topic-”.
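
A sketch of the same idea using the Spark Kafka connector's subscribePattern option, reusing the spark session from the previous sketch (endpoint is a placeholder):

```python
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    # A Java regex: "topic-.*" matches every topic whose name starts with "topic-".
    .option("subscribePattern", "topic-.*")
    .load()
)
```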


Specify Consumer Group

Select either Group Id Prefix or Group Id.

Consumer Group Id Prefix

Specify the consumer group id prefix to use for reading data.

This is the prefix of the consumer group identifiers (group.id) generated by structured streaming queries. If Consumer Group Id is set, this option is ignored.

Consumer Group Id

Specify the consumer group id to use for reading data. Use this with caution.

Concurrently running queries (both batch and streaming) or sources with the same group id are likely to interfere with each other, causing each query to read only part of the data.

When this is set, the Group Id Prefix option is ignored.
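
A sketch of how the two settings differ on the Spark Kafka connector (groupIdPrefix versus an exact kafka.group.id); set only one of the two. Session and names carry over from the earlier sketches:

```python
# Option A: Group Id Prefix -- the engine generates unique group ids
# beginning with this prefix, so concurrent queries do not collide.
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    .option("subscribe", "orders")
    .option("groupIdPrefix", "etl-pipeline-")
    .load()
)

# Option B: Group Id -- pin one exact group id. Use with caution: concurrent
# queries sharing it will each read only part of the data.
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    .option("subscribe", "orders")
    .option("kafka.group.id", "etl-pipeline-orders")
    .load()
)
```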


Define Offset

Choose the offset from which data processing starts for this topic:

Earliest (start from the beginning),

Custom (specify a custom offset), or

Incremental (process only new data).

If Earliest is selected:

Connection Retries

The number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 signifies infinite retries.

If Custom is selected:

Starting Offsets

The start point when a query is started: either earliest (start from the earliest offsets), latest (start from the latest offsets), or a JSON string specifying a starting offset for each TopicPartition, e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}

Note: In the JSON, -2 as an offset can be used to refer to earliest, -1 to latest.

Ending Offsets

The end point when a batch query is ended: either latest (the latest offsets), or a JSON string specifying an ending offset for each TopicPartition, e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-1}}

Note: In the JSON, -1 as an offset can be used to refer to latest; -2 (earliest) is not allowed. By default, latest is used.
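
A sketch of custom start and end offsets in a bounded (batch) read against the Spark Kafka connector, using the JSON format above; the spark session carries over from the earlier sketches, and the endpoint and offset values are placeholders:

```python
import json

# -2 means earliest; -1 means latest (per the notes above).
starting = json.dumps({"topicA": {"0": 23, "1": -2}, "topicB": {"0": -2}})
ending = json.dumps({"topicA": {"0": 1000, "1": -1}, "topicB": {"0": -1}})

df = (
    spark.read.format("kafka")  # batch read: spark.read, not readStream
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    .option("subscribe", "topicA,topicB")
    .option("startingOffsets", starting)
    .option("endingOffsets", ending)
    .load()
)
```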

If Incremental is selected:

Incremental Offsets

The start point when a query is started: either earliest (start from the earliest offsets), latest (start from the latest offsets), or a JSON string specifying a starting offset for each TopicPartition, e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}

Note: In the JSON, -2 as an offset can be used to refer to earliest, -1 to latest.
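
One plausible sketch of incremental processing, assuming it means starting from the latest offsets and relying on a streaming checkpoint so each run resumes from the last committed position rather than re-reading data; the checkpoint and output paths are placeholder assumptions:

```python
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    .option("subscribe", "topicA")
    .option("startingOffsets", "latest")  # only new data from this point on
    .load()
)

# Assumption: progress is tracked via the checkpoint so restarts resume
# from the last committed offsets instead of reprocessing old records.
query = (
    df.writeStream.format("parquet")
    .option("path", "/tmp/out/topicA")
    .option("checkpointLocation", "/tmp/checkpoints/topicA")
    .start()
)
```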


Connection Retries

The number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 signifies infinite retries.


Max Offsets Per Trigger

Rate limit on the maximum number of offsets processed per trigger interval. The specified total number of offsets is split proportionally across TopicPartitions of different volume (a combined sketch follows the Fail On Data Loss option below).


Fail On Data Loss

Whether to fail the query when it is possible that data has been lost (e.g., topics are deleted or offsets are out of range). This may be a false alarm; you can disable it if it does not behave as expected. Batch queries will always fail if they cannot read any data from the provided offsets due to lost data.
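
A sketch combining the two tuning options above as they appear on the Spark Kafka connector; the limit value is illustrative and the session carries over from the earlier sketches:

```python
df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
    .option("subscribe", "topicA")
    # Cap each micro-batch at 4000 offsets; with two partitions holding
    # backlogs of ~3000 and ~1000, the split would be roughly 3000/1000.
    .option("maxOffsetsPerTrigger", 4000)
    # Keep running instead of failing when offsets go out of range or a
    # topic is deleted (note: this may hide genuine data loss).
    .option("failOnDataLoss", "false")
    .load()
)
```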


Delay Between Connection Retries

Defines the retry delay interval (in milliseconds) for the component connection.


Add Configuration: To add additional custom Confluent Cloud properties as key-value pairs.


Detect Schema

Check the populated schema details. For more details, see Schema Preview →


Pre Action

To understand how to provide SQL queries or stored procedures that will be executed during the pipeline run, see Pre-Actions →.


Notes

Optionally, enter notes in the Notes → tab and save the configuration.
