Delta Lake Streaming ETL Source

The Delta Lake streaming source reads data from Delta tables stored on S3, DBFS, ADLS, or GCS. Delta Lake provides a structured, transactional approach to reading and processing data. In a streaming ETL application, data is ingested and processed as it arrives, which makes this source useful for near real-time analytics and event-driven applications.

Schema Type

For details on how to provide schema information for data sources, see Provide Schema for ETL Source →.

After providing schema type details, the next step is to configure the data source.


Data Source Configuration

Configure the data source parameters described below.

Source

Select the storage system from which to read the Delta table: S3, DBFS, ADLS, or GCS.

Connection Name

Provide connection details for S3, DBFS, ADLS, or GCS based on the chosen source.

Connections are service identifiers. You can select a connection name from the list if you have already created and saved the connection details.


For S3 and GCS sources, provide the following details:

Bucket Name

Specify the name of the storage bucket where your Delta Lake data is stored. The bucket name helps direct the data source to the correct storage location within the chosen cloud platform (S3 or GCS).


Path

Define the path to the specific location within the storage bucket where your Delta Lake data is stored. This path directs the data source to the precise directory or folder containing the data you want to access.
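
To illustrate how Bucket Name and Path together identify one location, the Python sketch below joins them into a single table URI. The function name delta_table_uri is hypothetical, and the URI schemes (s3a:// for S3, gs:// for GCS) are assumptions based on common Hadoop connector conventions; the scheme the application uses internally may differ.

```python
def delta_table_uri(source: str, bucket: str, path: str) -> str:
    """Join a bucket name and a path into one storage URI (illustrative only)."""
    # Assumed URI schemes, following common Hadoop connector conventions.
    schemes = {"S3": "s3a", "GCS": "gs"}
    if source not in schemes:
        raise ValueError(f"unsupported source: {source!r}")
    # Strip stray slashes so the parts join with exactly one separator.
    return f"{schemes[source]}://{bucket.strip('/')}/{path.strip('/')}"


# Hypothetical bucket and path values, for illustration only.
print(delta_table_uri("S3", "my-bucket", "/warehouse/events/"))
```

The path should point at the root directory of the Delta table, that is, the directory that contains the _delta_log folder.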


For a DBFS source, provide the following details:

DBFS file path

The file path in DBFS from which the data should be read.


For an ADLS source, provide the following details:

Container Name

The name of the ADLS container from which the data should be read.

ADLS file path

The file path within the ADLS container from which the data should be read.
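
As a rough illustration of how the container name and file path fit together, the Python sketch below builds an ADLS Gen2 URI in the standard ABFS layout. The function name adls_table_uri is hypothetical, and the storage account parameter is an assumption: this page does not list it as a field, so it would presumably come from the selected connection.

```python
def adls_table_uri(account: str, container: str, path: str) -> str:
    """Build an ABFS URI for a path inside an ADLS Gen2 container (illustrative only)."""
    # ABFS layout: abfss://<container>@<account>.dfs.core.windows.net/<path>
    # `account` is assumed to be supplied by the connection, not by this form.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


# Hypothetical account, container, and path values, for illustration only.
print(adls_table_uri("mystorageacct", "raw", "/delta/orders"))
```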


Add Configuration: Adds custom Delta Lake properties as key-value pairs.
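
For readers who think in Spark terms, the Python sketch below shows one way key-value properties could be applied to a Delta streaming read. It assumes that the custom properties correspond to Spark reader options, which this page does not confirm. The function name read_delta_stream and the example values are hypothetical; maxFilesPerTrigger and ignoreDeletes are documented Delta Lake streaming source options.

```python
from typing import Any, Mapping


def read_delta_stream(spark: Any, table_path: str, properties: Mapping[str, str]):
    """Open a Delta table as a streaming source with extra key-value properties."""
    # `spark` is expected to be an existing SparkSession with Delta Lake available.
    reader = spark.readStream.format("delta")
    # Apply each key-value pair as a reader option.
    for key, value in properties.items():
        reader = reader.option(key, value)
    return reader.load(table_path)


# Example properties (values are illustrative).
example_properties = {
    "maxFilesPerTrigger": "100",  # limit how many new files each micro-batch considers
    "ignoreDeletes": "true",      # ignore transactions that delete data at partition boundaries
}
```

With a live session, a call such as read_delta_stream(spark, "s3a://my-bucket/warehouse/events", example_properties) would return a streaming DataFrame; the bucket and path here are placeholders.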


Detect Schema

Review the populated schema details. For more information, see Schema Preview →.


Notes

Optionally, enter notes in the Notes → tab and save the configuration.
