Delta Lake ETL Target

The Delta emitter writes data processed by your ETL application to Amazon S3 as a Delta Lake table. It leverages Delta Lake's transactional capabilities to ensure data consistency and reliability when storing data in S3.
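To see why the transaction log gives you consistency on S3, here is a minimal, illustrative sketch of a Delta-style commit protocol (not the emitter's or Delta Lake's actual implementation): data files are written first, and a single small JSON entry in `_delta_log` is the atomic commit point that makes them visible to readers.

```python
# Illustrative sketch of a Delta-style commit log (assumption: simplified
# model, not the real Delta Lake protocol). Data files become visible only
# once a commit entry referencing them lands in _delta_log.
import json
import os
import tempfile

def commit(table_path: str, version: int, data_files: list[str]) -> None:
    """Record which data files belong to a committed version."""
    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    entry = {"version": version, "add": data_files}
    # Writing this one small file is the atomic "commit point".
    with open(os.path.join(log_dir, f"{version:020d}.json"), "w") as f:
        json.dump(entry, f)

def committed_files(table_path: str) -> list[str]:
    """Readers see only files referenced by committed log entries."""
    log_dir = os.path.join(table_path, "_delta_log")
    files: list[str] = []
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            files.extend(json.load(f)["add"])
    return files

table = tempfile.mkdtemp()
commit(table, 0, ["part-0000.parquet"])
commit(table, 1, ["part-0001.parquet"])
print(committed_files(table))  # ['part-0000.parquet', 'part-0001.parquet']
```

A half-written data file that never gets a log entry is simply invisible to readers, which is what makes writes to eventually consistent object stores like S3 safe.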

Target Configuration

Configure the target parameters explained below.

Emitter Type

Select a target for emitting the Delta files. Currently, S3 is the only supported option.


Connection Name

Provide connection details for S3 based on the chosen emitter type.

A connection is a saved service identifier. Select a connection name from the list if you have already created and saved connection details.

Create a connection for S3 as explained in the topic - Amazon S3 Connection →


Bucket Name

Specify the name of the bucket where your Delta Lake data is to be emitted. The bucket name directs the emitter to the correct storage location within the chosen cloud platform (for example, S3).


Path

Define the path to the specific location within the storage bucket where your Delta Lake data is to be stored. This path directs the emitter to the precise directory or folder where you want to store the data.
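As a hypothetical example, the Bucket Name and Path settings combine into the full S3 location the emitter writes to. The values and the `s3a://` scheme below are assumptions (the `s3a` connector is the common choice for Spark-based workloads), not documented behavior.

```python
# Hypothetical illustration: how Bucket Name and Path combine into a full
# S3 target URI. Bucket/path values are placeholders, and the s3a:// scheme
# is an assumption, not a documented detail of this emitter.
def delta_target_uri(bucket: str, path: str) -> str:
    return f"s3a://{bucket}/{path.strip('/')}"

print(delta_target_uri("my-data-lake", "/etl/output/orders/"))
# s3a://my-data-lake/etl/output/orders
```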


Output Fields

Select the fields in the message that need to be emitted.


Partitioning Required

If this option is checked, the emitted data will be partitioned in the target.

Partition Columns

If Partitioning Required is checked, the Partition Columns field appears. Specify the columns by which the data will be partitioned when emitting. Partitioning can improve data organization and retrieval performance by dividing data into subsets based on these columns.
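The effect of partition columns on the storage layout can be sketched as follows: like Spark, Delta writes one subdirectory per distinct combination of partition-column values, so queries filtering on those columns can skip whole directories. The column names here are illustrative, not part of the product.

```python
# Sketch of Hive-style partitioned layout (column names are illustrative):
# each row lands under a directory path built from its partition values.
def partition_path(row: dict, partition_cols: list[str]) -> str:
    return "/".join(f"{c}={row[c]}" for c in partition_cols)

rows = [
    {"order_id": 1, "country": "US", "year": 2024},
    {"order_id": 2, "country": "DE", "year": 2024},
]
for r in rows:
    print(partition_path(r, ["country", "year"]))
# country=US/year=2024
# country=DE/year=2024
```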


Save Mode

Save Mode is used to specify how to handle any existing data in the target.

  • ErrorifExist: When persisting data, if the data already exists, an exception is expected to be thrown.

  • Append: When persisting data, if data/table already exists, contents of the DataFrame are expected to be appended to the existing data.

  • Overwrite: When persisting data, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.

  • Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the DataFrame and to not change the existing data.

  • Upsert: When persisting data, if the data already exists, the operation merges the new data into the existing data, updating records on conflict and inserting new records otherwise. This mode is suitable for merging data and maintaining consistency through a combination of inserts and updates.
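The five save modes above can be simulated on an in-memory table to make their differences concrete. This is an illustrative sketch keyed on a hypothetical `id` column, not the emitter's actual implementation.

```python
# Illustrative simulation of the save modes on an in-memory list of rows
# keyed by a hypothetical "id" column (not the emitter's real code).
def save(existing: list[dict], new_rows: list[dict], mode: str) -> list[dict]:
    mode = mode.lower()
    if mode == "errorifexist" and existing:
        raise ValueError("target already contains data")
    if mode == "ignore" and existing:
        return existing                     # keep old data, discard new
    if mode == "overwrite":
        return list(new_rows)               # replace everything
    if mode == "append":
        return existing + list(new_rows)    # duplicates are allowed
    if mode == "upsert":
        merged = {r["id"]: r for r in existing}
        merged.update({r["id"]: r for r in new_rows})  # update on conflict
        return list(merged.values())
    return list(new_rows)                   # empty target: plain write

old = [{"id": 1, "qty": 5}]
new = [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}]
print(save(old, new, "Append"))   # 3 rows: id 1 appears twice
print(save(old, new, "Upsert"))   # 2 rows: id 1 updated to qty 9
```

Note how Append keeps both versions of row `id=1`, while Upsert keeps only the newer one.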


ADD CONFIGURATION: Enables you to configure additional custom properties.


Post Action

To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Post-Actions →


Notes

Optionally, enter notes in the Notes → tab and save the configuration.
