Delta Lake ETL Target
The Delta emitter is useful when you want to write data processed by your ETL application to cloud storage such as S3. It leverages Delta Lake’s transactional capabilities to ensure data consistency and reliability when storing the data.
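Conceptually, this corresponds to a Delta-format write in Spark. The following is a minimal PySpark sketch, assuming a delta-spark installation; the session settings, bucket, and path are illustrative placeholders, not Gathr internals:

    from pyspark.sql import SparkSession

    # Minimal session with the Delta Lake extensions enabled (delta-spark).
    spark = (
        SparkSession.builder
        .appName("delta-emitter-sketch")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # The Delta transaction log makes the write atomic, so readers never
    # observe partially written files in the bucket.
    df.write.format("delta").mode("append").save("s3a://example-bucket/events/")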
Target Configuration
Configure the target parameters that are explained below.
Emitter Type
Select a target for writing the Delta files. The options are S3, DBFS, ADLS, and GCS.
Connection Name
Provide connection details for S3, DBFS, ADLS, or GCS based on the chosen target.
Connections are service identifiers. If you have created and saved connection details earlier, you can select a connection name from the list.
Create a connection for S3 as explained in the topic - Amazon S3 Connection →
Create a connection for DBFS as explained in the topic - DBFS Connection →
Create a connection for ADLS as explained in the topic - ADLS Connection →
Create a connection for GCS as explained in the topic - GCS Connection →
For S3 and GCS targets, provide the details below:
Bucket Name
Specify the name of the target bucket to store data.
Path
Define the path to the specific location within the bucket where data is to be stored.
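For illustration, a Bucket Name of example-bucket and a Path of sales/orders (both hypothetical) would typically resolve to locations such as the following, continuing the sketch above; the exact URI scheme depends on the configured connection:

    # Hypothetical URIs composed from Bucket Name + Path.
    s3_target = "s3a://example-bucket/sales/orders"    # S3 (Hadoop s3a connector)
    gcs_target = "gs://example-bucket/sales/orders"    # GCS

    df.write.format("delta").mode("append").save(s3_target)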
For a DBFS target, provide the details below:
DBFS file path
The file path in the DBFS file system.
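As a hypothetical illustration, reusing df from the earlier sketch, a DBFS file path usually takes the dbfs:/ form:

    # Hypothetical DBFS location for the Delta table.
    dbfs_target = "dbfs:/delta/sales/orders"
    df.write.format("delta").mode("append").save(dbfs_target)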
For an ADLS target, provide the details below:
Container Name
The ADLS container name to which the data should be emitted.
ADLS file path
The file path in the ADLS file system.
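As a hypothetical illustration, the Container Name and ADLS file path combine into an abfss:// URI for ADLS Gen2 (the container and storage account names below are placeholders, reusing df from the earlier sketch):

    # Hypothetical ADLS Gen2 URI: container "raw" in account "mystorageacct".
    adls_target = "abfss://raw@mystorageacct.dfs.core.windows.net/sales/orders"
    df.write.format("delta").mode("append").save(adls_target)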
Output Fields
Select the fields in the message that need to be emitted.
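In Spark terms, choosing output fields amounts to projecting columns before the write; the column names here continue the earlier hypothetical sketch:

    # Emit only the selected output fields.
    df.select("id", "value").write.format("delta").mode("append").save(s3_target)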
Partitioning Required
If this option is checked, the emitted data will be partitioned in the target.
Partition Columns
If Partitioning Required is checked, the Partition Columns field appears. Specify the columns by which the data will be partitioned when it is emitted. Partitioning can improve data organization and retrieval performance by dividing the data into subsets based on these columns.
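In PySpark this corresponds to partitionBy on the writer. A sketch assuming hypothetical partition columns country and ingest_date, continuing the earlier example:

    # Sample rows carrying the hypothetical partition columns.
    events = spark.createDataFrame(
        [(1, "US", "2024-01-01"), (2, "DE", "2024-01-01")],
        ["id", "country", "ingest_date"],
    )

    # Each distinct (country, ingest_date) pair becomes its own
    # directory of data files under the target path.
    (events.write.format("delta")
        .mode("append")
        .partitionBy("country", "ingest_date")
        .save(s3_target))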
Save Mode
Save Mode is used to specify how to handle any existing data in the target.
ErrorifExist: When persisting data, if the data already exists, an exception is thrown.
Append: When persisting data, if the data or table already exists, the incoming data is appended to the existing data.
Overwrite: When persisting data, if the data or table already exists, the existing data is overwritten by the incoming data.
Ignore: When persisting data, if the data or table already exists, the save operation neither saves the incoming data nor changes the existing data.
Upsert: When persisting data, if the data already exists, the operation merges the incoming data into the existing data, updating records on conflicts and inserting new records otherwise. This mode is suitable for merging data while ensuring consistency through a combination of inserts and updates.
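The first four modes map directly to Spark save modes; Upsert corresponds to a Delta MERGE. A minimal sketch, assuming the table is keyed on a hypothetical id column and continuing the earlier example:

    from delta.tables import DeltaTable

    # ErrorifExist, Append, Overwrite, and Ignore map to Spark save modes.
    df.write.format("delta").mode("overwrite").save(s3_target)
    df.write.format("delta").mode("ignore").save(s3_target)    # no-op: data exists

    # Upsert: merge incoming rows into the existing Delta table by key.
    table = DeltaTable.forPath(spark, s3_target)
    (table.alias("t")
        .merge(df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()       # update records on key conflicts
        .whenNotMatchedInsertAll()    # insert records that are new
        .execute())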
ADD CONFIGURATION: Enables you to configure additional custom properties.
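Custom properties typically surface as extra options on the writer. The option below is a standard Delta writer option used purely as an illustration; the properties Gathr accepts are product-specific:

    # Example additional property: allow new columns to be added on append.
    (df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(s3_target))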
Post Action
To understand how to provide SQL queries or stored procedures that will be executed during a pipeline run, see Post-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.