S3 ETL Target

Amazon S3 stores data as objects within resources called buckets. The S3 emitter writes objects to an Amazon S3 bucket.

Target Configuration

Configure the target parameters that are explained below.

Connection Name

Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for S3. Otherwise, create one as explained in the topic Amazon S3 Connection →


Bucket Name

Buckets are cloud storage containers for objects stored in S3. Specify an S3 target bucket name to store the emitted data.


Path

Specify the file or directory path in the target bucket where the processed objects will be emitted.

A directory path can be given as shown below:

Example: /sales/2022-07-14

Single-statement MVEL expressions can be used to create custom folders in the bucket.

Example: The expression sales/@{java.time.LocalDate.now()} will create a folder named <current_date> inside the sales directory.
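
As a rough illustration of what the expression above resolves to, the following Python sketch mimics the substitution. It relies on the fact that java.time.LocalDate.now() renders as an ISO date (yyyy-MM-dd) by default, which Python's date.isoformat() also produces. The function name resolve_path is hypothetical, and the simple string replacement stands in for the MVEL engine; it is not how the product evaluates expressions internally.

```python
from datetime import date

PLACEHOLDER = "@{java.time.LocalDate.now()}"

def resolve_path(template: str, today: date) -> str:
    """Substitute the date placeholder with an ISO-formatted date (hypothetical helper)."""
    return template.replace(PLACEHOLDER, today.isoformat())

# A fixed date is passed in so the result is reproducible.
print(resolve_path("sales/@{java.time.LocalDate.now()}", date(2022, 7, 14)))
# prints: sales/2022-07-14
```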


Custom File Name

Enable this option to create a custom file name in the specified S3 path.


File Name

Provide a custom file name.

Example: The input sales_@{java.time.LocalDate.now()} will create a file named sales_<current_date>.<output_format>.

Use a unique file name when the same target location is used for multiple pipeline runs, as in the case of incremental read. Otherwise, the data will be overwritten in each iteration.
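
The overwrite risk can be sketched in a few lines of Python. A plain dict stands in for the bucket here, on the assumption that writing to an existing object key replaces that object; the emit function and the key names are hypothetical and exist only for this illustration.

```python
from datetime import date

def emit(bucket: dict, key: str, rows: list) -> None:
    """Write rows under a key; an existing key is replaced (stand-in for an object store)."""
    bucket[key] = rows

bucket = {}

# Same file name on every run: the second run replaces the first.
emit(bucket, "sales/sales.csv", ["run 1"])
emit(bucket, "sales/sales.csv", ["run 2"])

# Date-stamped file names: each run keeps its own object.
runs = [(date(2022, 7, 14), ["run 1"]), (date(2022, 7, 15), ["run 2"])]
for day, rows in runs:
    emit(bucket, f"sales/sales_{day.isoformat()}.csv", rows)

print(bucket["sales/sales.csv"])  # only the data from the second run remains
print(sorted(bucket))
```

A date stamp is unique only per day. If a pipeline can run more than once a day, the file name needs a finer-grained component.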


File Format

Output format in which the result will be emitted.


Delimiter

Select a message field separator for the CSV (Delimited) file format.


Header Included

Option to write the first row of the data file as a header.


Output Fields

Select the fields in the message that need to be emitted.
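
To show how the Delimiter, Header Included, and Output Fields settings interact, here is a minimal Python sketch using the standard csv module. The function to_delimited and the sample field names are hypothetical; the emitter's actual writer may quote or format values differently.

```python
import csv
import io

def to_delimited(rows, output_fields, delimiter=",", header_included=True):
    """Render rows as delimited text, keeping only the selected output fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=output_fields,
        delimiter=delimiter,
        extrasaction="ignore",  # drop fields that were not selected
        lineterminator="\n",
    )
    if header_included:
        writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"id": 1, "region": "EU", "amount": 10.5},
    {"id": 2, "region": "US", "amount": 7.0},
]
print(to_delimited(rows, ["id", "amount"], delimiter="|"))
```

With the header enabled, the first line of the output holds the selected field names, followed by one line per message.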


Save Mode

Save Mode is used to specify how to handle any existing data in the target.

  • Append: When persisting data, if data/table already exists, the contents of the Data are expected to be appended to the existing data.

  • ErrorifExist: When persisting data, if the data already exists, an exception is expected to be thrown.

  • Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the Data and to not change the existing data.

    This is similar to a CREATE TABLE IF NOT EXISTS in SQL.

  • Overwrite: When persisting data, if data/table already exists, existing data is expected to be overwritten by the contents of the Data.
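
The four save modes can be summarized with a small in-memory Python sketch. A dict stands in for the target and a list for the data at one location; the save function is hypothetical and only mirrors the behavior described in the list above.

```python
def save(target: dict, key: str, rows: list, mode: str) -> None:
    """Apply one of the four save modes to an in-memory stand-in for the target."""
    exists = key in target
    if mode == "Append":
        # Keep the existing data and add the new rows after it.
        target[key] = target.get(key, []) + list(rows)
    elif mode == "ErrorifExist":
        if exists:
            raise FileExistsError(f"{key} already exists")
        target[key] = list(rows)
    elif mode == "Ignore":
        # Save only when nothing is there yet; otherwise leave the target untouched.
        if not exists:
            target[key] = list(rows)
    elif mode == "Overwrite":
        target[key] = list(rows)
    else:
        raise ValueError(f"unknown save mode: {mode}")

target = {"sales": ["a"]}
save(target, "sales", ["b"], "Append")     # target["sales"] is now ["a", "b"]
save(target, "sales", ["c"], "Ignore")     # unchanged: ["a", "b"]
save(target, "sales", ["d"], "Overwrite")  # replaced: ["d"]
print(target["sales"])
```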


Output Mode

Output mode specifies how to write the data.


Enable Trigger

Trigger defines how frequently a streaming query should be executed.


Processing Time

Processing Time is the trigger time interval in minutes or seconds.

This property appears only when the Enable Trigger checkbox is selected.


Partitioning Required

If this option is checked, the emitted data will be partitioned in the target.


Partition Columns

Select the fields on which the data will be partitioned.
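
The sketch below illustrates one common way partitioned output is laid out: one folder per distinct value of each partition column, named column=value. This layout is an assumption for illustration (it is the convention Spark-style writers use) and is not confirmed by this page; the function partition_rows and the sample fields are hypothetical.

```python
from collections import defaultdict

def partition_rows(rows, partition_columns, base_path):
    """Group rows by partition-column values into column=value folder paths."""
    groups = defaultdict(list)
    for row in rows:
        parts = [f"{col}={row[col]}" for col in partition_columns]
        folder = "/".join([base_path, *parts])
        groups[folder].append(row)
    return dict(groups)

rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 3},
]
for folder, members in sorted(partition_rows(rows, ["region"], "sales").items()):
    print(folder, len(members))
# prints:
# sales/region=EU 2
# sales/region=US 1
```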


ADD CONFIGURATION: Enables you to configure additional custom properties.


Limitations on custom/dynamic folder or file naming

  • Custom/dynamic naming is not supported on streaming data, as the output file will be renamed.

  • Partitioning of the emitted data is not supported.

  • Invoking custom functions in MVEL expressions is not supported.


Post Action

To understand how to provide SQL queries or Stored Procedures that will be executed during the pipeline run, see Post-Actions →


Notes

Optionally, enter notes in the Notes → tab and save the configuration.
