Amazon S3 Ingestion Target
Amazon S3 stores data as objects within resources called buckets. The S3 target writes the processed objects to the specified Amazon S3 bucket.
Target Configuration
Configure the target parameters that are explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Amazon S3, or create one as explained in the topic - Amazon S3 Connection →
Use the Test Connection option to verify that the connection to the Amazon S3 channel can be established successfully.
A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.
Bucket Name
Buckets are storage units used to store objects, which consist of data and metadata that describes the data. Specify the name of the target S3 bucket.
Path
Provide the file/directory path inside the target bucket where the processed objects will be emitted.
Writing directly to the root folder of a bucket is not supported.
A directory path can be given as shown below:
Example: /sales/2022-07-14
Single-statement MVEL expressions can be used to create custom folders in the bucket, as illustrated in the sketch below.
Example: The expression sales/@{java.time.LocalDate.now()} creates a folder named with the <current_date> inside the sales directory.
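Below is a minimal sketch of how such a template could resolve at runtime, assuming the Path field is evaluated with MVEL's template syntax (the @{...} orbs); the class name and the empty variable map are illustrative, not part of Gathr's API.

```java
import java.util.HashMap;
import org.mvel2.templates.TemplateRuntime;

public class PathTemplateSketch {
    public static void main(String[] args) {
        // Illustrative template, mirroring the example above.
        String pathTemplate = "sales/@{java.time.LocalDate.now()}";

        // TemplateRuntime resolves the @{...} expression and splices the result
        // into the surrounding text.
        String resolvedPath = String.valueOf(
                TemplateRuntime.eval(pathTemplate, new HashMap<String, Object>()));

        System.out.println(resolvedPath); // e.g. sales/2022-07-14
    }
}
```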
Custom File Name
Enable this option to create a custom file name in the specified S3 path.
File Name
Provide a custom file name.
Example: The input sales_@{java.time.LocalDate.now()} creates a file named sales_<current_date>.<output_format>.
Use a unique file name when multiple pipeline runs write to the same target location, as in the case of incremental reads; otherwise each iteration overwrites the data written by the previous one (see the sketch below).
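As a sketch of why a unique name matters, the template below adds a millisecond timestamp so every run resolves to a different file name. It assumes the File Name field accepts the same MVEL template syntax as the Path field; the exact expression is illustrative.

```java
import java.util.HashMap;
import org.mvel2.templates.TemplateRuntime;

public class FileNameTemplateSketch {
    public static void main(String[] args) {
        // Adding System.currentTimeMillis() keeps repeated (e.g. incremental) runs
        // from resolving to the same name and overwriting one another.
        String nameTemplate = "sales_@{java.time.LocalDate.now()}_@{System.currentTimeMillis()}";

        String resolvedName = String.valueOf(
                TemplateRuntime.eval(nameTemplate, new HashMap<String, Object>()));

        System.out.println(resolvedName); // e.g. sales_2022-07-14_1657795200000
    }
}
```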
File Format
The output format in which the result will be written.
Delimiter
Select the message field separator to be used with the CSV (Delimited) file format.
Header Included
Option to write the column names as the first row (header) of the data file.
Output Fields
Select the fields of the message that need to be emitted.
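The sketch below shows how the file format, delimiter, header and output fields settings would translate into a plain Spark batch write to S3. It assumes the target ultimately writes through Spark's DataFrameWriter; the bucket, path and column names are purely illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class S3CsvWriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-csv-write-sketch")
                .getOrCreate();

        // Illustrative source data; in a pipeline this would be the incoming message stream.
        Dataset<Row> sales = spark.read().json("s3a://example-bucket/raw/sales/");

        sales.select("order_id", "amount", "region")        // Output Fields
             .write()
             .mode(SaveMode.Append)
             .option("delimiter", "|")                      // Delimiter for CSV (Delimited)
             .option("header", "true")                      // Header Included
             .csv("s3a://example-bucket/sales/2022-07-14"); // File Format and target Path

        spark.stop();
    }
}
```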
Output Mode
Output mode specifies how the data is written to the target.
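For streaming pipelines, this setting plausibly corresponds to Spark Structured Streaming's output modes; note that Spark's file sinks, including CSV on S3, accept only append. The sketch below is illustrative, with made-up paths and a hypothetical schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class S3StreamingWriteSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("s3-streaming-write-sketch")
                .getOrCreate();

        // Illustrative streaming source with a hypothetical schema.
        Dataset<Row> events = spark.readStream()
                .format("json")
                .schema("order_id STRING, amount DOUBLE, region STRING")
                .load("s3a://example-bucket/incoming/");

        StreamingQuery query = events.writeStream()
                .outputMode("append")                                      // Output Mode
                .format("csv")
                .option("header", "true")
                .option("path", "s3a://example-bucket/sales/streaming/")   // target path
                .option("checkpointLocation", "s3a://example-bucket/checkpoints/sales/")
                .start();

        query.awaitTermination();
    }
}
```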
Add Configuration: Additional properties can be added using the Add Configuration link.
More Configurations
Save Mode
Save Mode specifies the expected behavior when saving data to the target S3 path in the bucket. The available options are described below, followed by an illustrative sketch.
Append: If data already exists at the target path, the new contents are appended to the existing data.
ErrorIfExists: If data already exists at the target path, an exception is thrown.
Ignore: If data already exists at the target path, the save operation writes nothing and leaves the existing data unchanged.
Overwrite: If data already exists at the target path, it is replaced by the new contents.
Update: If data already exists at the target path, it is updated with the new contents according to the Update Type option described below.
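The first four options mirror Spark's standard save modes. The sketch below shows them on a plain DataFrameWriter, assuming that is what the target uses underneath (Update has no direct Spark equivalent and is handled at the platform level); bucket and path names are illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SaveModeSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("save-mode-sketch")
                .getOrCreate();

        Dataset<Row> batch = spark.read().parquet("s3a://example-bucket/staging/sales/");
        String targetPath = "s3a://example-bucket/sales/2022-07-14";

        // SaveMode.Append        -> add the new contents to any existing data
        // SaveMode.Overwrite     -> replace existing data with the new contents
        // SaveMode.Ignore        -> skip the write entirely if data already exists
        // SaveMode.ErrorIfExists -> throw an exception if data already exists
        batch.write().mode(SaveMode.Append).parquet(targetPath);

        spark.stop();
    }
}
```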
If the Update option is selected in the Save Mode field, the following additional fields are displayed:
Update Type
Choose how the latest data is retained when existing and source data overlap: either overwrite the existing data with the latest data, or keep the latest data along with a version of the earlier data.
Join Columns
Select from the list the columns on which the existing and incoming data should be joined.
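One common way to realise such an update on plain S3 files is to union the existing data with the incoming batch and keep only the latest row per join key. The sketch below illustrates that pattern only; it is not Gathr's implementation, and the join column (order_id), the ingestion-time column (ingested_at) and all paths are hypothetical.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

public class UpdateByJoinColumnsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("update-by-join-columns-sketch")
                .getOrCreate();

        Dataset<Row> existing = spark.read().parquet("s3a://example-bucket/sales/current/");
        Dataset<Row> incoming = spark.read().parquet("s3a://example-bucket/staging/sales/");

        // Rank rows per join key so the most recently ingested record wins.
        WindowSpec latestFirst = Window.partitionBy("order_id")
                .orderBy(col("ingested_at").desc());

        Dataset<Row> merged = existing.unionByName(incoming)
                .withColumn("rn", row_number().over(latestFirst))
                .filter(col("rn").equalTo(1))
                .drop("rn");

        // Write the merged result to a fresh location rather than over the data being read.
        merged.write().mode(SaveMode.Overwrite)
              .parquet("s3a://example-bucket/sales/merged/");

        spark.stop();
    }
}
```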
If any option other than Update is selected in the Save Mode field, proceed with the following field.
Partitioning Required
Specifies whether the data should be partitioned on S3.
Partition Columns
If partitioning is enabled, select the fields on which data will be partitioned.
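A minimal sketch of how partition columns typically take effect, assuming the write goes through Spark's partitionBy; the column names and paths are illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionedWriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-write-sketch")
                .getOrCreate();

        Dataset<Row> sales = spark.read().parquet("s3a://example-bucket/staging/sales/");

        // Produces sub-folders such as .../region=EU/sale_date=2022-07-14/part-*.parquet
        sales.write()
             .mode(SaveMode.Append)
             .partitionBy("region", "sale_date")   // Partition Columns
             .parquet("s3a://example-bucket/sales/partitioned/");

        spark.stop();
    }
}
```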
Limitations on custom/dynamic folder or file naming
Custom/dynamic naming is not supported on streaming data, because the output file has to be renamed.
Partitioning of the emitted data is not supported when custom/dynamic naming is used.
Invoking custom functions in MVEL expressions is not supported.