Amazon S3 Ingestion Target
Amazon S3 stores data as objects within resources called buckets. The S3 target writes the processed objects to the specified Amazon S3 bucket.
Target Configuration
Configure the target parameters that are explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Amazon S3, or create one as explained in the topic - Amazon S3 Connection →
Use the Test Connection option to verify that the connection to the Amazon S3 channel can be established successfully.
A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.
Bucket Name
Buckets are storage units used to store objects, which consist of data and metadata that describes the data. Specify the name of the target S3 bucket.
Path
Provide the file/directory path inside the target bucket where the processed objects will be emitted.
Writing directly to the root folder of a bucket is not supported.
A directory path can be given as shown below:
Example: /sales/2022-07-14
Single-statement MVEL expressions can be used to create custom folders in the bucket, as illustrated in the sketch below.
Example: The expression sales/@{java.time.LocalDate.now()} creates a folder named with the <current_date> inside the sales directory.
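Below is a minimal sketch of how such a template could resolve at runtime, assuming the Path field is evaluated with MVEL's template syntax (the @{...} orbs); the class name and the empty variable map are illustrative, not part of Gathr's API.

```java
import java.util.HashMap;
import org.mvel2.templates.TemplateRuntime;

public class PathTemplateSketch {
    public static void main(String[] args) {
        // Illustrative template, mirroring the example above.
        String pathTemplate = "sales/@{java.time.LocalDate.now()}";

        // TemplateRuntime resolves the @{...} expression and splices the result
        // into the surrounding text.
        String resolvedPath = String.valueOf(
                TemplateRuntime.eval(pathTemplate, new HashMap<String, Object>()));

        System.out.println(resolvedPath); // e.g. sales/2022-07-14
    }
}
```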
Custom File Name
Enable this option to create a custom file name in the specified S3 path.
File Name
Provide a custom file name.
Example: The input sales_@{java.time.LocalDate.now()} creates a file named sales_<current_date>.<output_format>.
Use a unique file name when multiple pipeline runs write to the same target location, as in the case of incremental reads; otherwise each iteration overwrites the data written by the previous one (see the sketch below).
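As a sketch of why a unique name matters, the template below adds a millisecond timestamp so every run resolves to a different file name. It assumes the File Name field accepts the same MVEL template syntax as the Path field; the exact expression is illustrative.

```java
import java.util.HashMap;
import org.mvel2.templates.TemplateRuntime;

public class FileNameTemplateSketch {
    public static void main(String[] args) {
        // Adding System.currentTimeMillis() keeps repeated (e.g. incremental) runs
        // from resolving to the same name and overwriting one another.
        String nameTemplate = "sales_@{java.time.LocalDate.now()}_@{System.currentTimeMillis()}";

        String resolvedName = String.valueOf(
                TemplateRuntime.eval(nameTemplate, new HashMap<String, Object>()));

        System.out.println(resolvedName); // e.g. sales_2022-07-14_1657795200000
    }
}
```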
File Format
The output format in which the result will be written.
Delimiter
Select the message field separator to be used with the CSV (Delimited) file format.
Header Included
Option to write the column names as the first row (header) of the data file.
Output Fields
Select the fields of the message that need to be emitted.
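The sketch below shows how the file format, delimiter, header and output fields settings would translate into a plain Spark batch write to S3. It assumes the target ultimately writes through Spark's DataFrameWriter; the bucket, path and column names are purely illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class S3CsvWriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-csv-write-sketch")
                .getOrCreate();

        // Illustrative source data; in a pipeline this would be the incoming message stream.
        Dataset<Row> sales = spark.read().json("s3a://example-bucket/raw/sales/");

        sales.select("order_id", "amount", "region")        // Output Fields
             .write()
             .mode(SaveMode.Append)
             .option("delimiter", "|")                      // Delimiter for CSV (Delimited)
             .option("header", "true")                      // Header Included
             .csv("s3a://example-bucket/sales/2022-07-14"); // File Format and target Path

        spark.stop();
    }
}
```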
Output Mode
Output mode specifies how the data is written to the target.
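For streaming pipelines, this setting plausibly corresponds to Spark Structured Streaming's output modes; note that Spark's file sinks, including CSV on S3, accept only append. The sketch below is illustrative, with made-up paths and a hypothetical schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class S3StreamingWriteSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("s3-streaming-write-sketch")
                .getOrCreate();

        // Illustrative streaming source with a hypothetical schema.
        Dataset<Row> events = spark.readStream()
                .format("json")
                .schema("order_id STRING, amount DOUBLE, region STRING")
                .load("s3a://example-bucket/incoming/");

        StreamingQuery query = events.writeStream()
                .outputMode("append")                                      // Output Mode
                .format("csv")
                .option("header", "true")
                .option("path", "s3a://example-bucket/sales/streaming/")   // target path
                .option("checkpointLocation", "s3a://example-bucket/checkpoints/sales/")
                .start();

        query.awaitTermination();
    }
}
```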
Add Configuration: Additional properties can be added using the Add Configuration link.
More Configurations
Save Mode
Save Mode specifies the expected behavior when saving data to the target S3 path in the bucket. The available options are described below, followed by an illustrative sketch.
Append: If data already exists at the target path, the new contents are appended to the existing data.
ErrorIfExists: If data already exists at the target path, an exception is thrown.
Ignore: If data already exists at the target path, the save operation writes nothing and leaves the existing data unchanged.
Overwrite: If data already exists at the target path, it is replaced by the new contents.
Update: If data already exists at the target path, it is updated with the new contents according to the Update Type option described below.
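The first four options mirror Spark's standard save modes. The sketch below shows them on a plain DataFrameWriter, assuming that is what the target uses underneath (Update has no direct Spark equivalent and is handled at the platform level); bucket and path names are illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SaveModeSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("save-mode-sketch")
                .getOrCreate();

        Dataset<Row> batch = spark.read().parquet("s3a://example-bucket/staging/sales/");
        String targetPath = "s3a://example-bucket/sales/2022-07-14";

        // SaveMode.Append        -> add the new contents to any existing data
        // SaveMode.Overwrite     -> replace existing data with the new contents
        // SaveMode.Ignore        -> skip the write entirely if data already exists
        // SaveMode.ErrorIfExists -> throw an exception if data already exists
        batch.write().mode(SaveMode.Append).parquet(targetPath);

        spark.stop();
    }
}
```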
If the Update option is selected in the Save Mode field, the following additional fields are displayed:
Update Type
Choose how the latest data is retained when existing and source data overlap: either overwrite the existing data with the latest data, or keep the latest data along with a version of the earlier data.
Join Columns
Select from the list the columns on which the existing and incoming data should be joined.
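One common way to realise such an update on plain S3 files is to union the existing data with the incoming batch and keep only the latest row per join key. The sketch below illustrates that pattern only; it is not Gathr's implementation, and the join column (order_id), the ingestion-time column (ingested_at) and all paths are hypothetical.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

public class UpdateByJoinColumnsSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("update-by-join-columns-sketch")
                .getOrCreate();

        Dataset<Row> existing = spark.read().parquet("s3a://example-bucket/sales/current/");
        Dataset<Row> incoming = spark.read().parquet("s3a://example-bucket/staging/sales/");

        // Rank rows per join key so the most recently ingested record wins.
        WindowSpec latestFirst = Window.partitionBy("order_id")
                .orderBy(col("ingested_at").desc());

        Dataset<Row> merged = existing.unionByName(incoming)
                .withColumn("rn", row_number().over(latestFirst))
                .filter(col("rn").equalTo(1))
                .drop("rn");

        // Write the merged result to a fresh location rather than over the data being read.
        merged.write().mode(SaveMode.Overwrite)
              .parquet("s3a://example-bucket/sales/merged/");

        spark.stop();
    }
}
```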
If any option other than Update is selected in the Save Mode field, proceed with the following field.
Partitioning Required
Specifies whether the data should be partitioned on S3.
Partition Columns
If partitioning is enabled, select the fields on which data will be partitioned.
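A minimal sketch of how partition columns typically take effect, assuming the write goes through Spark's partitionBy; the column names and paths are illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionedWriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-write-sketch")
                .getOrCreate();

        Dataset<Row> sales = spark.read().parquet("s3a://example-bucket/staging/sales/");

        // Produces sub-folders such as .../region=EU/sale_date=2022-07-14/part-*.parquet
        sales.write()
             .mode(SaveMode.Append)
             .partitionBy("region", "sale_date")   // Partition Columns
             .parquet("s3a://example-bucket/sales/partitioned/");

        spark.stop();
    }
}
```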
Limitations on custom/dynamic folder or file naming
Custom/dynamic naming is not supported on streaming data, because the output file has to be renamed.
Partitioning of the emitted data is not supported when custom/dynamic naming is used.
Invoking custom functions in MVEL expressions is not supported.