S3 ETL Target
Amazon S3 stores data as objects within resources called buckets. The S3 emitter writes the pipeline's output as objects to an Amazon S3 bucket.
Target Configuration
Configure the target parameters that are explained below.
Connection Name
Connections are service identifiers. Select a connection name from the list if you have already created and saved connection details for S3, or create one as explained in the topic - Amazon S3 Connection →
Bucket Name
Buckets are cloud storage containers for objects stored in S3. Specify an S3 target bucket name to store the emitted data.
Path
Provide the file/directory path within the target bucket where the processed objects will be emitted.
A directory path can be given as shown below:
Example: /sales/2022-07-14
Single-statement MVEL expressions can be used to create custom folders in the bucket.
Example: The expression sales/@{java.time.LocalDate.now()} will create a folder named with the <current_date> inside the sales directory.
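The dynamic-path idea above can be sketched in plain Python (the MVEL expression itself runs inside the pipeline; this is only an illustrative equivalent, and the function name is hypothetical):

```python
from datetime import date

def daily_sales_path(base="sales"):
    # Mirrors the MVEL expression sales/@{java.time.LocalDate.now()}:
    # today's date in ISO format becomes a subfolder of the base directory,
    # e.g. "sales/2022-07-14".
    return f"{base}/{date.today().isoformat()}"
```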
Custom File Name
Enable this option to create a custom file name in the specified S3 path.
File Name
Provide a custom file name.
Example: sales_@{java.time.LocalDate.now()} will create a sales_<current_date>.<output_format> file.
Use a unique file name when the same target location is shared by multiple pipeline runs, as in the case of incremental reads; otherwise, each iteration will overwrite the previous data.
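One common way to keep file names unique across runs is to combine a timestamp with a short random suffix. A minimal sketch (the helper name is hypothetical, not part of the product):

```python
import uuid
from datetime import datetime, timezone

def unique_file_name(prefix="sales"):
    # A UTC timestamp plus a short random suffix ensures repeated pipeline
    # runs writing to the same S3 path never produce colliding file names.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}_{stamp}_{uuid.uuid4().hex[:8]}"
```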
File Format
Output format in which the result will be written.
Delimiter
Select a message field separator for the CSV (Delimited) file format.
Header Included
Option to write the first row of the data file as a header.
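The Delimiter and Header Included options together determine the delimited output's shape. A small sketch of the equivalent behavior in plain Python (the function is illustrative, not the emitter's implementation):

```python
import csv
import io

def write_delimited(rows, header, delimiter=",", header_included=True):
    # Emits rows in CSV (Delimited) format; `delimiter` and
    # `header_included` mirror the Delimiter / Header Included options.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter)
    if header_included:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```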
Output Fields
Select the fields in the message that need to be emitted.
Save Mode
Save Mode is used to specify how to handle any existing data in the target.
Append: When persisting data, if the data/table already exists, the contents of the data are expected to be appended to the existing data.
ErrorIfExists: When persisting data, if the data already exists, an exception is expected to be thrown.
Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the Data and to not change the existing data.
This is similar to a CREATE TABLE IF NOT EXISTS in SQL.
Overwrite: When persisting data, if data/table already exists, existing data is expected to be overwritten by the contents of the Data.
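The four save modes can be modeled with a small, self-contained sketch (this simulates the semantics on in-memory lists; it is not the emitter's actual persistence code):

```python
def save(existing, new, mode):
    # Illustrative model of the save modes: `existing` stands for data
    # already at the target, `new` for the data being emitted.
    if mode == "append":
        return list(existing) + list(new)       # add to existing data
    if mode == "overwrite":
        return list(new)                        # replace existing data
    if mode == "ignore":
        # Like CREATE TABLE IF NOT EXISTS: do nothing if data exists.
        return list(existing) if existing else list(new)
    if mode == "errorifexists":
        if existing:
            raise FileExistsError("target already contains data")
        return list(new)
    raise ValueError(f"unknown save mode: {mode}")
```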
Output Mode
Output mode specifies how the data is written.
Enable Trigger
Trigger defines how frequently a streaming query should be executed.
Processing Time
Processing Time is the trigger time interval in minutes or seconds.
This property will appear only when Enable Trigger checkbox is selected.
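The trigger options above resemble a processing-time trigger in Spark Structured Streaming. A configuration sketch, assuming a PySpark streaming DataFrame `df` and hypothetical bucket and checkpoint paths:

```python
# Sketch (assumes PySpark and an existing streaming DataFrame `df`).
# A processing-time trigger fires the streaming query at a fixed interval,
# mirroring the Processing Time option above.
query = (
    df.writeStream
      .format("csv")
      .option("path", "s3a://example-bucket/sales")              # hypothetical bucket
      .option("checkpointLocation", "s3a://example-bucket/_cp")  # hypothetical path
      .trigger(processingTime="30 seconds")
      .start()
)
```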
Partitioning Required
If this option is checked, the emitted data will be partitioned in the target.
Partition Columns
Select the fields on which the data will be partitioned.
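Partitioned output is conventionally laid out as key=value subdirectories (the Hive-style layout used by Spark-based writers). A sketch of how partition columns map to a target path (the helper is illustrative):

```python
def partition_path(base, record, partition_columns):
    # Builds a Hive-style key=value directory layout from the selected
    # partition columns, e.g. "sales/country=US/year=2022".
    parts = "/".join(f"{col}={record[col]}" for col in partition_columns)
    return f"{base}/{parts}"
```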
ADD CONFIGURATION: Enables configuring additional custom properties.
Limitations on custom/dynamic folder or file naming
Not supported for streaming data, as the output file will be renamed.
Partitioning of the emitted data is not supported.
Invoking custom functions in MVEL expressions is not supported.
Post Action
To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Post-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.
If you have any feedback on Gathr documentation, please email us!