Amazon S3 Ingestion Source
The Amazon S3 data source reads objects from Amazon S3 buckets.
Data Source Configuration
Configure the data source parameters that are explained below.
Fetch From Source/Upload Data File
To design the application, you can either fetch sample data from the Amazon S3 source by providing the data source connection details, or upload a sample data file in one of the supported formats. In both cases, the sample is used to show the schema details during application design.
Upload Data File
If Upload Data File is selected to provide the sample data, enter the details below.
File Format
Select the format (file type) of the sample file.
The file formats that Gathr supports for the Amazon S3 data source are CSV, JSON, TEXT, Parquet, ORC, and AVRO.
For the CSV file format, also select the corresponding delimiter.
Header Included
Enable this option to read the first row as a header if your Amazon S3 data is in CSV format.
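The effect of the delimiter and Header Included settings on a CSV sample can be sketched as follows. This is a minimal illustration using Python's standard csv module, not Gathr's own implementation; the sample content, the preview_schema helper, and the col_N fallback names are all hypothetical.

```python
import csv
import io

# Hypothetical pipe-delimited sample; a real sample comes from the uploaded file.
SAMPLE = "id|name|city\n1|Asha|Pune\n2|Luis|Lima\n"


def preview_schema(text, delimiter=",", header_included=True):
    """Return (column_names, data_rows) for a delimited text sample."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if header_included:
        # The first row supplies the column names; the rest is data.
        return rows[0], rows[1:]
    # Without a header every row is data; positional names are an assumption.
    names = [f"col_{i}" for i in range(len(rows[0]))]
    return names, rows


columns, data = preview_schema(SAMPLE, delimiter="|", header_included=True)
print(columns)
print(data)
```

With Header Included disabled, the same first row would be treated as data rather than as column names.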
Upload
Upload a sample file that matches the file format selected above.
Fetch From Source
If Fetch From Source is selected, continue configuring the data source.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Amazon S3. Otherwise, create one as explained in the topic Amazon S3 Connection →
Use the Test Connection option to verify that the connection to Amazon S3 can be established.
A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.
Bucket Name
Buckets are storage units used to store objects. Each object consists of data and the metadata that describes the data.
Path
File or directory path from which data is to be read. For a directory, the path must end with *.
Example: outdir/*
File Filter
Provide a file pattern to retrieve only the matching files. Example: *csv or *json
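As a rough illustration of how a file pattern narrows the set of files read, the sketch below applies glob-style matching to a list of object keys with Python's fnmatch module. The exact matching rules Gathr applies are not stated here, so treat the semantics as an assumption; the keys and the filter_keys helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical object keys found under the configured path.
KEYS = ["outdir/a.csv", "outdir/b.json", "outdir/c.parquet", "outdir/notes.txt"]


def filter_keys(keys, patterns):
    """Keep keys whose file name matches at least one glob pattern."""
    kept = []
    for key in keys:
        # Match against the file name only, not the full key.
        file_name = key.rsplit("/", 1)[-1]
        if any(fnmatchcase(file_name, pattern) for pattern in patterns):
            kept.append(key)
    return kept


print(filter_keys(KEYS, ["*csv"]))
print(filter_keys(KEYS, ["*csv", "*json"]))
```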
Recursive File Lookup
Select this option to retrieve files from the current folder and its sub-folders.
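The difference a recursive lookup makes can be sketched with plain Python over a list of object keys. This assumes that, with the option cleared, only files directly inside the configured folder are read; that reading, the sample keys, and the list_files helper are assumptions made for illustration.

```python
# Hypothetical object keys in a bucket.
KEYS = [
    "outdir/a.csv",
    "outdir/2024/b.csv",
    "outdir/2024/01/c.csv",
    "other/d.csv",
]


def list_files(keys, directory, recursive):
    """Select keys under a directory, optionally descending into sub-folders."""
    prefix = directory.rstrip("/") + "/"
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        # A "/" in the remainder means the object sits in a sub-folder.
        if recursive or "/" not in remainder:
            selected.append(key)
    return selected


print(list_files(KEYS, "outdir", recursive=False))
print(list_files(KEYS, "outdir", recursive=True))
```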
Add Configuration
Additional properties can be added using this option as key-value pairs.
Schema
Check the populated schema details. For more details, see Schema Preview →
Advanced Configuration
Optionally, you can enable incremental read. For more details, see Amazon S3 Incremental Configuration →.