S3 Streaming Data Source
The S3 data source reads objects from an Amazon S3 bucket. Amazon S3 stores data as objects within resources called buckets.

On an S3 channel, you can read data in the following formats: JSON, CSV, TEXT, XML, Fixed Length, Parquet, Binary, ORC, and Avro.

For an S3 data source, if data is fetched from the source and the data type is CSV, the schema has an added option, Header Included in Source. This indicates whether the data fetched from the source has a header row.

If Upload Data File is chosen instead, an added option, Is Header Included in Source, indicates whether the uploaded data contains a header row.
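To illustrate what the header option controls, here is a minimal Python sketch (not Gathr code; the sample content and column names are made up) showing how the same CSV text is interpreted with and without a header row:

```python
import csv
import io

# Hypothetical CSV content; the first line may or may not be a header.
raw = "id,name\n1,Alice\n2,Bob\n"

def read_csv(text, header_included):
    """Parse CSV text; treat the first row as column names only when
    header_included is True (mirroring the 'Header Included in Source' flag)."""
    rows = list(csv.reader(io.StringIO(text)))
    if header_included:
        columns, data = rows[0], rows[1:]
    else:
        # No header: generate positional column names and keep every row as data.
        columns = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    return columns, data

cols_with, data_with = read_csv(raw, header_included=True)
cols_without, data_without = read_csv(raw, header_included=False)
```

With the flag set, the first row names the columns and only two data rows remain; without it, all three rows are treated as data.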
Configuring S3 Data Source
To add the S3 data source into your pipeline, drag the source to the canvas and click on it to configure.
Under the Schema Type tab, select Fetch From Source or Upload Data File.
Field | Description |
---|---|
Connection Name | Connections are the service identifiers. Select the connection from which you would like to read the data from the available list of connections. |
S3 Protocol | The available protocols are S3, S3n, and S3a. |
End Point | S3 endpoint details should be provided if the source is Dell EMC S3. |
Bucket Name | Buckets are storage units used to store objects, which consist of data and metadata that describes the data. |
Override Credentials | Unchecked by default. Select the checkbox to override credentials for user-specific actions, then provide the AWS Key ID and Secret Access Key. |
Path | File or directory path from which data is to be read. The path provided must be a directory, not an absolute file path. |
File Filter | Provide a file pattern. The file filter is used to include only files whose names match the pattern, e.g., *.pdf or emp*.csv. |
Recursive File Lookup | Check this option to retrieve files from the current folder and its sub-folder(s). |
Add Configuration | To add additional custom S3 properties as key-value pairs. Further configurations can be added in the following ways: Use key avroSchema if you want to provide an Avro schema, and paste its content (as the value) in JSON format to map the schema. Use key avroSchemaFilePath and provide the absolute S3 path of the AVSC schema file as the value; to load the schema file from S3, the IAM role attached to the instance profile will be used. |
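The File Filter field accepts glob-style patterns. As a minimal Python sketch (the object keys below are made up for illustration), this is how such a pattern selects matching file names:

```python
import fnmatch

# Hypothetical object keys found under the configured path.
keys = ["emp_2023.csv", "emp_2024.csv", "report.pdf", "notes.txt"]

def filter_keys(keys, pattern):
    """Keep only the keys whose file name matches the glob pattern,
    the way a file filter such as *.pdf or emp*.csv would."""
    return [k for k in keys if fnmatch.fnmatch(k, pattern)]

csv_matches = filter_keys(keys, "emp*.csv")
pdf_matches = filter_keys(keys, "*.pdf")
```

Here `emp*.csv` selects only the employee CSV files, while `*.pdf` selects only the PDF.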
Click on the Add Notes tab. Enter the notes in the space provided.
Click Done to save the configuration.