S3 Streaming Data Source
The S3 data source reads objects from an Amazon S3 bucket. Amazon S3 stores data as objects within resources called buckets.

On an S3 channel, you can read data in the following formats: JSON, CSV, TEXT, XML, Fixed Length, Parquet, Binary, ORC, and Avro.

For an S3 data source, if data is fetched from the source and the data type is CSV, the schema has an added option, Header Included in Source. This indicates whether the data fetched from the source has a header row.

If Upload Data File is chosen instead, an added option, Is Header Included in Source, indicates whether the uploaded data contains a header row.
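To illustrate what the header option controls, here is a minimal Python sketch (not Gathr code; the sample content and column names are made up) showing how the same CSV text is interpreted with and without a header row:

```python
import csv
import io

# Hypothetical CSV content; the first line may or may not be a header.
raw = "id,name\n1,Alice\n2,Bob\n"

def read_csv(text, header_included):
    """Parse CSV text; treat the first row as column names only when
    header_included is True (mirroring the 'Header Included in Source' flag)."""
    rows = list(csv.reader(io.StringIO(text)))
    if header_included:
        columns, data = rows[0], rows[1:]
    else:
        # No header: generate positional column names and keep every row as data.
        columns = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    return columns, data

cols_with, data_with = read_csv(raw, header_included=True)
cols_without, data_without = read_csv(raw, header_included=False)
```

With the flag set, the first row names the columns and only two data rows remain; without it, all three rows are treated as data.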
Configuring S3 Data Source
To add the S3 data source into your pipeline, drag the source to the canvas and click on it to configure.
Under the Schema Type tab, select Fetch From Source or Upload Data File.
Field | Description |
---|---|
Connection Name | Connections are the service identifiers. Select the connection from which you would like to read the data from the available list of connections. |
S3 Protocol | The available protocols are S3, S3n, and S3a. |
End Point | S3 endpoint details should be provided if the source is Dell EMC S3. |
Bucket Name | Buckets are storage units used to store objects, which consist of data and metadata that describes the data. |
Override Credentials | Unchecked by default. Select the checkbox to override credentials for user-specific actions, then provide the AWS Key ID and Secret Access Key. |
Path | File or directory path from which data is to be read. The path provided must be a directory, not an absolute file path. |
File Filter | Provide a file pattern. The file filter is used to include only files whose names match the pattern, e.g., *.pdf or emp*.csv. |
Recursive File Lookup | Check this option to retrieve files from the current folder and its sub-folder(s). |
Add Configuration | To add additional custom S3 properties as key-value pairs. Further configurations can be added in the following ways: Use key avroSchema if you want to provide an Avro schema, and paste its content (as the value) in JSON format to map the schema. Use key avroSchemaFilePath and provide the absolute S3 path of the AVSC schema file as the value; to load the schema file from S3, the IAM role attached to the instance profile will be used. |
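The File Filter field accepts glob-style patterns. As a minimal Python sketch (the object keys below are made up for illustration), this is how such a pattern selects matching file names:

```python
import fnmatch

# Hypothetical object keys found under the configured path.
keys = ["emp_2023.csv", "emp_2024.csv", "report.pdf", "notes.txt"]

def filter_keys(keys, pattern):
    """Keep only the keys whose file name matches the glob pattern,
    the way a file filter such as *.pdf or emp*.csv would."""
    return [k for k in keys if fnmatch.fnmatch(k, pattern)]

csv_matches = filter_keys(keys, "emp*.csv")
pdf_matches = filter_keys(keys, "*.pdf")
```

Here `emp*.csv` selects only the employee CSV files, while `*.pdf` selects only the PDF.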
Click on the Add Notes tab. Enter the notes in the space provided.
Click Done to save the configuration.