S3 Emitter
Amazon S3 stores data as objects within resources called buckets. The S3 emitter writes the pipeline's output as objects to an Amazon S3 bucket.
S3 Emitter Configuration
To add an S3 emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:
Field | Description |
---|---|
Connection Name | All S3 connections will be listed here. Select a connection for connecting to S3. |
S3 protocol | S3 protocol to be used while writing to S3. |
End Point | Provide the S3 endpoint details when connecting to Dell EMC S3. |
Bucket Name | Name of the bucket in which objects are stored. Buckets are storage units that hold objects, each consisting of data and metadata that describes the data. |
Override Credentials | Unchecked by default. Select the checkbox to override the connection credentials for user-specific actions. |
AWS Key Id | Provide the access key ID of the S3 account. |
Secret Access Key | Provide the secret access key of the S3 account. Once the AWS Key Id and Secret Access Key are provided, you have the option to test the connection. |
Path | File or directory path where the data is to be stored. |
Output Type | Output format in which the result will be written. |
Delimiter | Field separator for the output message. |
Output Fields | Fields of the output message. |
Partitioning Required | Whether or not to partition the data on S3. |
Save Mode | Save Mode specifies the expected behavior of saving data to the data sink (see the batch write sketch after this table). ErrorIfExists: when persisting data, if the data already exists, an exception is expected to be thrown. Append: when persisting data, if the data/table already exists, the contents of the data are expected to be appended to the existing data. Overwrite: when persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the data. Ignore: when persisting data, if the data/table already exists, the save operation is expected to neither save the contents of the data nor change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL. |
Output Mode | Output mode to be used while writing data to the streaming emitter (see the streaming write sketch after this table). Select one of the three options: Append: only the new rows in the streaming data are written to the sink. Complete: all the rows in the streaming data are written to the sink every time there are updates. Update: only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. Note: It is recommended to use the s3a protocol along with the path. For an AWS Databricks cluster, while creating a new cluster (within the Cluster List View), select the S3 role under IAM role. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Override Credential | Check the option to override credentials for user-specific actions, and provide the username. |
Checkpoint Directory | The path where the Spark application stores checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each pipeline run the checkpoint location provided above is appended with the current time in milliseconds. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Trigger Type | Select one of the options available from the drop-down: - One-Time Micro-Batch - Fixed Interval Micro-Batches. |
Priority | Priority defines the execution order of emitters. |
ADD CONFIGURATION | Enables you to configure additional custom properties. Note: add Spark configurations as per requirement. Example: perform imputation by clicking the ADD CONFIGURATION button. Note: for imputation, nullValue/emptyValue is replaced with the entered value across the data. (Optional) Example: with nullValue = 123, the output will replace all null values with 123 (see the imputation sketch after this table). |
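The Save Mode, Partitioning Required, Delimiter, and Path fields correspond closely to options on Spark's DataFrameWriter. The following PySpark sketch shows how an equivalent batch write to S3 might look; the bucket name, path, and column names are hypothetical placeholders, and the sketch illustrates the underlying Spark behavior rather than Gathr's internal implementation.

```python
# Minimal batch-write sketch, assuming CSV output over the s3a protocol.
# Bucket, path, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-emitter-batch-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, "alice", "US"), (2, "bob", "IN")],
    ["id", "name", "country"],
)

(df.write
   .mode("append")                       # Save Mode: append / overwrite / ignore / errorifexists
   .partitionBy("country")               # Partitioning Required: partition the output by a column
   .option("delimiter", ",")             # Delimiter: field separator for delimited output
   .csv("s3a://my-bucket/output/path"))  # Path and Bucket Name, written via the s3a protocol
```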
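For streaming pipelines, the Output Mode, Checkpoint Directory, and Trigger Type fields map to Spark Structured Streaming's writeStream options. Below is a hedged sketch, assuming Parquet output to S3 with a fixed-interval trigger; the rate source and all paths are hypothetical placeholders, and the `spark` session is reused from the batch sketch above.

```python
# Minimal structured-streaming sketch: output mode, checkpoint location, and trigger.
stream_df = (spark.readStream
                  .format("rate")   # toy source that generates rows at a fixed rate
                  .load())

query = (stream_df.writeStream
                  .outputMode("append")                              # Output Mode
                  .format("parquet")                                 # Output Type
                  .option("path", "s3a://my-bucket/stream-output/")  # Path
                  .option("checkpointLocation",
                          "s3a://my-bucket/checkpointingDir")        # Checkpoint Directory (absolute S3 path)
                  .trigger(processingTime="30 seconds")              # Fixed Interval Micro-Batches
                  .start())

query.awaitTermination()
```

For a One-Time Micro-Batch trigger, `.trigger(availableNow=True)` (or `once=True` on older Spark versions) would be used instead of a processing-time interval.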
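The nullValue/emptyValue imputation described under ADD CONFIGURATION can be pictured in plain Spark terms. Whether Gathr applies it through the CSV writer options or through a fill on the DataFrame is an assumption; both variants below emit 123 wherever a value is missing, and the path and column names are hypothetical placeholders (the `spark` session is reused from the batch sketch above).

```python
# Sketch of nullValue/emptyValue imputation when writing CSV output.
df_with_nulls = spark.createDataFrame(
    [(1, "alice"), (2, None)],
    ["id", "name"],
)

(df_with_nulls.write
   .mode("overwrite")
   .option("nullValue", "123")    # write "123" in place of null fields
   .option("emptyValue", "123")   # write "123" in place of empty-string fields
   .csv("s3a://my-bucket/imputed/"))

# Equivalent effect applied to the DataFrame itself before writing:
df_filled = df_with_nulls.fillna("123", subset=["name"])
```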