Redshift Emitter
The Redshift emitter works for both Streaming and Batch Datasets. It allows data to be pushed into Redshift tables.
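Conceptually, a batch emit is similar to a Spark JDBC write into Redshift. The snippet below is a minimal sketch of that idea, not Gathr's actual implementation; the endpoint, credentials, schema, and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-emitter-sketch").getOrCreate()

# Hypothetical input dataset; in a pipeline this comes from the connected
# Data Source or processor.
df = spark.read.json("/tmp/input")

(df.write
    .format("jdbc")
    .option("url", "jdbc:redshift://cluster.example.com:5439/dev")  # placeholder endpoint
    .option("dbtable", "public.my_table")  # Schema Name + Table Name from the configuration
    .option("user", "redshift_user")       # in practice supplied by the selected connection
    .option("password", "***")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())
```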
Redshift Emitter Configuration
To add a Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:
Field | Description |
---|---|
Connection Name | All Redshift connections will be listed here. Select a connection for connecting to Redshift. |
Message Name | Message used in the pipeline. |
Schema Name | Redshift schema into which the data is to be written. |
Table Name | Existing database table name whose schema is to be fetched. |
Connection Retries | Number of retries for component connection. |
Delay Between Connection Retries | Retry delay interval for the component connection, in milliseconds. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | Path where the Spark application stores its checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is suffixed with the current time in milliseconds. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select one of the three options: Append, in which only the new rows in the streaming data are written to the sink; Complete, in which all the rows in the streaming data are written to the sink every time there are updates; and Update, in which only the rows that were updated in the streaming data are written to the sink every time there are updates. (See the streaming sketch after this table.) |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Processing Time is the trigger time interval in minutes or seconds. This property appears only when the Enable Trigger checkbox is selected. |
ADD CONFIGURATION | Enables you to configure additional custom properties. |
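To see how the streaming fields above fit together, here is a hedged Structured Streaming sketch. Spark has no native Redshift streaming sink, so this assumes a foreachBatch JDBC write; the source, connection details, and paths are illustrative, not Gathr's actual implementation.

```python
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-streaming-sketch").getOrCreate()

# Stand-in streaming source; a real pipeline reads from the upstream channel.
stream_df = spark.readStream.format("rate").load()

def write_to_redshift(batch_df, batch_id):
    # Append each micro-batch to the target table over JDBC (placeholder details).
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:redshift://cluster.example.com:5439/dev")
        .option("dbtable", "public.my_table")
        .option("user", "redshift_user")
        .option("password", "***")
        .option("driver", "com.amazon.redshift.jdbc42.Driver")
        .mode("append")
        .save())

# Time-Based Check Point: suffix the configured location with the current
# time in milliseconds, so each run gets a fresh checkpoint directory.
checkpoint_dir = "/user/hadoop/checkpointingDir/" + str(int(time.time() * 1000))

query = (stream_df.writeStream
    .foreachBatch(write_to_redshift)
    .outputMode("append")                          # Output Mode: Append
    .option("checkpointLocation", checkpoint_dir)  # Checkpoint Directory
    .trigger(processingTime="30 seconds")          # Enable Trigger + Processing Time
    .start())

query.awaitTermination()
```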