Redshift Emitter

The Redshift emitter works with both streaming and batch datasets. It allows data to be pushed into Redshift tables.

Redshift Emitter Configuration

To add a Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:

 

Field Descriptions

Connection Name: All available Redshift connections are listed here. Select the connection to use for writing to Redshift.

Message Name: The message used in the pipeline.

Schema Name: The Redshift schema to write data into.

Table Name: The existing database table whose schema is to be fetched.

Connection Retries: The number of retry attempts for the component connection.

Delay Between Connection Retries: The delay, in milliseconds, between connection retry attempts.
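The retry behavior described by the two properties above can be sketched as a simple retry loop with a fixed delay. This is an illustrative sketch, not the product's implementation; the `connect_fn` callable and parameter names are assumptions.

```python
import time

def connect_with_retries(connect_fn, retries, delay_millis):
    """Attempt a connection, retrying up to `retries` times with a fixed
    delay (in milliseconds) between attempts. `connect_fn` is a hypothetical
    callable that raises on failure and returns a connection on success."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return connect_fn()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                # Delay Between Connection Retries is given in milliseconds
                time.sleep(delay_millis / 1000.0)
    raise last_error
```

For example, with retries=3 the connection is attempted up to four times in total before the last error is raised.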
Checkpoint Storage Location: The storage location for checkpointing data. Available options are HDFS, S3, and EFS.

Checkpoint Connections: The connection to use. Connections are listed according to the selected storage location.

Checkpoint Directory: The path where the Spark application stores its checkpointing data.

For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system adds a suitable prefix by itself.

For S3, enter an absolute path such as S3://BucketName/checkpointingDir.

Time-Based Checkpoint: Select this checkbox to enable a time-based checkpoint on each pipeline run, i.e. on each run the checkpoint location provided above is appended with the current time in milliseconds.
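The time-based checkpoint behavior can be sketched as a small path transformation. The function name is an illustrative assumption; only the appending of the current time in milliseconds reflects the documented behavior.

```python
import time

def time_based_checkpoint_dir(base_dir, now_millis=None):
    """Append the current time in milliseconds to the configured checkpoint
    location, as happens on each run when Time-Based Checkpoint is enabled."""
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    return f"{base_dir.rstrip('/')}/{now_millis}"
```

Each pipeline run thus gets a fresh checkpoint subdirectory, so state from an earlier run is not reused.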
Output Mode: The output mode used while writing data to the streaming emitter.

Select one of the three options:

Append: Only the new rows added to the streaming data since the last trigger are written to the sink.

Complete: All rows in the streaming data are written to the sink every time there are updates.

Update: Only the rows that were updated since the last trigger are written to the sink.
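The difference between the three modes can be illustrated with a small plain-Python simulation of a running word count over micro-batches. This only mimics Spark Structured Streaming semantics, it is not the Spark API; note that in real Spark, Append mode combined with aggregations additionally requires a watermark.

```python
def run_batches(batches, mode):
    """Simulate what each output mode emits to the sink per trigger."""
    state = {}    # the running result table (word -> count)
    emitted = []  # what the sink receives at each trigger
    for batch in batches:
        changed = set()
        for word in batch:
            state[word] = state.get(word, 0) + 1
            changed.add(word)
        if mode == "complete":
            emitted.append(dict(state))  # entire result table every trigger
        elif mode == "update":
            emitted.append({w: state[w] for w in changed})  # only changed rows
        elif mode == "append":
            # only rows first seen this trigger (rows assumed final hereafter)
            emitted.append({w: state[w] for w in changed if state[w] == 1})
    return emitted
```

For batches [["a", "b"], ["a", "c"]], complete mode re-emits the whole table ({"a": 2, "b": 1, "c": 1} on the second trigger), update mode emits only {"a": 2, "c": 1}, and append mode emits only the newly seen row {"c": 1}.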

Enable Trigger: A trigger defines how frequently the streaming query should be executed.

Processing Time: The trigger interval, in minutes or seconds. This property appears only when the Enable Trigger checkbox is selected.

ADD CONFIGURATION: Allows additional custom properties to be configured.