Redshift Emitter
The Redshift emitter works for both Streaming and Batch Datasets. It allows data to be pushed into Redshift tables.
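Conceptually, a batch emit is similar to a Spark JDBC write into Redshift. The snippet below is a minimal sketch of that idea, not Gathr's actual implementation; the endpoint, credentials, schema, and table names are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-emitter-sketch").getOrCreate()

# Hypothetical input dataset; in a pipeline this comes from the connected
# Data Source or processor.
df = spark.read.json("/tmp/input")

(df.write
    .format("jdbc")
    .option("url", "jdbc:redshift://cluster.example.com:5439/dev")  # placeholder endpoint
    .option("dbtable", "public.my_table")  # Schema Name + Table Name from the configuration
    .option("user", "redshift_user")       # in practice supplied by the selected connection
    .option("password", "***")
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save())
```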
Redshift Emitter Configuration
To add a Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:
Field | Description |
---|---|
Connection Name | All Redshift connections will be listed here. Select a connection for connecting to Redshift. |
Message Name | Message used in the pipeline. |
Schema Name | Redshift schema into which the data is to be written. |
Table Name | Existing database table name whose schema is to be fetched. |
Connection Retries | Number of retries for component connection. |
Delay Between Connection Retries | Retry delay interval for the component connection, in milliseconds. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | Path where the Spark application stores its checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is suffixed with the current time in milliseconds. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select one of the three options: Append, in which only the new rows in the streaming data are written to the sink; Complete, in which all the rows in the streaming data are written to the sink every time there are updates; and Update, in which only the rows that were updated in the streaming data are written to the sink every time there are updates. (See the streaming sketch after this table.) |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Processing Time is the trigger time interval in minutes or seconds. This property appears only when the Enable Trigger checkbox is selected. |
ADD CONFIGURATION | Enables you to configure additional custom properties. |
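To see how the streaming fields above fit together, here is a hedged Structured Streaming sketch. Spark has no native Redshift streaming sink, so this assumes a foreachBatch JDBC write; the source, connection details, and paths are illustrative, not Gathr's actual implementation.

```python
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-streaming-sketch").getOrCreate()

# Stand-in streaming source; a real pipeline reads from the upstream channel.
stream_df = spark.readStream.format("rate").load()

def write_to_redshift(batch_df, batch_id):
    # Append each micro-batch to the target table over JDBC (placeholder details).
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:redshift://cluster.example.com:5439/dev")
        .option("dbtable", "public.my_table")
        .option("user", "redshift_user")
        .option("password", "***")
        .option("driver", "com.amazon.redshift.jdbc42.Driver")
        .mode("append")
        .save())

# Time-Based Check Point: suffix the configured location with the current
# time in milliseconds, so each run gets a fresh checkpoint directory.
checkpoint_dir = "/user/hadoop/checkpointingDir/" + str(int(time.time() * 1000))

query = (stream_df.writeStream
    .foreachBatch(write_to_redshift)
    .outputMode("append")                          # Output Mode: Append
    .option("checkpointLocation", checkpoint_dir)  # Checkpoint Directory
    .trigger(processingTime="30 seconds")          # Enable Trigger + Processing Time
    .start())

query.awaitTermination()
```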