Advanced Redshift Emitter
The Advanced Redshift emitter uses an S3 temp directory to stage data before loading it into a Redshift database table.
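Gathr handles this staging internally. Purely as a rough illustration, the PySpark sketch below shows how a Spark job can write a DataFrame to Redshift through an S3 temp directory using the open-source spark-redshift connector; the cluster URL, bucket, IAM role, and table names are placeholders and do not reflect Gathr's internal implementation.

```python
# Assumed sketch (not Gathr's internal code): writing a DataFrame to Redshift
# by staging it in an S3 temp directory via the open-source spark-redshift
# connector. All connection values below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-emitter-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    ["id", "name"],
)

(df.write
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://example-cluster:5439/dev?user=admin&password=secret")
    .option("dbtable", "public.sample_table")                    # Schema Name + Table Name
    .option("tempdir", "s3a://example-bucket/redshift-temp/")    # S3 Temp Directory
    .option("tempformat", "CSV")                                 # S3 Temp Format
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-copy-role")
    .mode("append")
    .save())
```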
Advanced Redshift Emitter Configuration
To add an Advanced Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:
Field | Description |
---|---|
Connection Name | All Redshift connections will be listed here. Select a connection for connecting to Redshift. |
Override Credentials | Unchecked by default. Check this checkbox to override credentials for user-specific actions, and provide the database username and password. |
S3 Connection Name | All S3 connections will be listed here. Select a connection to be used for the S3 temp directory. |
S3 Temp Directory | A writable location in Amazon S3, used to stage data before it is loaded into Redshift when writing. Example: |
S3 Temp Format | Output format in which data will be written to the S3 temp directory. |
Message Name | Message used in the pipeline. |
Schema Name | Schema used in the pipeline. |
Table Name | Existing database table name whose schema is to be fetched. |
Connection Retries | Number of retries for component connection. The possible values are -1, 0, or a positive number, where -1 signifies infinite retries (see the retry sketch after this table). |
Delay Between Connection Retries | Defines the retry delay interval (in milliseconds) for component connection retries. |
Write Retry Count | Defines the number of retries when a component write fails. |
Delay Between Write Retries | Defines the retry delay interval (in milliseconds) between write retries. |
Query Timeout | Defines the query timeout interval (in minutes) for the component to save each batch of data to Redshift. The query is aborted if it does not finish within the given timeout interval. The default value of -1 means an infinite query timeout. |
Create Relational Tables | Check this option if you want to create relational tables with nested data. This option does not work when Save mode is Update. |
Checkpoint Storage Location | Select the checkpoint storage location. DBFS checkpoint storage location is not supported when the pipeline is configured on EMR. |
Checkpoint Connections | Connections are listed corresponding to the selected storage location. Select a checkpoint connection. |
Override Credential | Check this option to override credentials for user-specific actions, and provide the username. |
Check Point Directory | The HDFS path where the Spark application stores the checkpoint data. |
Time-based Check Point | Select to use a time-based checkpoint on each pipeline run. On each run, the checkpoint location provided above is appended with the current time in milliseconds. |
Output Mode | Output mode to be used while writing the data to the streaming emitter (see the streaming write sketch after this table). Select the output mode from the options below. Append Mode: only the new rows in the streaming data are written to the sink. Complete Mode: all the rows in the streaming data are written to the sink every time there are updates. Update Mode: only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Trigger Type | Select one of the options available from the drop-down: - One-Time Micro-Batch - Fixed Interval Micro-Batches. |
Priority | Priority defines the execution order of emitters. |
ADD CONFIGURATION | Enables you to configure additional custom properties. |
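The connection and write retry fields above follow a retry-with-delay pattern. The helper below is a hypothetical Python illustration of that semantics; the `with_retries` function and its parameters are invented for this sketch and are not part of Gathr.

```python
# Hypothetical illustration of the retry semantics described by Connection Retries,
# Delay Between Connection Retries, Write Retry Count, and Delay Between Write Retries.
import time

def with_retries(action, retries, delay_ms):
    """Run `action`, retrying on failure; retries = -1 means retry indefinitely."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            attempt += 1
            if retries != -1 and attempt > retries:
                raise
            time.sleep(delay_ms / 1000.0)

# Example: retry a placeholder write up to 3 times, waiting 2000 ms between attempts.
with_retries(lambda: print("writing batch to Redshift..."), retries=3, delay_ms=2000)
```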
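The checkpoint, output mode, and trigger fields correspond to standard Spark Structured Streaming write options. The sketch below is an assumed PySpark illustration of how such a streaming write might look; the rate source, foreachBatch writer, and paths are placeholders and do not represent Gathr's actual emitter code.

```python
# Assumed PySpark sketch of the streaming options above; not Gathr's own code.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-emitter-sketch").getOrCreate()

# Toy streaming source standing in for the pipeline's data source.
stream_df = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 5)
    .load())

# Time-based Check Point: append the current time in millis to the base path.
# A real deployment would typically use an HDFS path instead of /tmp.
checkpoint_dir = "/tmp/redshift_emitter_checkpoint/" + str(int(time.time() * 1000))

def write_batch(batch_df, batch_id):
    # Each micro-batch could be written to Redshift here (for example with the
    # spark-redshift connector shown earlier); printing keeps the sketch runnable.
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (stream_df.writeStream
    .outputMode("append")                   # Output Mode: Append / Complete / Update
    .trigger(processingTime="30 seconds")   # Fixed Interval Micro-Batches; use once=True for One-Time Micro-Batch
    .option("checkpointLocation", checkpoint_dir)
    .foreachBatch(write_batch)
    .start())

query.awaitTermination(120)  # stop the sketch after two minutes
```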
Click on the Next button. Enter the notes in the space provided.
Click on the DONE button to save the configuration.