Advanced Redshift Emitter
The Advanced Redshift emitter uses an S3 temp directory to stage data before loading it into a Redshift database table.
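Gathr handles this staging internally. Purely as a rough illustration, the PySpark sketch below shows how a Spark job can write a DataFrame to Redshift through an S3 temp directory using the open-source spark-redshift connector; the cluster URL, bucket, IAM role, and table names are placeholders and do not reflect Gathr's internal implementation.

```python
# Assumed sketch (not Gathr's internal code): writing a DataFrame to Redshift
# by staging it in an S3 temp directory via the open-source spark-redshift
# connector. All connection values below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-emitter-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    ["id", "name"],
)

(df.write
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", "jdbc:redshift://example-cluster:5439/dev?user=admin&password=secret")
    .option("dbtable", "public.sample_table")                    # Schema Name + Table Name
    .option("tempdir", "s3a://example-bucket/redshift-temp/")    # S3 Temp Directory
    .option("tempformat", "CSV")                                 # S3 Temp Format
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-copy-role")
    .mode("append")
    .save())
```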
Advanced Redshift Emitter Configuration
To add an Advanced Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:
Field | Description |
---|---|
Connection Name | All Redshift connections will be listed here. Select a connection for connecting to Redshift. |
Override Credentials | Unchecked by default. Check this checkbox to override credentials for user-specific actions, and provide the database username and password. |
S3 Connection Name | All S3 connections will be listed here. Select a connection to be used for the S3 temp directory. |
S3 Temp Directory | A writable location in Amazon S3, used to stage data before it is loaded into Redshift when writing. Example: |
S3 Temp Format | Output format in which data will be written to the S3 temp directory. |
Message Name | Message used in the pipeline. |
Schema Name | Schema used in the pipeline. |
Table Name | Existing database table name whose schema is to be fetched. |
Connection Retries | Number of retries for component connection. The possible values are -1, 0, or a positive number, where -1 signifies infinite retries (see the retry sketch after this table). |
Delay Between Connection Retries | Defines the retry delay interval (in milliseconds) for component connection retries. |
Write Retry Count | Defines the number of retries when a component write fails. |
Delay Between Write Retries | Defines the retry delay interval (in milliseconds) between write retries. |
Query Timeout | Defines the query timeout interval (in minutes) for the component to save each batch of data to Redshift. The query is aborted if it does not finish within the given timeout interval. The default value of -1 means an infinite query timeout. |
Create Relational Tables | Check this option if you want to create relational tables with nested data. This option does not work when Save mode is Update. |
Checkpoint Storage Location | Select the checkpoint storage location. DBFS checkpoint storage location is not supported when the pipeline is configured on EMR. |
Checkpoint Connections | Connections are listed corresponding to the selected storage location. Select a checkpoint connection. |
Override Credential | Check this option to override credentials for user-specific actions, and provide the username. |
Check Point Directory | The HDFS path where the Spark application stores the checkpoint data. |
Time-based Check Point | Select to use a time-based checkpoint on each pipeline run. On each run, the checkpoint location provided above is appended with the current time in milliseconds. |
Output Mode | Output mode to be used while writing the data to the streaming emitter (see the streaming write sketch after this table). Select the output mode from the options below. Append Mode: only the new rows in the streaming data are written to the sink. Complete Mode: all the rows in the streaming data are written to the sink every time there are updates. Update Mode: only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Trigger Type | Select one of the options available from the drop-down: - One-Time Micro-Batch - Fixed Interval Micro-Batches. |
Priority | Priority defines the execution order of emitters. |
ADD CONFIGURATION | Enables you to configure additional custom properties. |
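The connection and write retry fields above follow a retry-with-delay pattern. The helper below is a hypothetical Python illustration of that semantics; the `with_retries` function and its parameters are invented for this sketch and are not part of Gathr.

```python
# Hypothetical illustration of the retry semantics described by Connection Retries,
# Delay Between Connection Retries, Write Retry Count, and Delay Between Write Retries.
import time

def with_retries(action, retries, delay_ms):
    """Run `action`, retrying on failure; retries = -1 means retry indefinitely."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            attempt += 1
            if retries != -1 and attempt > retries:
                raise
            time.sleep(delay_ms / 1000.0)

# Example: retry a placeholder write up to 3 times, waiting 2000 ms between attempts.
with_retries(lambda: print("writing batch to Redshift..."), retries=3, delay_ms=2000)
```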
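The checkpoint, output mode, and trigger fields correspond to standard Spark Structured Streaming write options. The sketch below is an assumed PySpark illustration of how such a streaming write might look; the rate source, foreachBatch writer, and paths are placeholders and do not represent Gathr's actual emitter code.

```python
# Assumed PySpark sketch of the streaming options above; not Gathr's own code.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-emitter-sketch").getOrCreate()

# Toy streaming source standing in for the pipeline's data source.
stream_df = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 5)
    .load())

# Time-based Check Point: append the current time in millis to the base path.
# A real deployment would typically use an HDFS path instead of /tmp.
checkpoint_dir = "/tmp/redshift_emitter_checkpoint/" + str(int(time.time() * 1000))

def write_batch(batch_df, batch_id):
    # Each micro-batch could be written to Redshift here (for example with the
    # spark-redshift connector shown earlier); printing keeps the sketch runnable.
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (stream_df.writeStream
    .outputMode("append")                   # Output Mode: Append / Complete / Update
    .trigger(processingTime="30 seconds")   # Fixed Interval Micro-Batches; use once=True for One-Time Micro-Batch
    .option("checkpointLocation", checkpoint_dir)
    .foreachBatch(write_batch)
    .start())

query.awaitTermination(120)  # stop the sketch after two minutes
```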
Click on the Next button. Enter the notes in the space provided.
Click on the DONE button to save the configuration.