Advanced Redshift Emitter

The Advanced Redshift emitter uses an S3 temp directory to stage data, which is then loaded into a Redshift database table.
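
For context, this staged-write pattern resembles how the open-source spark-redshift connector writes Spark DataFrames to Redshift. The sketch below is a minimal illustration of that pattern, not the emitter's internal code; the connection URL, table name, and bucket are placeholder assumptions.

```python
# Minimal sketch of the stage-to-S3-then-load pattern, assuming the open-source
# spark-redshift connector; URL, table, and bucket names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-emitter-sketch").getOrCreate()
df = spark.read.json("input/events.json")  # placeholder source data

(df.write
   .format("io.github.spark_redshift_community.spark.redshift")
   .option("url", "jdbc:redshift://host:5439/dev?user=<user>&password=<pass>")
   .option("dbtable", "public.nyc_events")           # placeholder target table
   .option("tempdir", "s3n://myBucket/NYC")          # the S3 Temp Directory field
   .option("tempformat", "CSV GZIP")                 # corresponds to S3 Temp Format
   .option("forward_spark_s3_credentials", "true")   # one of the connector's S3 auth modes
   .mode("append")
   .save())
```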

Advanced Redshift Emitter Configuration

To add an Advanced Redshift emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or processor. Right-click on the emitter to configure it as explained below:

Connection Name: All Redshift connections are listed here. Select a connection for connecting to Redshift.

Override Credentials: Unchecked by default. Check this option to override credentials for user-specific actions, and provide the database username and password.

S3 Connection Name: All S3 connections are listed here. Select a connection for the S3 temp directory.

S3 Temp Directory: A writable location in Amazon S3 where data is staged before it is loaded into Redshift.

Example: s3n://myBucket/NYC, where myBucket is the S3 bucket name.

S3 Temp Format: Output format in which data is written to the S3 temp directory.

Message Name: Message used in the pipeline.

Schema Name: Schema used in the pipeline.

Table Name: Name of the existing database table whose schema is to be fetched.

Connection Retries: Number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 signifies infinite retries.

Delay Between Connection Retries: Delay between connection retry attempts, in milliseconds.

Write Retry Count: Number of retries on a component write failure.

Delay Between Write Retries: Delay between write retry attempts, in milliseconds.
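
The retry fields above share one convention: a count of -1 retries forever, 0 disables retries, and the delay is applied between attempts. Below is a minimal Python sketch of that convention, illustrative only and not the emitter's actual implementation.

```python
import time

def with_retries(action, retries=-1, delay_ms=1000):
    """Illustrative retry loop: retries=-1 retries forever, 0 means a single attempt."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if retries != -1 and attempt >= retries:
                raise  # retries exhausted; surface the failure
            attempt += 1
            time.sleep(delay_ms / 1000.0)  # delay is configured in milliseconds
```
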
Query Timeout: Timeout interval (in minutes) for the component to save each batch of data to Redshift. The query is aborted if it does not finish within the given interval. The default, -1, means an infinite timeout.

Create Relational Tables: Check this option to create relational tables from nested data. This option does not work when the Save mode is Update.

Checkpoint Storage Location: Storage location for checkpoint data. A DBFS checkpoint storage location is not supported when the pipeline is configured on EMR.

Checkpoint Connections: Select the checkpoint connection. Connections are listed corresponding to the selected storage location.

Override Credential: Check this option to override credentials for user-specific actions, and provide the username.

Check Point Directory: The HDFS path where the Spark application stores the checkpoint data.

Time-based Check Point: Select this option to use a time-based checkpoint on each pipeline run. On every run, the checkpoint location provided above is appended with the current time in milliseconds.
Output Mode: Output mode to be used while writing the data to the streaming sink. Select the output mode from the options below (the sketch after the Trigger Type options shows how these map onto a Spark Structured Streaming write):

- Append: Only the new rows in the streaming data are written to the sink.

- Complete: All the rows in the streaming data are written to the sink every time there are updates.

- Update: Only the rows that were updated in the streaming data are written to the sink every time there are updates.

Enable Trigger: Trigger defines how frequently a streaming query should be executed.

Trigger Type: Select one of the options available from the drop-down:

- One-Time Micro-Batch

- Fixed Interval Micro-Batches
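
For context, Output Mode, the trigger options, and the checkpoint fields map onto standard Spark Structured Streaming write options. The sketch below is a minimal illustration using a placeholder source and a console sink; the emitter's Redshift-specific handling is not shown, and the paths and interval are assumptions.

```python
# Minimal Structured Streaming sketch showing where Output Mode, Trigger Type,
# and the checkpoint location fit; source, sink, and paths are placeholders.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sink-sketch").getOrCreate()
stream_df = spark.readStream.format("rate").load()  # placeholder streaming source

# Time-based checkpoint: append the current time in millis to the configured location.
checkpoint_dir = "/checkpoints/redshift_emitter/" + str(int(time.time() * 1000))

query = (stream_df.writeStream
         .format("console")                      # stand-in for the Redshift sink
         .outputMode("append")                   # Append / Complete / Update
         .trigger(processingTime="30 seconds")   # Fixed Interval Micro-Batches
         # .trigger(once=True)                   # One-Time Micro-Batch
         .option("checkpointLocation", checkpoint_dir)
         .start())
query.awaitTermination()
```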

Priority: Defines the execution order of emitters.

ADD CONFIGURATION: Enables you to configure additional custom properties.

Click the Next button and enter the notes in the space provided.

Click the DONE button to save the configuration.
