Redshift Data Source
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
This is a batch component.
Configuring Redshift Data Source
To add a Redshift Data Source to your pipeline, drag the Data Source onto the canvas and right-click it to configure.
Under the Schema Type tab, select Fetch From Source or Upload Data File.
Field | Description |
---|---|
Message Type | Single: If only one type of message will arrive on the Data Source. Multi: If more than one type of message will arrive on the Data Source. |
Message Name | Select the message you want to apply configuration on. |
Field | Description |
---|---|
Connection Name | Connections are service identifiers. Select the connection from which you want to read data from the list of available connections. |
Query | Write a valid Redshift query. |
Enable Query Partitioning | Enable this option to read the query results from Redshift in parallel. |
No. of Partitions | Number of parallel reader tasks Spark invokes to read from Redshift. |
Partition on Columns | Column used to partition the read; it must be of integer type. Spark splits the read into parallel ranges on this column. |
Lower Bound | Lower bound value of the partitioning column. |
Upper Bound | Upper bound value of the partitioning column. |
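To see how the partitioning fields interact, the sketch below mimics the way Spark-style JDBC range partitioning typically turns a lower bound, an upper bound, and a partition count into per-partition WHERE clauses. The column name `order_id` and the bound values are illustrative assumptions, not values from this document, and the logic is a simplified approximation rather than Gathr's actual implementation.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split the range [lower, upper) on an integer column into one
    WHERE clause per partition (simplified illustration of how
    Spark-style JDBC readers parallelize a bounded query)."""
    stride = (upper - lower) // num_partitions
    predicates = []
    bound = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition also picks up NULLs and anything below the lower bound.
            predicates.append(f"{column} < {bound + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so rows above the upper bound are not lost.
            predicates.append(f"{column} >= {bound}")
        else:
            predicates.append(f"{column} >= {bound} AND {column} < {bound + stride}")
        bound += stride
    return predicates

# Hypothetical example: 4 partitions over order_id values 0..1000
for pred in partition_predicates("order_id", 0, 1000, 4):
    print(pred)
```

Note that the bounds only control how the range is split; rows outside them are still read, by the first and last partitions, so the bounds do not filter the query result.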
Click Done to save the configuration.
Configure Pre-Action in Source →