Redshift Data Source

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.

Configuring Redshift Data Source

To add a Redshift Data Source to your pipeline, drag the Data Source onto the canvas and right-click it to configure.

Under the Schema Type tab, select Fetch From Source or Upload Data File.

Field: Description

Message Type:

Single: If only one type of message will arrive on the Data Source.

Multi: If more than one type of message will arrive on the Data Source.

Message Name: Select the message you want to apply the configuration to.

Field: Description

Connection Name: Connections are the service identifiers. From the list of available connections, select the connection you want to read the data from.

Query: Write a valid Redshift query.

Enable Query Partitioning: Enables the results of a running query to be read from Redshift in parallel.

No. of Partitions: Specifies the number of parallel threads that Spark invokes to read from Redshift.

Partition on Columns: Partitioning can be applied only on a column of Integer type. Spark partitions on this column to read data in parallel.

Lower Bound: Value of the lower bound for the partitioning column.

Upper Bound: Value of the upper bound for the partitioning column.
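
The following Python sketch shows how No. of Partitions, Lower Bound, and Upper Bound could translate into one range filter per parallel read. It assumes the stride-based splitting used by Spark's generic JDBC reader, in a simplified form; the exact behaviour of this Data Source may differ. The function name partition_predicates and the column name order_id are hypothetical.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int,
                         num_partitions: int) -> List[str]:
    """Return one WHERE-clause predicate per partition.

    Assumption: simplified version of the stride-based splitting done by
    Spark's JDBC reader; not taken from this product's implementation.
    """
    if lower >= upper:
        raise ValueError("Lower Bound must be smaller than Upper Bound")

    # Never create more partitions than there are integer values in the range.
    n = max(1, min(num_partitions, upper - lower))
    if n == 1:
        return ["TRUE"]  # a single partition reads everything, unfiltered

    stride = (upper - lower) // n
    predicates = []
    boundary = lower
    for i in range(n):
        # The first partition has no lower limit, the last has no upper limit.
        lo_clause = f"{column} >= {boundary}" if i > 0 else None
        boundary += stride
        hi_clause = f"{column} < {boundary}" if i < n - 1 else None

        if lo_clause and hi_clause:
            predicates.append(f"{lo_clause} AND {hi_clause}")
        elif hi_clause:
            # NULL values are picked up by the first partition.
            predicates.append(f"{hi_clause} OR {column} IS NULL")
        else:
            predicates.append(lo_clause)
    return predicates


if __name__ == "__main__":
    for predicate in partition_predicates("order_id", 0, 1000, 4):
        print(predicate)
    # order_id < 250 OR order_id IS NULL
    # order_id >= 250 AND order_id < 500
    # order_id >= 500 AND order_id < 750
    # order_id >= 750
```

Under this assumption the bounds only decide where the ranges are cut; they do not filter rows. Values below the Lower Bound land in the first partition and values above the Upper Bound land in the last one, so bounds that are far from the real minimum and maximum of the column produce unevenly sized partitions.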

Click Done to save the configuration.

