Redshift Data Source
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
This is a batch component.
Configuring Redshift Data Source
To add a Redshift Data Source to your pipeline, drag the Data Source onto the canvas and right-click it to configure.
Under the Schema Type tab, select Fetch From Source or Upload Data File.
Field | Description |
---|---|
Message Type | Single: If only one type of message will arrive on the Data Source. Multi: If more than one type of message will arrive on the Data Source. |
Message Name | Select the message you want to apply configuration on. |
Field | Description |
---|---|
Connection Name | Connections are service identifiers. Select the connection from which you want to read data from the list of available connections. |
Query | Write a valid Redshift query. |
Enable Query Partitioning | Enable this option to read the query results from Redshift in parallel. |
No. of Partitions | Number of parallel reader tasks Spark invokes to read from Redshift. |
Partition on Columns | Column used to partition the read; it must be of integer type. Spark splits the read into parallel ranges on this column. |
Lower Bound | Lower bound value of the partitioning column. |
Upper Bound | Upper bound value of the partitioning column. |
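To see how the partitioning fields interact, the sketch below mimics the way Spark-style JDBC range partitioning typically turns a lower bound, an upper bound, and a partition count into per-partition WHERE clauses. The column name `order_id` and the bound values are illustrative assumptions, not values from this document, and the logic is a simplified approximation rather than Gathr's actual implementation.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split the range [lower, upper) on an integer column into one
    WHERE clause per partition (simplified illustration of how
    Spark-style JDBC readers parallelize a bounded query)."""
    stride = (upper - lower) // num_partitions
    predicates = []
    bound = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition also picks up NULLs and anything below the lower bound.
            predicates.append(f"{column} < {bound + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so rows above the upper bound are not lost.
            predicates.append(f"{column} >= {bound}")
        else:
            predicates.append(f"{column} >= {bound} AND {column} < {bound + stride}")
        bound += stride
    return predicates

# Hypothetical example: 4 partitions over order_id values 0..1000
for pred in partition_predicates("order_id", 0, 1000, 4):
    print(pred)
```

Note that the bounds only control how the range is split; rows outside them are still read, by the first and last partitions, so the bounds do not filter the query result.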
Click Done to save the configuration.
Configure Pre-Action in Source →