Redshift ETL Source
The Redshift ETL Source connector enables you to extract data from Redshift tables.
You can perform transformation operations on the extracted data and load the transformed data into other destinations for further processing or analysis.
Schema Type
See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Configure the data source parameters as explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Redshift, or create one as explained in the topic Redshift Connection →.
Schema Name
Specify the name of the schema in the Redshift database from which you want to ingest data.
The schema represents the logical structure that organizes tables, views, and other objects within the database.
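If you are unsure which schemas exist in the database, you can list them from any Redshift SQL client before configuring this field. This example queries the standard `information_schema.schemata` catalog view:

```sql
-- List the schemas available in the connected Redshift database
SELECT schema_name
FROM information_schema.schemata
ORDER BY schema_name;
```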
Table Name
Select or provide the name of the table in the specified schema from which you want to ingest data.
Query
Provide a custom SQL query to select specific data from the table specified above.
Design Time Query
A query used to fetch a limited number of records during application design. It is used only during schema detection and inspection.
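As an illustration, the Query field might hold the full extraction query, while the Design Time Query fetches only a small sample for schema detection. The table and column names below are hypothetical:

```sql
-- Query: extract the data to be ingested at run time
SELECT order_id, customer_id, order_total, created_at
FROM sales.orders
WHERE created_at >= '2024-01-01';

-- Design Time Query: fetch a small sample for schema detection only
SELECT order_id, customer_id, order_total, created_at
FROM sales.orders
LIMIT 100;
```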
Enable Query Partitioning
Enables parallel reading of data from the table by partitioning it. It is disabled by default.
If Enable Query Partitioning is enabled, the additional fields described below are displayed:
No. of Partitions
Specifies the number of parallel threads to be invoked to partition the table while reading the data.
Partition on Column
The column used to partition the data. It must be a numeric column, on which Spark performs partitioning to read the data in parallel.
Lower Bound
The lower bound value for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the values.
Upper Bound
The upper bound value for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the values.
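The four fields above correspond to the options of a Spark JDBC partitioned read (`numPartitions`, `partitionColumn`, `lowerBound`, `upperBound`). The sketch below approximates how such a reader turns them into per-partition WHERE clauses; the column name is hypothetical, and note that the bounds only set the partition stride — rows outside the range still land in the first or last partition rather than being filtered out:

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate Spark-style JDBC partitioning: split [lower_bound,
    upper_bound) into num_partitions ranges on a numeric column and
    return one WHERE-clause predicate per partition."""
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    bound = lower_bound
    for i in range(num_partitions):
        # First partition has no lower limit; last has no upper limit,
        # so out-of-range rows are still read.
        lower = f"{column} >= {bound}" if i > 0 else None
        bound += stride
        upper = f"{column} < {bound}" if i < num_partitions - 1 else None
        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        else:
            predicates.append(lower or upper)
    return predicates

# Example: 4 partitions over order_id values between 1 and 1000
for pred in partition_predicates("order_id", 1, 1000, 4):
    print(pred)
```

Each predicate becomes one parallel read task, so a roughly uniform distribution of the partition column between the bounds gives the most even workload.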
If Enable Query Partitioning is disabled, then proceed by updating the following field.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, you can enable incremental read. For more details, see Redshift Incremental Configuration →
Pre Action
To understand how to provide SQL queries or stored procedures that will be executed during the pipeline run, see Pre-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.
If you have any feedback on Gathr documentation, please email us!