Redshift Ingestion Source

The Redshift Ingestion Source connector enables users to ingest data from Amazon Redshift into Gathr for further processing and analysis.

Data Source Configuration

Configure the data source parameters as explained below.

Fetch From Source/Upload Data File

To design the application, you can either fetch sample data from the Redshift source by providing the data source connection details, or upload a sample data file in one of the supported formats to view the schema details during the application design phase.

Upload Data File

To design the application, upload a data file containing sample records in a format supported by Gathr.

The sample data provided for application design should match the data source schema from which data will be fetched during runtime.

If the Upload Data File method is selected to design the application, provide the following details.

File Format

Select the format of the sample file.

Gathr-supported file formats for Redshift data sources are CSV, JSON, TEXT, Parquet, and ORC.

For the CSV file format, also select the corresponding delimiter.

Header Included

Enable this option to read the first row as a header if your Redshift sample data file is in CSV format.

Upload

Upload the sample file in the format selected above.
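To illustrate how a CSV sample file is interpreted when Header Included is enabled, here is a minimal sketch in plain Python. The file contents and column names are hypothetical, and Gathr's actual parsing is internal to the platform:

```python
import csv
import io

# Hypothetical CSV sample records; the schema should match the Redshift table.
sample_csv = "id,name,amount\n1,alpha,10.5\n2,beta,20.0\n"

# The delimiter selected for the CSV file format is used when parsing.
reader = csv.reader(io.StringIO(sample_csv), delimiter=",")
rows = list(reader)

# With Header Included enabled, the first row supplies the column names
# used for schema detection; the remaining rows are sample records.
header, records = rows[0], rows[1:]
print(header)   # ['id', 'name', 'amount']
print(records)  # [['1', 'alpha', '10.5'], ['2', 'beta', '20.0']]
```

If Header Included were disabled, the first row would be treated as a data record and column names would have to come from elsewhere.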


Fetch From Source

If the Fetch From Source method is selected to design the application, the data source connection details are used to fetch sample data.

Continue to configure the data source.


Connection Name

Connections are service identifiers. Select a connection name from the list if you have previously created and saved connection details for Redshift, or create one as explained in the topic - Redshift Connection →

Use the Test Connection option to ensure that the connection with the Redshift channel is established successfully.

A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding further.


Schema Name

Specify the name of the schema in the Redshift database from which you want to ingest data.

The schema represents the logical structure that organizes tables, views, and other objects within the database.


Table Name

Select/provide the name of the table in the specified schema from which you want to ingest data.


Add Configuration: Use this option to add additional properties as key-value pairs.

More Configurations

This section contains additional configuration parameters.

Query

Provide a custom SQL query to select specific data from the table specified above.


Design Time Query

The query used to fetch a limited number of records during application design. It is used only for schema detection and inspection.
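As an illustration of how a design-time query differs from the runtime query, the sketch below wraps a query with a row limit. The table name and the LIMIT value are hypothetical assumptions, not Gathr's internal implementation:

```python
# Hypothetical runtime query against the configured schema and table.
runtime_query = "SELECT id, name, amount FROM sales.orders"

def design_time_query(query: str, limit: int = 100) -> str:
    """Wrap a query so only a small number of records is fetched
    during schema detection and inspection."""
    return f"SELECT * FROM ({query}) AS sample LIMIT {limit}"

print(design_time_query(runtime_query))
# SELECT * FROM (SELECT id, name, amount FROM sales.orders) AS sample LIMIT 100
```

Capping the row count keeps schema inspection fast regardless of the size of the underlying table.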


Enable Query Partitioning

Enable this option to read data from the table in parallel. It is disabled by default.

When Enable Query Partitioning is enabled, the table read is split into partitions and the following additional fields are displayed:

No. of Partitions

Specifies the number of parallel threads invoked to partition the table while reading the data.

Partition on Column

The column used to partition the data. It must be a numeric column, on which Spark performs partitioning to read data in parallel.

Lower Bound

Value of the lower bound for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the bound values.

Upper Bound

Value of the upper bound for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the bound values.
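To show how the number of partitions and the bound values combine, the sketch below derives per-partition predicates using a stride-based split, similar in spirit to how Spark's JDBC reader partitions a numeric column. The column name `col` and the bound values are hypothetical:

```python
def partition_ranges(lower: int, upper: int, num_partitions: int) -> list[str]:
    """Split the [lower, upper) range of a numeric partitioning column into
    stride-based WHERE clauses. The first and last partitions are left
    open-ended so rows outside the bounds are still read, not dropped."""
    stride = (upper - lower) // num_partitions
    clauses = []
    current = lower
    for i in range(num_partitions):
        if i == 0:
            clauses.append(f"col < {current + stride}")
        elif i == num_partitions - 1:
            clauses.append(f"col >= {current}")
        else:
            clauses.append(f"{current} <= col AND col < {current + stride}")
        current += stride
    return clauses

# e.g. lower bound 0, upper bound 100, 4 partitions -> 4 parallel reads
for clause in partition_ranges(0, 100, 4):
    print(clause)
# col < 25
# 25 <= col AND col < 50
# 50 <= col AND col < 75
# col >= 75
```

Each clause corresponds to one parallel read task. Note that in Spark's JDBC partitioning the bounds only shape the split; they do not filter out rows whose values fall outside the range.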

If Enable Query Partitioning is disabled, then proceed by updating the following field.


Schema

Check the populated schema details. For more details, see Schema Preview →

Advanced Configuration

Optionally, you can enable incremental read. For more details, see Redshift Incremental Configuration →
