RDS Postgres Ingestion Source

RDS Postgres is a relational database service hosted in the cloud. The RDS Postgres source reads data in batch mode from an RDS Postgres database.

Data Source Configuration

Fetch From Source/Upload Data File

Gathr provides two options for reading RDS Postgres data. You can either fetch data directly from the RDS Postgres source by providing the data source connection details, or upload a sample data file in one of the supported formats to view the schema details while designing the ingestion application.

If Upload Data File is selected to fetch sample data, provide the following details.

File Format: Depending on the type of data, select the sample file format (file type).

Gathr supports the following file formats for the RDS Postgres data source: CSV, JSON, TEXT, Parquet, and ORC.

For CSV file format, select its corresponding delimiter.

Header Included: Enable this option to read the first row as a header when your RDS Postgres data is in CSV format.

Upload: Upload a sample file in the file format selected above.
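To illustrate what the delimiter and Header Included settings mean for a CSV sample file, here is a minimal sketch using Python's standard csv module. The sample content, the pipe delimiter, the `read_sample` helper, and the `col_N` fallback names are all hypothetical; this is not how Gathr parses the file internally.

```python
import csv
import io

# Hypothetical sample content; column names and the pipe delimiter are illustrative only.
SAMPLE = "id|name|city\n1|Asha|Pune\n2|Liam|Cork\n"


def read_sample(text, delimiter=",", header_included=True):
    """Parse delimited text and return (column_names, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if header_included:
        # The first row supplies the column names; the rest are data rows.
        return rows[0], rows[1:]
    # Without a header, every row is data and names are generated by position.
    names = [f"col_{i}" for i in range(len(rows[0]))]
    return names, rows


columns, records = read_sample(SAMPLE, delimiter="|", header_included=True)
print(columns)
print(records)
```

With the header option off, the same file yields three data rows, and the first row is treated as ordinary data rather than column names.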

Note: Make sure that the file size does not exceed 10 MB.

If Fetch From Source is selected, then continue to configure the data source.

Connection Name: Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for RDS Postgres, or create one as explained in the topic RDS Connection →

Use the Test Connection option to make sure that the connection with RDS Postgres is established successfully.

A success message states that the connection is available. If the test connection returns an error, edit the connection to resolve the issue before proceeding.

Schema Name: Name of the source schema whose tables are to be listed.

Table Name: Name of the source table whose metadata you want to view.

Add Configuration: Additional properties can be added using this option as key-value pairs.

More Configurations

Query: Hive-compatible SQL query to be executed in the component.

Design Time Query: Query used to fetch a limited number of records during application design. It is used only for schema detection and inspection.
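As an illustration of the difference between the two fields, the sketch below derives a row-limited design-time query from a full query by wrapping it in a subquery with a LIMIT clause. The `design_time_query` helper, the `design_sample` alias, the table name, and the default of 100 rows are hypothetical; Gathr does not necessarily build the query this way, and you may simply type a limited query yourself.

```python
def design_time_query(query, max_rows=100):
    """Wrap a query so that it returns at most max_rows records."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    # Drop surrounding whitespace and a trailing semicolon so the
    # statement can be nested as a subquery.
    base = query.strip().rstrip(";")
    return f"SELECT * FROM ({base}) AS design_sample LIMIT {int(max_rows)}"


# Hypothetical table name, for illustration only.
full_query = "SELECT id, name FROM public.customers;"
print(design_time_query(full_query, max_rows=50))
```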

Enable Query Partitioning: Enables parallel reading of data from the table. It is disabled by default.

Tables are partitioned only if this checkbox is enabled.

If Enable Query Partitioning is checked, the following additional fields are displayed:

Type-in Partition Column: Select this option if the Partition Column list shown is empty or you do not see the required column in the list.

Partition on Column: The column used to partition the data. It must be a numeric column, on which Spark performs partitioning to read data in parallel.

Data Type: If you have typed in the partitioning column, specify the data type of that column here.

Autodetect Bounds: Check this option to auto-detect the partition boundaries.

If Autodetect Bounds is checked, the following additional fields are displayed:

Row count in Single Query: Enter the number of rows to be read in a single query.

Example: 10,000.

It implies that 10,000 records will be read in one partition.
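The relationship between the row count per query and the resulting number of partitions can be sketched with simple arithmetic. This assumes that partitions are sized purely by row count, which the example above suggests but the source does not spell out; the `estimated_partitions` helper and the total of 95,000 rows are hypothetical.

```python
def estimated_partitions(total_rows, rows_per_query):
    """Estimate how many partitions are needed to read total_rows."""
    if rows_per_query <= 0:
        raise ValueError("rows_per_query must be positive")
    # Integer ceiling division: a partial final chunk still needs its own partition.
    return -(-total_rows // rows_per_query)


# A hypothetical table of 95,000 rows read 10,000 rows at a time.
print(estimated_partitions(95_000, 10_000))  # 10
```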

Column have unique values: Check this option when the values in the partitioning column are unique across the table.

Note: This option appears only when the Data Type is Number, not when it is Date or Timestamp.

If Autodetect Bounds is disabled, then proceed by updating the following fields.

No. of Partitions: Specifies the number of parallel threads to be invoked to partition the table while reading the data.

Lower Bound: Value of the lower bound for the partitioning column. This value is used to decide the partition boundaries. The entire dataset is distributed into multiple chunks depending on these values.

Upper Bound: Value of the upper bound for the partitioning column. This value is used to decide the partition boundaries. The entire dataset is distributed into multiple chunks depending on these values.
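The way the number of partitions and the two bounds turn into parallel range queries can be sketched as follows. This is a simplified illustration of the range-splitting approach used by Spark's JDBC reader, not Gathr's or Spark's exact implementation; the `partition_predicates` helper and the `order_id` column are hypothetical.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into contiguous ranges, one WHERE predicate per partition."""
    if num_partitions < 1 or lower >= upper:
        raise ValueError("need num_partitions >= 1 and lower < upper")
    # Never create more partitions than there are distinct integer values in the range.
    num_partitions = min(num_partitions, upper - lower)
    if num_partitions == 1:
        return ["TRUE"]  # a single partition reads the whole table
    stride = (upper - lower) // num_partitions
    predicates = []
    boundary = lower
    for i in range(num_partitions):
        start = boundary
        boundary += stride
        if i == 0:
            # The first partition is open-ended below and also collects NULLs.
            predicates.append(f"{column} < {boundary} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # The last partition is open-ended above.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {boundary}")
    return predicates


for predicate in partition_predicates("order_id", 0, 1000, 4):
    print(predicate)
```

In this sketch the bounds only shape the ranges and do not filter rows: values below the lower bound fall into the first partition and values above the upper bound fall into the last one. Bounds that are far from the real minimum and maximum therefore produce unevenly sized partitions.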

If Enable Query Partitioning is disabled, then proceed by updating the following field.

Fetch Size: The fetch size determines the number of rows to be fetched per round trip. The default value is 1000.
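The effect of the fetch size on the number of round trips can be estimated with a short calculation. This is a rough sketch that assumes every round trip returns a full batch except possibly the last; the `round_trips` helper and the row counts are hypothetical.

```python
def round_trips(total_rows, fetch_size=1000):
    """Estimate the number of round trips needed to read total_rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Integer ceiling division: a partial final batch still costs one round trip.
    return -(-total_rows // fetch_size)


print(round_trips(25_000))        # default fetch size of 1000 -> 25
print(round_trips(25_000, 5000))  # larger batches -> 5
```

A larger fetch size reduces the number of round trips at the cost of holding more rows in memory per batch.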

Schema

Check the populated schema details. For more details, see Schema Preview →

Advanced Configuration

Optionally, you can enable incremental read. For more details, see RDS Incremental Configuration →
