Cloud SQL Batch ETL Source
Configure the Cloud SQL data source with the parameters described below.
Schema Type
See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Configure the data source parameters that are explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Cloud SQL. Otherwise, create one as explained in the topic Cloud SQL Connection →
Schema Name
Select the schema from the provided list. The list shows all schemas available for the selected connection name.
Table Name
Select the table from the provided list. The list shows all tables available in the selected schema name.
Query
Hive-compatible SQL query to be executed in the component.
Inspect Query
Provide a Hive-compatible SQL query, with a limit on the record count, to be executed in the component. This query is used only during application design.
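The relationship between Query and Inspect Query can be illustrated with a small Python sketch. It is only an illustration: the helper make_inspect_query, the alias inspect_sample, and the table sales.orders are hypothetical names, and the component itself expects the query text to be typed into the field directly.

```python
def make_inspect_query(query: str, max_rows: int = 100) -> str:
    """Wrap a SQL query so that it returns at most max_rows records.

    Illustrative helper only; it is not part of the product.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be embedded as a subquery.
    base = query.strip().rstrip(";")
    # The subquery alias (inspect_sample) is an assumed name; Hive-style
    # SQL requires an alias for a subquery in the FROM clause.
    return f"SELECT * FROM ({base}) inspect_sample LIMIT {max_rows}"


# Hypothetical main query and the limited variant used at design time.
main_query = "SELECT id, amount FROM sales.orders WHERE amount > 0;"
print(make_inspect_query(main_query, 50))
```

The limited variant keeps design-time previews small while the full query is reserved for the actual run.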
Enable Query Partitioning
If this check-box is enabled, the table is partitioned and loaded into RDDs, which enables parallel reading of data from the table.
No. of Partitions
Number of threads that will be launched to partition the table while reading data.
Partition on Column
This column will be used to partition the data. This must be a numeric column.
Lower Bound
Lower limit of the value range for the column selected in Partition on Column. Together with Upper Bound, it decides the partition boundaries, so that the entire dataset is distributed into multiple chunks.
Upper Bound
Upper limit of the value range for the column selected in Partition on Column. Together with Lower Bound, it decides the partition boundaries, so that the entire dataset is distributed into multiple chunks.
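How No. of Partitions, Partition on Column, Lower Bound, and Upper Bound might work together can be shown with a short Python sketch. It assumes the range is split into equal strides, in the style of a typical Spark JDBC partitioned read; the text above does not confirm that the component does exactly this. The function partition_predicates and the column order_id are hypothetical.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE-clause predicate per partition (illustrative only)."""
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    # Never create more partitions than there are integer steps in the range.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions <= 1:
        return ["1 = 1"]  # single chunk: read the whole table

    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    boundary = lower_bound
    for i in range(num_partitions):
        lo = boundary
        boundary += stride
        if i == 0:
            # First chunk also collects NULLs and values below the lower bound.
            predicates.append(f"{column} < {boundary} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last chunk also collects values above the upper bound.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {boundary}")
    return predicates


# Example: 4 partitions on a hypothetical order_id column, bounds 0 and 1000.
for predicate in partition_predicates("order_id", 0, 1000, 4):
    print(predicate)
```

In this sketch the bounds only shape the chunk boundaries: rows below the lower bound or above the upper bound still fall into the first or last chunk, so a poorly chosen range tends to produce skewed partitions rather than missing rows.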
Add Configuration: Adds additional custom properties as key-value pairs.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, you can enable incremental read. For more details, see ADLS Incremental Configuration →.
Pre Action
To understand how to provide SQL queries or stored procedures that will be executed during the pipeline run, see Pre-Actions →.
Notes
Optionally, enter notes in the Notes → tab and save the configuration.