JDBC Data Source
The JDBC Custom Channel supports ClickHouse, ES, and other databases that provide a JDBC connection. The JDBC Channel supports Oracle, Postgres, MySQL, MSSQL, and DB2 connections. You can configure and test the connection with JDBC. Once the JDBC channel is configured, this connection lets you extract data from relational database sources into the pipeline in batches.
Prerequisite: Upload the appropriate driver JAR for the RDBMS used in the JDBC Data Source, using the Upload Jar option.
To use DB2, first create a successful DB2 Connection.
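For context, the effect of the driver JAR and connection test can be pictured outside the UI as well. The following is a minimal sketch, not part of Gathr, assuming PySpark, a PostgreSQL driver JAR, and hypothetical connection details; it registers the driver and runs a trivial query, which is roughly what a connection test verifies.

```python
from pyspark.sql import SparkSession

# Hypothetical paths and credentials -- replace with your own.
driver_jar = "/opt/drivers/postgresql-42.7.3.jar"
jdbc_url = "jdbc:postgresql://db-host:5432/sales"

spark = (
    SparkSession.builder
    .appName("jdbc-connection-check")
    .config("spark.jars", driver_jar)   # counterpart of uploading the driver jar
    .getOrCreate()
)

# A trivial query: if the URL, driver, or credentials are wrong, this fails,
# which is the kind of error a connection test surfaces.
check_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "org.postgresql.Driver")
    .option("user", "report_user")      # hypothetical username
    .option("password", "secret")       # hypothetical password
    .option("query", "SELECT 1 AS ok")
    .load()
)
check_df.show()
```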
Configuring JDBC Data Source
To add a JDBC Data Source to your pipeline, drag the Data Source to the canvas and click it to configure.
Under the Schema Type tab, select Fetch From Source, Upload Data File, or use an existing dataset.
Field | Description |
---|---|
Connection Name | Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for JDBC Custom earlier. Use the Test Connection option to ensure that the connection with the JDBC Custom channel is established successfully. A success message states that the connection is available; in case of any error, edit the connection to resolve the issue before proceeding further. |
Override Credentials | Check this option to override the connection credentials with user-specific credentials. |
Username | Provide a username that has access to the database. |
Password | Provide the database user's password. |
Schema Name | Source schema name for which the list of tables will be viewed. This field is only visible for databases that require a schema to be selected. |
Table Name | Name of the source table whose metadata you want to view. |
Query | Hive-compatible SQL query to be executed in the component. If a connection with the Custom JDBC option is selected, this query serves as a sub-query for the inspect query specified in the Custom JDBC connection configuration. |
Design Time Query | Query used to fetch a limited number of records during application design. Used only during schema detection and inspection. Ignore this field for a Custom JDBC connection. |
Enable Query Partitioning | Enables parallel reading of data from the table. Disabled by default. If this check-box is enabled, the table is partitioned and the additional partitioning fields described below are displayed (see the sketch after this table). |
Type-in Partition Column | Select this option if the Partition Column list shown is empty or you do not see the required column in the list. Upon checking this option, the Data Type field will appear. |
Partition on Column | The column used to partition the data. This has to be a numeric column, on which Spark performs partitioning to read data in parallel. |
Data Type | If you have typed in the partition column, specify the data type of that column here. |
Autodetect Bounds | Check this option to detect the lower and upper bound values of the partition column automatically. |
No. of Partitions | Number of partitions (parallel reads) that will be launched while reading data from the table. |
Lower Bound | Value of the lower bound for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the values. |
Upper Bound | Value of the upper bound for the partitioning column. This value is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on the values. |
Fetch Size | The fetch size determines the number of rows to be fetched per round trip. The default value is 1000. |
Add Configuration | Additional properties can be added using the Add Configuration link. |
Schema | Check the populated schema details. |
Advanced Configuration | Optionally, you can enable incremental read. |
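For readers familiar with Spark, the partitioning and fetch-size fields above map conceptually onto the options of a Spark JDBC read. The sketch below is not Gathr's implementation; it is a minimal PySpark example, assuming a hypothetical `public.orders` table with a numeric `order_id` column and example connection details, showing how a bounded, partitioned read with a fetch size is typically expressed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

# Hypothetical connection and table details.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("driver", "org.postgresql.Driver")
    .option("user", "report_user")
    .option("password", "secret")
    .option("dbtable", "public.orders")     # Schema Name + Table Name
    .option("partitionColumn", "order_id")  # Partition on Column (numeric)
    .option("lowerBound", "1")              # Lower Bound
    .option("upperBound", "1000000")        # Upper Bound
    .option("numPartitions", "8")           # No. of Partitions
    .option("fetchsize", "1000")            # Fetch Size (rows per round trip)
    .load()
)

orders.printSchema()
```

In Spark, the lower and upper bounds only decide how the partition ranges are cut; rows outside the bounds are still read, just in the first and last partitions.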
Metadata
Select a table to view its metadata.
Field | Description |
---|---|
Table | Select the table whose metadata you want to view. |
Column Name | Name of the column generated from the table. |
Column Type | Type of the column, for example: Text, Int |
Nullable | Indicates whether the column can contain null values. |
Once the metadata is selected, click Next and detect the schema to generate the output with Sample Values. The next tab is Incremental Read.
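The column name, type, and nullability shown in the Metadata tab can also be pictured as a DataFrame schema. A minimal sketch, again assuming PySpark and the hypothetical `public.orders` table and connection details used above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-metadata-check").getOrCreate()

# Hypothetical connection details, as in the earlier sketches.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("driver", "org.postgresql.Driver")
    .option("user", "report_user")
    .option("password", "secret")
    .option("dbtable", "public.orders")
    .load()
)

# Each field corresponds to one row of the Metadata table:
# Column Name, Column Type, Nullable.
for field in orders.schema.fields:
    print(field.name, field.dataType.simpleString(), field.nullable)
```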
Incremental Read
Enter the schema and select the table. You can view the metadata of the tables.
Field | Description |
---|---|
Enable Incremental Read | Check this check-box to enable incremental read support. |
Column to Check | Select the column on which incremental read will work. Displays the list of columns that have integer, long, date, timestamp, or decimal values. |
Type-in Details | Check this option if the column list shown is empty or you do not see the required column in the list. |
Start Value | Specify a value of the reference column; only the records whose reference-column value is greater than this value will be read. |
Override Query | Select this option to override the Incremental Read query. This allows you to provide an optimal query. |
Incremental Read Query | This section shows the query to fetch the incremental data. If Override Query is selected, you can change this query and provide an optimal query. |
Read Control Type | Not available for a Custom JDBC source. Provides three options to control the data to be fetched: None, Limit By Count, and Maximum Value. None: all records with a reference-column value greater than the offset are read. Limit By Count: only the specified number of records with a reference-column value greater than the offset are read. Maximum Value: all records with a reference-column value greater than the offset and less than the Column Value field are read. For None and Limit By Count it is recommended that the table's data be in sequential, sorted (increasing) order. |
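The read-control behaviour above can be illustrated as a filter on the reference column. The sketch below is only an illustration, not the query Gathr generates; it assumes a hypothetical `public.orders` table with a `last_modified` timestamp column and example connection details, and shows in PySpark how an incremental read against a stored offset might be pushed down as a query.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-incremental-read").getOrCreate()

# Hypothetical offset (Start Value) persisted from the previous run.
offset = "2024-01-01 00:00:00"

# Read Control Type "None": every record newer than the offset.
incremental_query = f"""
    SELECT * FROM public.orders
    WHERE last_modified > '{offset}'
"""
# "Limit By Count" would additionally append something like LIMIT 10000;
# "Maximum Value" would add AND last_modified <= '<column value>'.

new_rows = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("driver", "org.postgresql.Driver")
    .option("user", "report_user")
    .option("password", "secret")
    .option("query", incremental_query)
    .load()
)

print(new_rows.count(), "new rows since", offset)
```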
Click Done to save the configuration.