JDBC Data Source

The JDBC Custom Channel supports ClickHouse, Elasticsearch, and other databases that provide a JDBC connection. The JDBC Channel supports Oracle, PostgreSQL, MySQL, MSSQL, and DB2 connections. You can configure and test the connection with JDBC. Once the JDBC channel is configured, this connection allows you to extract data from relational database sources into the pipeline in batches.

Prerequisite: Upload the driver JAR appropriate to the RDBMS used in the JDBC Data Source, using the Upload Jar option.

To use DB2, first create a working DB2 connection.

Configuring JDBC Data Source

To add a JDBC Data Source to your pipeline, drag the Data Source to the canvas and click it to configure it.

Under the Schema Type tab, select Fetch From Source, Upload Data File, or use an existing dataset.

Connection Name
Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for JDBC Custom earlier.

Use the Test Connection option to ensure that the connection with the JDBC Custom channel is established successfully.

A success message states that the connection is available. In case of any error in the test connection, edit the connection to resolve the issue before proceeding further.

Override Credentials
Check this option to override the saved connection credentials with user-specific credentials.

Username
Provide a username that has access to the database.

Password
Provide the database user's password.

Schema Name
Source schema name for which the list of tables will be viewed. This field is only visible for databases that require a schema to be selected.

Table Name
Name of the source table whose metadata you want to view.
Query
Hive-compatible SQL query to be executed in the component.

If a connection with the Custom JDBC option is selected, this query serves as a sub-query for the inspect query specified in the Custom JDBC connection configuration (see the sketch below).

Design Time Query
Query used to fetch a limited number of records during application design. It is used only during schema detection and inspection.

This field should be ignored in the case of a Custom JDBC connection.
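For illustration, the sketch below shows how a channel query, its design-time variant, and the sub-query wrapping for a Custom JDBC connection might fit together. The table name, the LIMIT clause, and the wrapping template are assumptions for illustration, not the product's exact behavior.

    # Hypothetical channel query (Hive-compatible SQL).
    channel_query = "SELECT id, amount, updated_at FROM sales.orders"

    # Hypothetical design-time variant that fetches only a few records
    # for schema detection and inspection.
    design_time_query = channel_query + " LIMIT 10"

    # Assumed shape of a Custom JDBC inspect query that wraps the
    # channel query as a sub-query.
    inspect_template = "SELECT * FROM ({q}) src"
    print(inspect_template.format(q=channel_query))
    # -> SELECT * FROM (SELECT id, amount, updated_at FROM sales.orders) src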

Enable Query Partitioning
Enables parallel reading of data from the table. It is disabled by default. The table will be read in partitions if this check-box is enabled.

If query partitioning is enabled, the additional fields described below will be displayed (a sketch of the corresponding Spark read options follows this list):

Type-in Partition Column
Select this option if the Partition Column list shown is empty or you do not see the required column in the list. Upon checking this option, the Data Type field will appear.

Partition on Column
Column that will be used to partition the data. It has to be a numeric column, on which Spark will perform partitioning to read data in parallel.

Data Type
If you have typed in the partition column, specify the data type of that column here.
Autodetect Bounds
Check this option to detect the lower and upper bound values of the partition column automatically instead of entering them manually.
No. of Partitions
Number of partitions (parallel reads) that will be launched against the table while reading data.

Lower Bound
Value of the lower bound of the partition column. Together with the upper bound, it is used to decide the partition boundaries; the entire dataset is distributed into multiple chunks depending on these values.

Upper Bound
Value of the upper bound of the partition column. Together with the lower bound, it is used to decide the partition boundaries.

Fetch Size
Number of rows to be fetched per round trip. The default value is 1000.

Add Configuration
Additional properties can be added using the Add Configuration link.

Schema
Check the populated schema details.

Advanced Configuration
Optionally, you can enable incremental read.
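The partitioning and fetch-size fields above correspond conceptually to Spark's standard JDBC reader options. The following is a minimal PySpark sketch, assuming a Spark runtime with the relevant JDBC driver JAR on the classpath; the URL, table, column, and bound values are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/sales")  # hypothetical URL
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "***")
          # Partition on Column: numeric column used to split the read.
          .option("partitionColumn", "order_id")
          # Lower Bound / Upper Bound: decide the partition boundaries.
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          # No. of Partitions: parallel reads launched against the table.
          .option("numPartitions", "8")
          # Fetch Size: rows fetched per round trip (default 1000).
          .option("fetchsize", "1000")
          .load())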

Metadata

Select a table to view its metadata.

Table
Select the table whose metadata you want to view.

Column Name
Name of the column generated from the table.

Column Type
Type of the column, for example: Text, Int.

Nullable
Whether the value of the column can be null or not.

Once the metadata is selected, click Next and detect the schema to generate the output with sample values. The next tab is Incremental Read.
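The detected schema carries the same information as the metadata above: each field has a name, a type, and a nullable flag. A minimal PySpark sketch, assuming a Spark runtime and a hypothetical MySQL source:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-metadata").getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales")  # hypothetical URL
          .option("dbtable", "customers")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

    # Each schema field mirrors the Column Name / Column Type / Nullable rows.
    for field in df.schema.fields:
        print(field.name, field.dataType.simpleString(), field.nullable)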

Incremental Read

Enter the schema and select a table to view its metadata.

Enable Incremental Read
Check this check-box to enable incremental read support.

Column to Check
Select the column on which the incremental read will work. The list displays columns with integer, long, date, timestamp, and decimal types of values.

Type-in Details
Check this option if the column list shown is empty or you do not see the required column in the list.

Start Value
Value of the reference column; only records whose reference-column value is greater than this value will be read.

Override Query
Select this option to override the incremental read query. This allows you to provide a more optimal query.
Incremental Read Query
Shows the query used to fetch the incremental data. If Override Query is selected, you can change this query and provide a more optimal one. A placeholder keyword in the query will be replaced with the actual offset value at runtime.
Read Control Type
Provides three options to control the data to be fetched: None, Limit By Count, and Maximum Value.

None: All records with a reference-column value greater than the offset will be read.

Limit By Count: Only the specified number of records with a reference-column value greater than the offset will be read.

Maximum Value: All records with a reference-column value greater than the offset and less than the value of the Column Value field will be read.

For None and Limit By Count, it is recommended that the table's data be in sequential, sorted (increasing) order.
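As a rough illustration of the three read control types, the sketch below builds the kind of query an incremental read could issue against the reference column; the table, column, offset values, and SQL dialect are assumptions, not the product's generated query.

    # Hypothetical saved offset (Start Value) for the reference column.
    last_offset = "2024-01-01 00:00:00"

    base = ("SELECT * FROM sales.orders "
            "WHERE updated_at > '{offset}'").format(offset=last_offset)

    # None: read every record newer than the offset.
    q_none = base

    # Limit By Count: read at most N newer records (table data should be
    # sequential and sorted in increasing order).
    q_limit = base + " ORDER BY updated_at LIMIT 500"

    # Maximum Value: read records between the offset and the Column Value.
    q_max = base + " AND updated_at <= '2024-02-01 00:00:00'"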

Click Done to save the configuration.

Configure Pre-Action in Source →
