Kudu Data Source

Apache Kudu is a column-oriented data store in the Apache Hadoop ecosystem. It enables fast analytics on fast (rapidly changing) data. Kudu is engineered to take advantage of hardware and in-memory processing, and it lowers query latency significantly compared with similar tools.

Configuring Kudu Data Source

To add a Kudu Data Source to your pipeline, drag the Data Source onto the canvas and right-click it to configure.

Under the Schema Type tab, select Fetch From Source or Upload Data File.

Field	Description
Connection Name	Connections are the service identifiers. From the list of available connections, select the connection from which you want to read the data.
Table Name	Name of the table.
Add Configuration	Adds additional properties as key-value pairs.
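
As a rough illustration only, the three fields above can be pictured as a small settings object. The class name `KuduSourceSettings`, the property key `example.property`, and the rule that Connection Name and Table Name must be non-empty are all assumptions made for this sketch. They are not part of the product's API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class KuduSourceSettings:
    """Hypothetical container mirroring the fields of the configuration table."""

    connection_name: str  # Connection Name
    table_name: str       # Table Name
    # Add Configuration: free-form key-value properties.
    additional_properties: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        # Assumption for this sketch: both names must be non-empty.
        if not self.connection_name.strip():
            raise ValueError("Connection Name is required")
        if not self.table_name.strip():
            raise ValueError("Table Name is required")


settings = KuduSourceSettings(
    connection_name="my_kudu_connection",            # hypothetical connection
    table_name="events",                             # hypothetical table
    additional_properties={"example.property": "value"},  # hypothetical key
)
settings.validate()
```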

Metadata

Enter the schema and select a table. You can then view the metadata of the table.

Field	Description
Table	Select the table whose metadata you want to view.
Column Name	Name of the column generated from the table.
Column Type	Type of the column, for example, Text or Int.
Nullable	Whether the column can contain null values.
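
To make the three metadata attributes concrete, the sketch below models them in plain Python and uses the Nullable flag to check a record. The names `ColumnMetadata` and `null_violations` and the sample columns are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical model of one row of the metadata view."""

    column_name: str   # Column Name
    column_type: str   # Column Type, for example "Text" or "Int"
    nullable: bool     # Nullable


def null_violations(record: Dict[str, Any], columns: List[ColumnMetadata]) -> List[str]:
    """Return the non-nullable columns that are missing or None in the record."""
    return [
        c.column_name
        for c in columns
        if not c.nullable and record.get(c.column_name) is None
    ]


# Hypothetical table with one required and one optional column.
columns = [
    ColumnMetadata("id", "Int", nullable=False),
    ColumnMetadata("name", "Text", nullable=True),
]
violations = null_violations({"id": None, "name": None}, columns)
```

Here `violations` contains only `id`, because `name` is marked as nullable.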

Once the metadata is selected, click Next and detect the schema to generate the output with sample values. The next tab is Incremental Read.

Incremental Read

Incremental read fetches only the records whose value in a reference column is greater than a given start value. Configure it using the fields below.

Field	Description
Enable Incremental Read	Select this check box to enable incremental read support.
Column to Check	Select the column on which incremental read will work. The list displays the columns that have integer, long, date, timestamp, or decimal values.
Start Value	Enter a value of the reference column. Only the records whose reference column value is greater than this value are read.
Read Control Type	Provides three options to control the data to be fetched: None, Limit By Count, and Maximum Value. None: All records whose reference column value is greater than the offset are read. Limit By Count: The specified number of records whose reference column value is greater than the offset are read. Maximum Value: All records whose reference column value is greater than the offset and less than the Column Value field are read. For None and Limit By Count, it is recommended that the table data be sequential and sorted in increasing order.
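
The three Read Control Type options can be sketched in plain Python as filters over in-memory rows. This is a minimal illustration of the selection rules described in the table, not the product's implementation. The function name `incremental_read`, its parameters, and the choice to take rows in their stored order for Limit By Count are assumptions.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def incremental_read(
    rows: List[Row],
    column: str,                      # Column to Check
    start_value: Any,                 # Start Value (the offset)
    control: str = "None",            # Read Control Type
    count: Optional[int] = None,      # used by Limit By Count
    max_value: Optional[Any] = None,  # used by Maximum Value
) -> List[Row]:
    """Return the rows an incremental read would select (illustrative only)."""
    # Every option starts from rows strictly greater than the offset.
    selected = [r for r in rows if r[column] > start_value]

    if control == "None":
        return selected
    if control == "Limit By Count":
        if count is None:
            raise ValueError("count is required for Limit By Count")
        # Assumption: rows are taken in stored order, so sorted data
        # yields the values closest to the offset first.
        return selected[:count]
    if control == "Maximum Value":
        if max_value is None:
            raise ValueError("max_value is required for Maximum Value")
        return [r for r in selected if r[column] < max_value]
    raise ValueError(f"Unknown read control type: {control}")


# Hypothetical table sorted in increasing order on the reference column.
sample_rows = [{"id": i} for i in range(1, 6)]
all_new = incremental_read(sample_rows, "id", 2)
first_two = incremental_read(sample_rows, "id", 2, "Limit By Count", count=2)
bounded = incremental_read(sample_rows, "id", 2, "Maximum Value", max_value=5)
```

With a start value of 2, `all_new` holds ids 3, 4, and 5, while `first_two` and `bounded` both hold ids 3 and 4.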

Click on the Add Notes tab. Enter the notes in the space provided.

Click Done to save the configuration.
