Dataset Channel Data Source
This component is supported in Gathr on-premise. Users can use any existing dataset as a channel in a data pipeline.
On the Schema Type tab, configure the required parameters as described in the table below:
Field | Description |
---|---|
Select Dataset Type | Select a specific dataset type (for example, HDFS, Hive, JDBC, or S3), or all types, to narrow down the list of datasets available in the next step. |
Select Dataset & Version | Select the existing dataset, and its version, that you want to use as a channel in the pipeline. |
Max No of Rows | Maximum number of sample rows to pull from the dataset source. Default: 100. |
Trigger time for Sample | Minimum wait time before the system fires a query to fetch data. For example, if it is set to 15 seconds, the system waits 15 seconds, then fetches all the rows available and creates a dataset from them. Default: 3 seconds. |
Sampling Method | Determines how sample records are extracted from the complete data fetched from the source. Top N: extracts the top n records using limit() on the dataset. Random Sample: extracts random records by applying a sample transformation, then limits the result to the maximum number of rows. |
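
The two sampling methods can be illustrated with a minimal plain-Python sketch. This is not Gathr's implementation. The function names and the `fraction` and `seed` parameters are assumptions made for illustration, and an ordinary list stands in for the dataset.

```python
import random


def top_n(rows, max_rows):
    """Top N: keep the first max_rows records, similar to limit() on a dataset."""
    return rows[:max_rows]


def random_sample(rows, max_rows, fraction=0.5, seed=None):
    """Random Sample: keep each row with probability `fraction` (hypothetical
    parameter), then cap the result at max_rows."""
    rng = random.Random(seed)
    sampled = [row for row in rows if rng.random() < fraction]
    return sampled[:max_rows]


rows = list(range(1000))
first_rows = top_n(rows, 100)                    # always rows 0..99
random_rows = random_sample(rows, 100, seed=42)  # at most 100 rows from across the data
```

Top N always returns the same leading records, while Random Sample draws from across the fetched data before the row cap is applied.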
Once the Schema and Rules for the existing dataset are validated, click Next and go to the Incremental Read tab.
Incremental Read
Field | Description |
---|---|
Enable Incremental Read | Check this checkbox to enable incremental read support. |
Column to Check | Select the column on which incremental read will operate. The list shows columns of integer, long, date, timestamp, and decimal types. |
Start Value | Enter a value for the reference column. Only records whose reference column value is greater than this value are read. |
Read Control Type | Provides three options to control the data to be fetched: None, Limit By Count, and Limit By Value. None: all records whose reference column value is greater than the offset are read. Limit By Count: the specified number of records whose reference column value is greater than the offset are read. Limit By Value: all records whose reference column value is greater than the offset and less than the Column Value field are read. For None and Limit By Count, it is recommended that the table data be sequential and sorted in increasing order. |
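
The three read control types can be sketched in plain Python as below. This is an illustrative sketch only, not Gathr's implementation. The function name, the `control` strings, and the `count` and `upper` parameters are hypothetical, and the rows are assumed to be already sorted by the reference column, as the table recommends.

```python
def incremental_read(rows, column, offset, control="none", count=None, upper=None):
    """Return the rows whose `column` value is greater than `offset`,
    restricted according to the chosen read control type."""
    # Rows are kept in source order; no sorting is applied here.
    candidates = [row for row in rows if row[column] > offset]
    if control == "none":
        return candidates                     # everything past the offset
    if control == "limit_by_count":
        return candidates[:count]             # only the first `count` records
    if control == "limit_by_value":
        return [row for row in candidates if row[column] < upper]
    raise ValueError(f"unknown read control type: {control}")


rows = [{"id": i} for i in range(1, 11)]      # ids 1..10, sorted ascending

all_new = incremental_read(rows, "id", 3)
by_count = incremental_read(rows, "id", 3, control="limit_by_count", count=2)
by_value = incremental_read(rows, "id", 3, control="limit_by_value", upper=7)
```

With a start value of 3, None returns ids 4 through 10, Limit By Count with a count of 2 returns ids 4 and 5, and Limit By Value with an upper value of 7 returns ids 4, 5, and 6.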
Click NEXT to detect schema from the File Path file.
Click Done to save the configuration.
Configure Pre-Action in Source →.