Delta SQL Data Source
To add a Delta SQL source to your pipeline, drag the data source onto the canvas and click it to configure.
Under the Schema Type tab, you can Upload Data File, Fetch From Source, or Use Existing Dataset. Below are the configuration details of the Delta SQL source:
Field | Description |
---|---|
Source Storage Type | Select the source storage type. |
Connection Name | Select the connection from the available list of connections from which you want to read the data. A Delta SQL connection is required during pipeline creation; the actual pipeline execution happens through the metastore. |
Override Credentials | Unchecked by default. Check this option to override credentials for user-specific actions. |
Username | Once Override Credentials is checked, provide the username under which the Delta SQL service runs. |
Password | Provide the password for Delta SQL override credentials. |
Query | Provide a Delta-compatible SQL query to be executed in the component. For an HDFS source, the table name should be given with the delta. prefix (see the path-based example after this table). On the Pipeline Canvas, under the Inspect Session window, the Hive components will not work if the Interpret Delta SQL option is checked. |
Inspect Query | Provide the SQL query to be executed in the component, with a set limit on the record count for inspect and schema detection. - For the inspect query field, provide a simple query with a table name, such as “select * from tablename” (see the example after this table). - The “As of” clause is not supported in the inspect query but can be used in the Query field. - The inspect query must be JDBC-compatible. |
ADD CONFIGURATION | To add additional custom properties in key-value pairs. |
Environment Params | User can add further environment parameters. (Optional) |
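As an illustration of what the Query field accepts, here is a minimal sketch of a Delta-compatible query against a path-based (HDFS-style) table, including the optional AS OF clause. The path /data/events and the version number are hypothetical placeholders, and the time-travel syntax assumes an underlying Spark/Delta version that supports it.

```sql
-- Path-based Delta table reference; /data/events is a hypothetical placeholder path.
SELECT * FROM delta.`/data/events`;

-- An AS OF clause may be used in the Query field (but not in the Inspect Query field),
-- assuming the underlying Spark/Delta version supports this time-travel syntax.
SELECT * FROM delta.`/data/events` VERSION AS OF 12;
```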
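The Inspect Query field, by contrast, expects a simple, JDBC-compatible query with a plain table name and no AS OF clause; Gathr applies the configured record-count limit for inspect and schema detection. The table name below is hypothetical.

```sql
-- Simple inspect query: plain table name, no AS OF clause.
SELECT * FROM tablename;
```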
Incremental Read
Field | Description |
---|---|
Enable Incremental Read | Check the option to enable incremental read. The available options are: - None - Normal - Version |
Upon selecting None, incremental read support is disabled.
Upon selecting Normal, provide inputs for the fields below:
Field | Description |
---|---|
Column to Check | Select the column on which incremental read will work. The list displays columns with integer, long, date, timestamp, and decimal values. |
Start Value | Mention a value of the reference column; only records whose reference-column value is greater than this value will be read. |
Read Control Type | Provides three options to control the data to be fetched: None, Limit By Count, and Limit By Value. - None: all records with a reference-column value greater than the offset are read. - Limit By Count: only the mentioned number of records with a reference-column value greater than the offset are read; provide the count in the No. of Records field, and if an integer column is selected in Column to Check, provide an integer value greater than 0. - Limit By Value: all records with a reference-column value greater than the offset and less than or equal to the value set in the Column Value field are read; the Column Value field appears when Read Control Type is set to Limit By Value. For None and Limit By Count, it is recommended that the table data be in sequential, sorted (increasing) order. A sketch of these filtering semantics follows this table. |
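As a rough sketch of the Normal-mode filtering semantics, the query below shows what an incremental read amounts to in plain SQL, assuming a hypothetical table events with Column to Check set to event_id, a Start Value of 1000, and Limit By Count of 500. Gathr builds the query and tracks the offset internally; this is only an illustration.

```sql
-- Records whose reference column exceeds the current offset (Start Value on the first run),
-- capped by the configured count when Read Control Type is Limit By Count.
SELECT *
FROM events
WHERE event_id > 1000   -- Column to Check > Start Value / stored offset
ORDER BY event_id       -- sequential, sorted (increasing) data is recommended
LIMIT 500;              -- No. of Records for Limit By Count
```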
Upon selecting Version, provide inputs for the fields below:
Field | Description |
---|---|
Table Name | Option to provide the Delta table name for querying the records. |
Version | Option to provide the version for querying the records. Records will be read subsequent to the selected version (see the sketch after this table). |
ADD CONFIGURATION | To add additional custom properties in key-value pairs. |
Environment Params | User can add further environment parameters. (Optional) |
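For the Version option, Gathr tracks the configured version and reads the records that follow it. The snippet below only illustrates the underlying Delta version mechanics in plain Delta SQL, assuming a hypothetical table events.

```sql
-- Inspect the table's commit history to identify available versions.
DESCRIBE HISTORY events;

-- Read the table as it existed at a specific version (Delta time travel).
SELECT * FROM events VERSION AS OF 5;
```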