ADLS Data Source - Batch and Streaming
Add an ADLS batch or streaming data source to create a pipeline. Click the component to configure it.
Under the Schema Type tab, select the Fetch From Source, Upload Data File, or Use Existing Dataset option. Edit the schema if required and click Next to configure.
On the ADLS channel, you can read data in formats including JSON, CSV, TEXT, XML, Fixed Length, Binary, Avro, Parquet, and ORC. The Provide Schema field offers the following options:
- Infer from Data
- Inline Avro Schema
- Upload Avro Schema
These Provide Schema options are also available with the Upload Data File option under the Schema Type tab.
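Gathr generates the underlying read from the UI configuration, so no code is required. Purely as an illustration, the sketch below shows what a schema pasted into the Inline Avro Schema option might look like for a hypothetical employee dataset; the record and field names are assumptions, not part of the product.

```python
import json

# Hypothetical inline Avro schema for the "Inline Avro Schema" option.
# Record and field names (Employee, id, name, salary) are illustrative only.
inline_avro_schema = """
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "id",     "type": "int"},
    {"name": "name",   "type": "string"},
    {"name": "salary", "type": ["null", "double"], "default": null}
  ]
}
"""

# Quick sanity check that the schema is valid JSON before pasting it into the UI.
print(json.loads(inline_avro_schema)["fields"])
```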
Field | Description |
---|---|
Connection Name | Connections are the service identifiers. Select the connection name from the list of available connections from which you want to read data. |
Override Credentials | Check this option to override the credentials of the selected connection. |
Authentication Type | Select the Azure ADLS authentication type. |
Account Name | Provide a valid Azure ADLS account name. |
Account Key | Provide a valid account key. You can also test the connection by clicking the TEST CONNECTION button. |
Container | Provide the container name in Azure Blob storage. |
ADLS Directory Path | Provide the directory path for the ADLS file system. The ADLS source can also be configured with supported compressed data files. |
File Filter | Provide a file pattern. The file filter includes only files whose names match the pattern, for example *.pdf or *emp*.csv. |
Recursive File Lookup | Check this option to retrieve files from the current folder and its sub-folder(s). |
ADD CONFIGURATIONS | Add further configurations (optional). |
Environment Params | Add further environment parameters (optional). |
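Gathr manages the connection internally, but conceptually the fields above correspond to a Spark read over the ADLS Gen2 (abfss) endpoint. The following is a minimal PySpark sketch, not Gathr's implementation; the account name, key, container, directory path, and file pattern are hypothetical placeholders for the fields above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-source-sketch").getOrCreate()

# Hypothetical values standing in for the Account Name, Account Key,
# Container and ADLS Directory Path fields configured in the UI.
account_name = "mygathracct"
account_key  = "<account-key>"
container    = "sales"
directory    = "landing/orders"

# Account-key authentication against the ADLS Gen2 (abfss) endpoint.
spark.conf.set(
    f"fs.azure.account.key.{account_name}.dfs.core.windows.net", account_key
)

# File Filter and Recursive File Lookup roughly correspond to these read options.
df = (
    spark.read
    .option("pathGlobFilter", "*emp*.csv")      # File Filter
    .option("recursiveFileLookup", "true")      # Recursive File Lookup
    .option("header", "true")
    .csv(f"abfss://{container}@{account_name}.dfs.core.windows.net/{directory}")
)
df.show()
```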
Once the above fields are configured for the ADLS data source, click Next for the Incremental Read option.
Field | Description |
---|---|
Enable Incremental Read | Unchecked by default. Check this option to enable incremental read support. |
Read By | Choose to read data incrementally either by the File Modification Time option or by the Column Partition option. |
Upon selecting the File Modification Time option, provide the below detail:
Field | Description |
---|---|
Offset | Specifies the last modified time of the file. The offset time must be earlier than the latest file modification time. Records with a timestamp greater than the specified datetime (in UTC) will be fetched. After each pipeline run, the datetime configuration is set to the most recent timestamp value from the last fetched records. The value should be given in UTC using the ISO date format yyyy-MM-dd'T'HH:mm:ss.SSSZZZ. Example: 2021-12-24T13:20:54.825+0000. |
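If you need to produce the Offset value programmatically, the small Python sketch below (a tooling assumption, not part of Gathr) prints the current UTC time in the expected yyyy-MM-dd'T'HH:mm:ss.SSSZZZ layout.

```python
from datetime import datetime, timezone

# Build an Offset string in UTC ISO format, e.g. 2021-12-24T13:20:54.825+0000.
now_utc = datetime.now(timezone.utc)
offset = now_utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{now_utc.microsecond // 1000:03d}+0000"
print(offset)
```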
Upon selecting the Column Partition option, provide the below details:
Field | Description |
---|---|
Read Control Type | Options to control data fetch: All records in the reference column with values greater than the start value will be read. Limit by Value: all records in the reference column with values greater than the start value but less than or equal to the max value that you set will be read. Limit by Incremental Size: all records in the reference column with values greater than the start value, up to the incremental size that you specify, will be selected. Upon selecting the Inclusive Start Offset checkbox, the start value is included within the selected size. |
Inclusive Start Offset | Check this option to include the start value when incrementally reading the schema. Supports the integer, date, and timestamp data types. If the Limit by Value option is selected as the Read Control Type, provide the Max value along with the Start value of the selected column ID. With Inclusive Start Offset selected, the schema from the Start value to the Max value is read incrementally. |
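As a rough illustration of what Column Partition incremental read amounts to (not Gathr's internal logic), the sketch below filters a DataFrame on a hypothetical reference column id using a start value, a max value, and the Inclusive Start Offset flag; it reuses the df from the connection sketch above.

```python
from pyspark.sql import functions as F

# Hypothetical values standing in for the incremental read configuration.
start_value     = 1000    # Start value (offset carried over from the last run)
max_value       = 2000    # Max value, used when Read Control Type = Limit by Value
inclusive_start = True    # Inclusive Start Offset checkbox

# Include or exclude the start value depending on the checkbox, then cap at max_value.
lower_bound = F.col("id") >= start_value if inclusive_start else F.col("id") > start_value
incremental_df = df.filter(lower_bound & (F.col("id") <= max_value))
incremental_df.show()
```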