SFTP Data Asset Source
Create a Data Asset Through SFTP
To create a data asset through the SFTP source, configure the following parameters:
Connection Name
Connections are the service identifiers.
Select a connection name from the list if you have already created and saved SFTP connection details.
Otherwise, create one as explained in the SFTP Connection topic.
File Path
Provide the file path on the SFTP file system.
The wildcards asterisk (*) and question mark (?) are supported and can be used in either the folder name or the file name for pattern matching.
Use a question mark (?) to match exactly one character and an asterisk (*) to match any number of characters.
Example: The query /folder/?ink fetches files from folders named pink, sink, wink, and so on.
By contrast, the query /folder/bird* fetches files from folders named bird, birding, birds, and any other folder whose name starts with bird.
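The wildcard matching described above can be illustrated with Python's standard `fnmatch` module. This is a minimal sketch, not Gathr's implementation: the helper `matches` is hypothetical, it assumes each `*` or `?` applies within a single path segment, and note that `fnmatch` also treats `[...]` as a character class, which the SFTP source may not support.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: True if each segment of `path` matches the
    corresponding segment of `pattern` (* = any run, ? = one character)."""
    pat_parts = PurePosixPath(pattern).parts
    path_parts = PurePosixPath(path).parts
    if len(pat_parts) != len(path_parts):
        return False  # assumption: wildcards never span multiple segments
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts))


candidates = [
    "/folder/pink", "/folder/sink", "/folder/think",
    "/folder/bird", "/folder/birding", "/folder/robin",
]
print([c for c in candidates if matches("/folder/?ink", c)])
# ['/folder/pink', '/folder/sink']
print([c for c in candidates if matches("/folder/bird*", c)])
# ['/folder/bird', '/folder/birding']
```

Here `think` is excluded by `?ink` because `?` stands for exactly one character, while `bird*` also matches `bird` itself because `*` can match zero characters.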
File Format
Choose the format of the sample data to be fetched from SFTP. Supported file types are CSV, JSON, TEXT, Parquet, and ORC.
For CSV files, you can additionally specify a delimiter.
Is Header Included
For CSV files, indicate whether the source file includes a header row.
When this option is enabled, the header row is read as column labels rather than as data, so that values are aligned with the correct columns.
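A minimal sketch of how the delimiter and header settings change the way a CSV sample is parsed, using Python's `csv` module. The function `read_csv_sample` and the `col_0`, `col_1`, ... names assigned when there is no header row are hypothetical; Gathr's own column naming may differ.

```python
import csv
import io
from typing import Dict, List


def read_csv_sample(text: str, delimiter: str = ",", has_header: bool = True) -> List[Dict[str, str]]:
    """Parse CSV text into row dicts, honoring a custom delimiter and header flag."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if has_header:
        # First row supplies the column labels; the rest is data.
        columns, data = rows[0], rows[1:]
    else:
        # Hypothetical naming for unlabeled columns; every row is data.
        columns = [f"col_{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(columns, row)) for row in data]


sample = "id|name\n1|pink\n2|sink\n"
print(read_csv_sample(sample, delimiter="|", has_header=True))
# [{'id': '1', 'name': 'pink'}, {'id': '2', 'name': 'sink'}]
```

With `has_header=False`, the line `id|name` would instead be returned as an ordinary data row, which is the misalignment the header option is meant to prevent.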
Maximum Records to Fetch
Specify the maximum number of sample records to keep in the data asset.
A smaller subset of data is easier to work with when testing and designing applications, and it reduces resource usage.
Sampling Method
This option controls how sample data is retrieved:
Top N: Retrieves the first N records from the data source, where N is the value set in Maximum Records to Fetch. This is useful when you want to analyze or design with a specific set of initial records.
Random Sample: Fetches a random subset of records from the source data, giving a more diverse representation. This is useful when you need a broader view of your data’s characteristics.
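The two sampling methods can be sketched as follows. `top_n` keeps the first N records, while `random_sample` uses reservoir sampling to pick N records uniformly in a single pass. Both function names are hypothetical, and reservoir sampling is an assumed technique chosen for illustration; the documentation does not say how Gathr draws its random sample.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def top_n(records: Iterable[T], n: int) -> List[T]:
    """Top N: keep the first n records in source order."""
    return list(islice(records, n))


def random_sample(records: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Random Sample (assumed technique): reservoir sampling, so every
    record has the same chance of ending up in the sample."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)  # fill the reservoir with the first n records
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record  # replace with probability n / (i + 1)
    return reservoir


print(top_n(range(1, 101), 5))  # [1, 2, 3, 4, 5]
print(random_sample(range(1, 101), 5, seed=42))
```

The trade-off shows up in the code: Top N can stop reading as soon as it has N records, whereas a uniform random sample must see every record in the source before the sample is final.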
More Configurations
Expand the More Configurations option to see the additional configuration parameters.
Parallelism
Number of SFTP threads to launch in parallel for faster downloads. The default value is 4.
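Conceptually, Parallelism bounds how many files are downloaded at the same time. The sketch below assumes a hypothetical `fetch` callable that reads one remote file and returns its bytes, and caps a thread pool at `parallelism` workers (default 4, matching the documented default). It illustrates the idea only and is not Gathr's internal downloader.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def download_all(paths: List[str], fetch: Callable[[str], bytes], parallelism: int = 4) -> Dict[str, bytes]:
    """Fetch every path using at most `parallelism` concurrent threads."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order, so results line up with `paths`.
        results = pool.map(fetch, paths)
        return dict(zip(paths, results))


def fake_fetch(path: str) -> bytes:
    """Hypothetical stand-in for a real SFTP file read."""
    return path.encode()


print(download_all(["/folder/pink", "/folder/sink"], fake_fetch))
# {'/folder/pink': b'/folder/pink', '/folder/sink': b'/folder/sink'}
```

Raising the thread count helps most when many files are small and per-file latency dominates; `ThreadPoolExecutor` rejects a value below 1 with a `ValueError`.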
Is Compressed
Select this option if the source files are compressed (for example, in *.zip, *.tar, or *.tar.gz formats).
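When Is Compressed is selected, archive members must be unpacked before they can be parsed. A minimal sketch using Python's standard `zipfile` and `tarfile` modules, assuming the archive type can be inferred from the file extension; `extract_members` is a hypothetical helper, not part of Gathr.

```python
import io
import tarfile
import zipfile
from typing import Dict


def extract_members(name: str, data: bytes) -> Dict[str, bytes]:
    """Return {member_name: content} for .zip, .tar, or .tar.gz archives."""
    buf = io.BytesIO(data)
    if name.endswith(".zip"):
        with zipfile.ZipFile(buf) as zf:
            # Skip directory entries, which end with "/".
            return {n: zf.read(n) for n in zf.namelist() if not n.endswith("/")}
    if name.endswith((".tar", ".tar.gz")):
        # "r:*" lets tarfile detect plain or gzip-compressed tar data.
        with tarfile.open(fileobj=buf, mode="r:*") as tf:
            out: Dict[str, bytes] = {}
            for member in tf.getmembers():
                if member.isfile():
                    handle = tf.extractfile(member)
                    out[member.name] = handle.read() if handle else b""
            return out
    # Assumption: anything else is treated as a single uncompressed file.
    return {name: data}


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,name\n1,pink\n")
print(extract_members("sample.zip", buf.getvalue()))
# {'data.csv': b'id,name\n1,pink\n'}
```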
If you have any feedback on Gathr documentation, please email us!