HDFS ETL Source
An ETL application with an HDFS data source is supported on registered clusters, but not on Gathr clusters.
To learn how to register a cluster with Gathr by establishing PrivateLink, see Compute Setup →
Schema Type
To learn how to provide schema details for data sources, see the topic Provide Schema for ETL Source →
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Connection Name: Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for HDFS. Otherwise, create one as explained in the topic HDFS Connection →
HDFS file path: Provide the path of the file on the HDFS file system.
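As an illustration of what a well-formed HDFS file path generally looks like, the sketch below checks a path before it is entered in the field. It is not part of Gathr: the function name `check_hdfs_path` and the sample paths are hypothetical, and it assumes the common Hadoop conventions of either an absolute path (`/data/...`) or a fully qualified `hdfs://` URI. The exact forms Gathr accepts are not stated in this topic.

```python
from urllib.parse import urlparse


def check_hdfs_path(path: str) -> bool:
    """Return True if `path` looks like an absolute HDFS path or an hdfs:// URI.

    Hypothetical helper for illustration only; it checks the shape of the
    string and does not contact any cluster.
    """
    parsed = urlparse(path)
    if parsed.scheme == "":
        # Scheme-less form: must be an absolute path such as /data/sales/2024.csv
        return path.startswith("/")
    if parsed.scheme != "hdfs":
        # Any other scheme (s3, file, ...) is not an HDFS location
        return False
    # Qualified form: hdfs://<namenode-host>[:port]/<absolute path>
    return parsed.path.startswith("/")


if __name__ == "__main__":
    # Sample values are made up for the example
    for candidate in ("/data/sales/2024.csv",
                      "hdfs://namenode:8020/data/sales/2024.csv",
                      "data/sales/2024.csv"):
        print(candidate, check_hdfs_path(candidate))
```

A relative path such as `data/sales/2024.csv` is rejected here because its resolution depends on the working directory of the user running the job, which makes it an unreliable value for a saved configuration.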
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, you can enable incremental read. For more details, see HDFS Incremental Configuration →
Pre Action
To understand how to provide SQL queries or stored procedures that will be executed during the pipeline run, see Pre-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.