HDFS ETL Source

An ETL application that uses an HDFS data source can run only on registered clusters, not on Gathr clusters.

To learn how to register a cluster with Gathr by establishing PrivateLink, see Compute Setup →

Schema Type

See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.

After providing schema type details, the next step is to configure the data source.

Data Source Configuration

Connection Name: Connections are the service identifiers. If you have already created and saved connection details for HDFS, select the connection name from the list. Otherwise, create one as explained in the topic HDFS Connection →

HDFS file path: Provide the path of the file to be read from the HDFS file system.
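HDFS locations are commonly written either as a full URI (scheme, NameNode host and port, then an absolute path) or as a bare absolute path. The sketch below shows how such a value breaks down. It assumes, without confirmation from this page, that either form is acceptable in the field. The host name, port, and the helper `split_hdfs_path` are hypothetical examples, not part of Gathr.

```python
from urllib.parse import urlparse


def split_hdfs_path(path: str) -> dict:
    """Split an HDFS location into NameNode host, port, and file path.

    Hypothetical helper for illustration only. Accepts a full URI such as
    hdfs://namenode.example.com:8020/data/in.csv or a bare absolute path
    such as /data/in.csv.
    """
    parsed = urlparse(path)
    # Only the hdfs scheme (or no scheme at all) is treated as an HDFS location.
    if parsed.scheme not in ("", "hdfs"):
        raise ValueError(f"not an HDFS location: {path!r}")
    # HDFS paths in this sketch must be absolute.
    if not parsed.path.startswith("/"):
        raise ValueError(f"HDFS path must be absolute: {path!r}")
    return {
        "host": parsed.hostname,  # None for a bare path
        "port": parsed.port,      # None when no port is given
        "path": parsed.path,
    }


print(split_hdfs_path("hdfs://namenode.example.com:8020/data/in.csv"))
```

Whether the connection already supplies the NameNode address, so that only the path portion is needed here, depends on the HDFS connection settings.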

Detect Schema

Check the populated schema details. For more details, see Schema Preview →

Incremental Read

Optionally, you can enable incremental read. For more details, see HDFS Incremental Configuration →
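Incremental reads generally track a watermark from the previous run and pick up only data newer than it. The sketch below shows that general pattern using file modification times. It is an illustrative assumption, not Gathr's actual HDFS incremental logic; the `FileEntry` type and `select_incremental` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FileEntry:
    path: str
    modified_ms: int  # modification time in epoch milliseconds


def select_incremental(
    files: List[FileEntry], last_watermark_ms: Optional[int]
) -> Tuple[List[FileEntry], Optional[int]]:
    """Return files modified after the watermark and the next watermark.

    Hypothetical watermark-based selection for illustration only.
    """
    if last_watermark_ms is None:
        # First run: no watermark yet, so every file is read.
        new_files = list(files)
    else:
        new_files = [f for f in files if f.modified_ms > last_watermark_ms]
    # Process oldest first so the watermark advances monotonically.
    new_files.sort(key=lambda f: f.modified_ms)
    # Keep the old watermark when nothing new was found.
    next_watermark = new_files[-1].modified_ms if new_files else last_watermark_ms
    return new_files, next_watermark


files = [FileEntry("/in/a.csv", 100), FileEntry("/in/b.csv", 200), FileEntry("/in/c.csv", 300)]
batch, wm = select_incremental(files, 150)
print([f.path for f in batch], wm)
```

The strict `>` comparison avoids re-reading files, but it can skip a file that lands later with exactly the watermark's timestamp. Real implementations often trade this off with an overlap window plus de-duplication.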

Pre Action

To understand how to provide SQL queries or stored procedures that are executed during the pipeline run, see Pre-Actions →

Notes

Optionally, enter notes in the Notes → tab and save the configuration.
