HDFS Data Source
This is a Batch component.
To add an HDFS Data Source to your pipeline, drag the Data Source to the canvas and right-click on it to configure.
The Schema Type tab lets you create a schema and define its fields. On the Detect Schema tab, select either Data Source or Upload Data.
If data is fetched from the source and the data type is CSV, an additional tab, Is Header Included, appears in the data source configuration.
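
To illustrate what a header setting of this kind typically controls, here is a minimal Python sketch. It is not Gathr's implementation: the function name `infer_fields` and the placeholder names `col_0`, `col_1`, ... are assumptions made for this example only.

```python
import csv
import io


def infer_fields(csv_text, header_included):
    """Return (field_names, data_rows) for a small CSV sample.

    Hypothetical helper for illustration; not part of Gathr.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], []
    if header_included:
        # The first row supplies the field names; the rest is data.
        return rows[0], rows[1:]
    # No header row: generate placeholder names (an assumption of this
    # sketch) and treat every row as data.
    names = [f"col_{i}" for i in range(len(rows[0]))]
    return names, rows


sample = "id,name\n1,alice\n2,bob\n"
print(infer_fields(sample, header_included=True))
print(infer_fields(sample, header_included=False))
```

With the header flag set, the first row becomes the field names. Without it, the same row is kept as ordinary data, which is why the setting matters when a schema is detected from a CSV sample.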
Field | Description |
---|---|
Connection Name | Connections are the service identifiers. From the list of available connections, select the connection from which you want to read the data. |
HDFS file Path | HDFS path from which data is read. For Parquet, provide the .parquet file path. |
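
As a rough aid for checking a path before entering it, the following Python sketch tests whether a path ends in `.parquet`. The helper name `is_parquet_path` and the sample paths (including the `namenode:8020` host) are hypothetical. Gathr may apply different or additional validation.

```python
from urllib.parse import urlparse


def is_parquet_path(path):
    """Return True if the path component ends with .parquet.

    Hypothetical helper for illustration; not part of Gathr.
    """
    # urlparse handles both bare paths (/data/x.parquet) and
    # hdfs:// URIs, exposing the file path via the .path attribute.
    return urlparse(path).path.lower().endswith(".parquet")


# Sample paths are made up for this example.
print(is_parquet_path("hdfs://namenode:8020/data/events.parquet"))
print(is_parquet_path("/data/events.csv"))
```

The check is purely syntactic: it does not contact HDFS or confirm that the file exists.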
Click on the Add Notes tab. Enter the notes in the space provided.
Click Done to save the configuration.