Hive Data Source

To use a Hive Data Source, select the connection and specify a warehouse directory path.
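
For context, in a Spark-based deployment the Hive warehouse directory is typically supplied through the `spark.sql.warehouse.dir` property. The sketch below is illustrative only; the application name and path are placeholders, not values prescribed by the Hive Data Source.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a Spark session with Hive support enabled and a
# hypothetical warehouse directory path.
spark = (
    SparkSession.builder
    .appName("hive-source-example")                             # placeholder name
    .config("spark.sql.warehouse.dir", "/user/hive/warehouse")  # placeholder path
    .enableHiveSupport()
    .getOrCreate()
)
```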

To add a Hive Data Source to your pipeline, drag the Data Source onto the canvas and right-click it to configure.

Under the Detect Schema Type tab, select Fetch From Source or Upload Data File.

Configuring Hive Data Source

Field Descriptions
Message Type

Single: If only one type of message will arrive on the Data Source.

Multi: If more than one type of message will arrive on the Data Source.

Message Type: Select the message on which you want to apply the configuration.
Connection Name

Connections are the service identifiers.

Select the connection name from the list of available connections from which you want to read the data.

Query: Write a custom query for Hive (see the example below).
Refresh Table Metadata

Spark caches Hive Parquet table metadata and partition information to improve performance. This option lets you refresh the table cache so that the latest information is available during inspect. It is most useful when there are multiple update and fetch events within an inspect session.

The Refresh Table option also repairs and syncs partition values into the Hive metastore, so that the latest values are processed when data is fetched during inspect or run (see the sketch after the field descriptions).

Table Names: Specify one or more table names to be refreshed.
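
For reference, the refresh and repair behavior described above is conceptually similar to the standard Spark SQL statements below. This is an illustrative sketch, not the product's internal implementation; the table name is a placeholder, and the SparkSession from the earlier sketch is assumed.

```python
# Drop the cached metadata so the next read picks up the latest files and
# partition information ("sales_db.orders" is a placeholder table name).
spark.sql("REFRESH TABLE sales_db.orders")
# The catalog API offers the same refresh:
spark.catalog.refreshTable("sales_db.orders")

# Add partitions created outside Spark/Hive back into the Hive metastore,
# analogous to the repair-and-sync behavior described above.
spark.sql("MSCK REPAIR TABLE sales_db.orders")
```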

After the query runs, Describe Table and the corresponding Table Metadata, Partition Information, and Serialize/Deserialize Information are populated.

Make sure that the query you run matches the schema created with Upload Data File or Fetch From Source.
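
As an illustration of the Query field and the schema check above, the snippet below runs a hypothetical custom query and inspects the table metadata. The database, table, and column names are placeholders, and the SparkSession from the earlier sketch is assumed.

```python
# Hypothetical custom query for the Query field; all names are placeholders.
custom_query = """
    SELECT order_id, customer_id, order_total
    FROM sales_db.orders
    WHERE order_date >= '2024-01-01'
"""

df = spark.sql(custom_query)
# Compare the result schema with the schema detected via
# Upload Data File or Fetch From Source.
df.printSchema()

# DESCRIBE FORMATTED shows the table metadata, partition information,
# and SerDe (serializer/deserializer) details mentioned above.
spark.sql("DESCRIBE FORMATTED sales_db.orders").show(truncate=False)
```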

Click Done to save the configuration.

Configure Pre-Action in Source →
