HBase ETL Source

💡

HBase connector is available on request to Gathr users.

See the Connector Marketplace topic. Ask your administrator to start a trial or subscribe to the Premium HBase connector.

In Gathr, HBase can be added as a channel to fetch customer and prospect data and transform it as needed before storing it in the desired data warehouse for further analytics.

Schema Type

See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.

After providing schema type details, the next step is to configure the data source.

Data Source Configuration

Configure the data source parameters as explained below.

Connection Name

Connections are the service identifiers. If you have already created and saved connection details for HBase, select the connection name from the list. Otherwise, create one as explained in the topic HBase Connection →

Use the Test Connection option to ensure that the connection with the HBase channel is established successfully.

A success message states that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.

Schema Name

Schema names are listed according to the configured connection.

Select the schema to read from.

Entity

Tables in HBase are statically defined to model HBase entities.

If you selected the Fetch From Source method to design the application, the entities are listed according to the configured connection. Select the entity to be read from HBase.

If you selected the Upload Data File method to design the application, provide the exact name of the entity to read the data from HBase.

If you selected the Fetch From Source method to design the application, the fields are listed according to the entity chosen in the previous configuration parameter. Select the fields or provide a custom query to read the desired records from HBase.

Fields

The conditions to fetch source data from an HBase table can be specified using this option.

Select Fields: Select the column(s) of the entity that should be read.

Custom Query: Provide an SQL query specifying the read conditions for the source data.

Example: SELECT "Id" FROM Companies
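The shape of a custom query can be tried out locally before it is entered in Gathr. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for the HBase entity, and the "Name" and "Country" columns, the sample rows, and the WHERE clause are hypothetical. The SQL dialect accepted by the HBase connector may differ from SQLite.

```python
import sqlite3

# In-memory SQLite table used purely as a stand-in for the HBase entity.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Companies ("Id" INTEGER, "Name" TEXT, "Country" TEXT)')
conn.executemany(
    "INSERT INTO Companies VALUES (?, ?, ?)",
    [(1, "Acme", "US"), (2, "Globex", "DE"), (3, "Initech", "US")],
)

# The example query from this page: read only the "Id" column.
all_ids = conn.execute('SELECT "Id" FROM Companies').fetchall()

# Hypothetical variant that adds a read condition on an assumed column.
us_ids = conn.execute(
    '''SELECT "Id" FROM Companies WHERE "Country" = 'US' ORDER BY "Id"'''
).fetchall()

print(sorted(all_ids))
print(us_ids)
```

The first query returns one row per record with only the selected column, while the second restricts the rows that are read.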

If you selected the Upload Data File method to design the application, provide a custom query to fetch records from the HBase entity specified in the previous configuration.

Query

The conditions to fetch source data from an HBase table can be specified using this option.

Provide an SQL query specifying the read conditions for the source data.

Example: SELECT "Id" FROM Companies

Read Options

This section contains additional read options.

Datetime Format

This can be used to automatically format any datetime values entering a database.

Default Value: "yyyy-MM-dd'T'HH:mm:ss.fffzzz"
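To see what a value in the default pattern looks like, the following sketch renders a sample timestamp. It assumes the pattern tokens follow the common .NET-style meaning (fff as milliseconds, zzz as the UTC offset in hours and minutes); this page does not state which convention the connector uses, and the helper name format_default and the sample value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_default(dt: datetime) -> str:
    """Render dt as yyyy-MM-dd'T'HH:mm:ss.fffzzz under the assumed token meanings."""
    if dt.tzinfo is None:
        # zzz needs a UTC offset, so naive datetimes are rejected.
        raise ValueError("an offset-aware datetime is required")
    # isoformat with millisecond precision yields e.g. 2024-01-15T10:30:45.123+05:30
    return dt.isoformat(timespec="milliseconds")


# Hypothetical sample value with a +05:30 offset.
sample = datetime(
    2024, 1, 15, 10, 30, 45, 123000,
    tzinfo=timezone(timedelta(hours=5, minutes=30)),
)
formatted = format_default(sample)
print(formatted)  # 2024-01-15T10:30:45.123+05:30
```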

Expose Column Timestamp

Specifies whether to expose the cell timestamp as a separate column when executing a SELECT statement.

The column name consists of the name of the cell and a fixed suffix.

Include Disabled Tables

Specifies whether to retrieve disabled tables.

Reset Table Metadata On Delete

Specifies whether to reset the cache table after executing a DELETE statement.

Retrieve Selected Columns Only

Specifies whether to retrieve selected columns only when executing a SELECT statement.

When set to ‘False’, all values are returned, and HBase does not filter out rows in which a selected column is empty.

When set to ‘True’, the connector retrieves only the selected columns from HBase. In this case, HBase does not return rows where a selected column is empty, so different queries can return different numbers of rows. This setting can improve performance, but use it only when you know the selected columns contain no empty values or when you do not want such rows returned.
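The effect on row counts can be illustrated with a small simulation. This is a simplified model of the behavior described above, not the connector's implementation; the sample rows, the column names, and the read_rows helper are all hypothetical.

```python
# Hypothetical sample data: None marks an empty cell.
ROWS = [
    {"Id": 1, "Name": "Acme", "Phone": "555-0100"},
    {"Id": 2, "Name": "Globex", "Phone": None},
    {"Id": 3, "Name": None, "Phone": "555-0102"},
]


def read_rows(rows, selected, retrieve_selected_only):
    """Model of the documented behavior for the given option value."""
    result = []
    for row in rows:
        # With the option enabled, rows with an empty selected column are dropped.
        if retrieve_selected_only and any(row[col] is None for col in selected):
            continue
        result.append({col: row[col] for col in selected})
    return result


print(len(read_rows(ROWS, ["Name"], False)))          # 3
print(len(read_rows(ROWS, ["Name"], True)))           # 2
print(len(read_rows(ROWS, ["Name", "Phone"], True)))  # 1
```

The same table yields three, two, or one row depending on the option and on which columns are selected, which is why the option is best enabled only when the selected columns are known to be fully populated.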

Use SQL Filtering

Specifies whether to use the ScannerFilter when executing a SELECT statement.

If Use SQL Filtering is set to false, only the client filter is used.

Page size

The maximum number of results to return per page from Apache HBase. Setting a higher value may result in better performance at the cost of additional memory allocated per page consumed.

Partitioning

This section contains partitioning-related configuration parameters.

Enable Partitioning

This enables parallel reading of the data from the entity.

Partitioning is disabled by default.

If enabled, an additional option will appear to configure the partitioning conditions.

Column

The selected column will be used to partition the data.

Max Rows per Partition: Enter the maximum number of rows to be read in a single request.

Example: 10,000

This means that at most 10,000 rows are read in one partition.
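As a rough guide to how the row cap relates to the number of partitions, the sketch below splits a row count into chunks of at most the configured size. It only illustrates the arithmetic; how Gathr actually derives partition boundaries from the selected column is not described on this page, and the partition_ranges helper and the 25,000-row total are hypothetical.

```python
def partition_ranges(total_rows, max_rows_per_partition):
    """Split total_rows into half-open (start, end) ranges of bounded size."""
    if max_rows_per_partition <= 0:
        raise ValueError("max_rows_per_partition must be positive")
    # Ceiling division: number of partitions needed to cover every row.
    count = (total_rows + max_rows_per_partition - 1) // max_rows_per_partition
    return [
        (i * max_rows_per_partition,
         min((i + 1) * max_rows_per_partition, total_rows))
        for i in range(count)
    ]


ranges = partition_ranges(25_000, 10_000)
print(ranges)  # [(0, 10000), (10000, 20000), (20000, 25000)]
```

With a cap of 10,000, an entity of 25,000 rows would need three partitions, the last one holding the remaining 5,000 rows.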

Advanced Configuration

This section contains additional configuration parameters.

Fetch Size

The number of rows to be fetched per round trip. The default value is 1000.

Add Configuration: Additional properties can be added using this option as key-value pairs.

Detect Schema

Check the populated schema details. For more details, see Schema Preview →

Pre Action

To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Pre-Actions →.

Notes

Optionally, enter notes in the Notes → tab and save the configuration.
