HBase Emitter
The HBase emitter stores streaming data into HBase, which provides quick random access to huge amounts of structured data.
HBase Emitter Configuration
To add HBase emitter to your pipeline, drag it onto the canvas, connect it to a Data Source or processor, and right-click on it to configure.
Field | Description |
---|---|
Connection Name | All HBase connections will be listed here. Select a connection for connecting to HBase. |
Batch Size | If you want to index records in batches, specify the batch size here. |
Table Name Expression | JavaScript expression used to evaluate the table name. The keyspace will be formed as ns_+{tenantId}, for example, ns_1. |
Compression | Provides the facility to compress the message before storing it; the algorithm used is Snappy. When set to true, compression is enabled on the data. |
Region Splitting Definition | Defines how the HBase table should be pre-split. The default value is 'No pre-split'. The supported options are: Default (No Pre-Split): only one region is created initially. Based on Region Boundaries: regions are created based on the given key boundaries. For example, if your key is hexadecimal and you provide the value '4, 8, d', four regions are created: the 1st region for keys less than 4, the 2nd for keys from 4 up to (but not including) 8, the 3rd for keys from 8 up to (but not including) d, and the 4th for keys greater than or equal to d. A sketch of pre-splitting a table with the HBase Java client is shown after this table. |
Encoding | Data encoding type: either UTF-8 (base encoding) or BASE64 encoding. |
Row Key Generator Type | Enables generation of a custom row key. The following types of key generators are available: UUID: universally unique identifier. Key Based: the key is generated by appending the values of the selected fields. An additional field, "Key Fields", is displayed where you can select the fields you want to combine; the keys are appended in the same order as selected on the user interface. Custom: write your own logic to create the row key. For example, if you want to use a UUID key but prefix it with HSBC, you can write that logic in a Java class. If you select this option, an additional field, "Class Name", is displayed on the UI where you must enter the fully qualified class name of your Java class. You can download the sample project from the "Data Pipeline" landing page and refer to the Java class "com.yourcompany.custom.keygen.SampleKeyGenerator" to write the custom code. A sketch of such prefix-plus-UUID key logic is shown after this table. |
Column Family | Specify the name of the column family that will be used while saving your data in an HBase table. |
Emitter Output Fields | Select the emitter output fields. |
Output Fields | Fields in the message that need to be a part of the output message. |
Replication | Number of copies of your data kept on the underlying Hadoop file system. For example, if you specify "2" as Replication, two copies will be created on HDFS. |
Ignore Missing Values | Ignore or persist empty or null values of message fields in the emitter. When set to true, null values of message fields are ignored. |
Connection Retries | The number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries. |
Delay Between Connection Retries | Defines the retry delay interval for the component connection, in milliseconds. |
Enable TTL | Specifies the lifetime of a record. When selected, a record persists for the duration you specify in the TTL Value field. A sketch of applying a per-record TTL through the HBase client is shown after this table. |
TTL Type | Specify the TTL type as Static or Field Value. |
TTL Value | Provide the TTL value in seconds for the Static TTL type, or the name of an integer field for the Field Value type. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | It is the path where the Spark application stores the checkpointing data. For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path like s3://BucketName/checkpointingDir. |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select the output mode from the three given options: Append: only the new rows in the streaming data are written to the sink. Complete: all the rows in the streaming data are written to the sink every time there are updates. Update: only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
ADD CONFIGURATION | Enables configuring additional properties. For example, index_field and store_field support can be added using ADD CONFIGURATION. |
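
To illustrate what the Region Splitting Definition and Compression settings correspond to in HBase itself, here is a minimal sketch of creating a pre-split table with a Snappy-compressed column family through the standard HBase Java client. The namespace, table name, column family, and split points are illustrative values taken from the example above, not names used by the emitter; the emitter performs the equivalent steps internally based on your configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // Region boundaries '4, 8, d' produce four regions:
            // (<4), [4,8), [8,d), (>=d)
            byte[][] splitKeys = {
                Bytes.toBytes("4"),
                Bytes.toBytes("8"),
                Bytes.toBytes("d")
            };

            // Column family with Snappy compression, as the Compression field enables.
            // Assumes the namespace "ns_1" already exists.
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("ns_1", "events"))
                    .setColumnFamily(
                        ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
                            .setCompressionType(Compression.Algorithm.SNAPPY)
                            .build());

            admin.createTable(table.build(), splitKeys);
        }
    }
}
```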
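For the Custom row key generator, the actual interface to implement comes from the sample project mentioned above (see com.yourcompany.custom.keygen.SampleKeyGenerator). The sketch below only illustrates the kind of logic such a class would contain, i.e., prefixing a UUID with a fixed string such as HSBC; the class and method names here are placeholders, not the Gathr interface.

```java
import java.util.UUID;

// Placeholder class illustrating custom row key logic. In a real pipeline this logic
// would live in the class you register under "Class Name"; refer to the sample
// project's SampleKeyGenerator for the interface to implement.
public class PrefixedUuidKeyExample {

    // Builds a row key such as "HSBC-3f2b1c9e-...".
    public static String generateKey(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(generateKey("HSBC"));
    }
}
```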
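The Enable TTL and TTL Value options limit how long an emitted record is kept. Below is a minimal sketch of how a per-record TTL is applied through the standard HBase client Put API; the table, column family, and field names are illustrative, and note that the client API expects the TTL in milliseconds while the emitter's TTL Value field takes seconds.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TtlPutExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("ns_1", "events"))) {

            long ttlSeconds = 3600; // a static TTL of one hour, as entered in "TTL Value"

            Put put = new Put(Bytes.toBytes("HSBC-" + java.util.UUID.randomUUID()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), Bytes.toBytes("{...}"));
            put.setTTL(ttlSeconds * 1000L); // HBase client expects milliseconds

            table.put(put);
        }
    }
}
```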
Click on the Next button. Enter the notes in the space provided.
Click on the DONE button to save the configuration.