HBase Emitter

The HBase emitter stores streaming data in HBase, which provides quick random access to large amounts of structured data.

HBase Emitter Configuration

To add an HBase emitter to your pipeline, drag it onto the canvas, connect it to a Data Source or a processor, and right-click it to configure it.

The configuration fields are described below.

Connection Name

All available HBase connections are listed here. Select the connection to use for connecting to HBase.

Batch Size

Specify the batch size if you want to write records to HBase in batches.

Table Name Expression

JavaScript expression used to evaluate the table name.

The namespace is formed as ns_ + {tenantId}. For example: ns_1.

Compression

Compresses the message before storing it. The algorithm used is Snappy.

When set to true, compression is enabled on the data.
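
As a rough illustration of what Snappy compression of a message looks like, here is a minimal Java sketch using the open-source snappy-java library; the emitter's actual internal implementation is not documented here, so treat this as an assumption for illustration only.

    import java.nio.charset.StandardCharsets;
    import org.xerial.snappy.Snappy;

    public class SnappyCompressionSketch {
        public static void main(String[] args) throws Exception {
            // Serialize a message (e.g., UTF-8 JSON), then compress it with Snappy.
            byte[] message = "{\"id\":1,\"value\":\"example\"}".getBytes(StandardCharsets.UTF_8);
            byte[] compressed = Snappy.compress(message);    // bytes that would be stored
            byte[] restored = Snappy.uncompress(compressed); // bytes recovered on read
            System.out.println(new String(restored, StandardCharsets.UTF_8));
        }
    }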

Region Splitting Definition

This functionality defines how the HBase tables should be pre-split. The default value is ‘No Pre-Split’. The supported options are:

Default (No Pre-Split): Only one region is created initially.

Based on Region Boundaries: Regions are created based on the given key boundaries. For example, if your row key is hexadecimal and you provide the value ‘4, 8, d’, four regions are created, as shown in the sketch after this list:

1st region for keys less than 4.

2nd region for keys greater than or equal to 4 and less than 8.

3rd region for keys greater than or equal to 8 and less than d.

4th region for keys greater than or equal to d.
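
To make the boundary semantics concrete, the sketch below pre-splits a table at ‘4, 8, d’ using the standard HBase 2.x client API; the table name ns_1:events and column family cf are hypothetical.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTableSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // Hypothetical table and column family names.
                TableDescriptorBuilder table =
                        TableDescriptorBuilder.newBuilder(TableName.valueOf("ns_1:events"))
                                .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"));
                // Split keys 4, 8, d produce four regions:
                // (-inf, 4), [4, 8), [8, d), [d, +inf)
                byte[][] splitKeys = {
                        Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("d")
                };
                admin.createTable(table.build(), splitKeys);
            }
        }
    }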

Encoding

Data encoding type: either UTF-8 (the default encoding) or Base64 (binary-to-text encoding).
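
For illustration, the two encodings map onto standard Java APIs as follows; this is a minimal sketch, and the emitter's own encoding step is not shown in this document.

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class EncodingSketch {
        public static void main(String[] args) {
            byte[] utf8 = "value".getBytes(StandardCharsets.UTF_8);   // UTF-8: raw text bytes
            String base64 = Base64.getEncoder().encodeToString(utf8); // Base64: binary-to-text
            System.out.println(base64); // prints dmFsdWU=
        }
    }
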
Row Key Generator Type

Enables generation of a custom row key.

The following types of key generators are available:

UUID: Universally unique identifier.

Key Based: The key is generated by appending the values of the selected fields.

An additional field, “Key Fields”, is displayed, where you can select the fields you want to combine. The fields are appended in the same order in which they are selected on the user interface.

Custom: Write your own logic to create the row key. For example, if you want to use a UUID key but prefix it with HSBC, you can write that logic in a Java class.

If you select this option, an additional field, “Class Name”, is displayed on the UI, where you must enter the fully qualified class name of your Java class. You can download the sample project from the “Data Pipeline” landing page and refer to the Java class “com.yourcompany.custom.keygen.SampleKeyGenerator” to write the custom code.
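
The exact contract the emitter expects ships with the sample project, so the interface below is only a hypothetical stand-in; this minimal sketch shows the UUID-prefixed-with-HSBC example from the text.

    import java.util.Map;
    import java.util.UUID;

    // Hypothetical contract: the real interface or base class is defined in the
    // sample project (see com.yourcompany.custom.keygen.SampleKeyGenerator).
    interface KeyGenerator {
        String generateKey(Map<String, Object> message);
    }

    // Custom logic: a UUID row key prefixed with "HSBC".
    public class PrefixedUuidKeyGenerator implements KeyGenerator {
        @Override
        public String generateKey(Map<String, Object> message) {
            return "HSBC-" + UUID.randomUUID();
        }
    }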

Column Family

Specify the name of the column family to be used when saving your data in an HBase table.

Emitter Output Fields

Select the emitter output fields.

Output Fields

Fields in the message that need to be part of the output message.

Replication

Specifies the number of copies of your data on the underlying Hadoop file system. For example, if you specify “2” as the Replication, two copies are created on HDFS.

Ignore Missing Values

Ignore or persist empty or null values of message fields in the emitter.

When set to true, null values of message fields are ignored.

Connection Retries

The number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries.

Delay Between Connection Retries

The retry delay interval for the component connection, in milliseconds.

Enable TTL

Specifies the lifetime of a record. When selected, a record persists for the duration specified in the TTL Value field.

TTL Type

Specify the TTL type as either Static or Field Value.

TTL Value

Provide the TTL value in seconds for the Static TTL type, or an integer field of the message for the Field Value type.
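
For reference, the standard HBase client supports a per-mutation TTL; the sketch below shows how a static TTL in seconds, or one taken from an integer field of the message, could be applied to a Put. Whether the emitter uses this exact mechanism is an assumption.

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TtlSketch {
        // ttlSeconds comes either from the static TTL Value or from the
        // configured integer field when TTL Type is Field Value.
        static Put putWithTtl(byte[] rowKey, long ttlSeconds) {
            Put put = new Put(rowKey);
            put.setTTL(ttlSeconds * 1000L); // the HBase client expects milliseconds
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("v"));
            return put;
        }
    }
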
Checkpoint Storage Location

Select the checkpointing storage location. Available options are HDFS, S3, and EFS.

Checkpoint Connections

Select the connection. Connections are listed corresponding to the selected storage location.

Checkpoint Directory

The path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path, for example: S3://BucketName/checkpointingDir

Time-Based Checkpoint

Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds.

Output Mode

The output mode to be used while writing the data to the streaming sink.

Select the output mode from the given three options:

Append: Only the new rows in the streaming data are written to the sink.

Complete: All the rows in the streaming data are written to the sink every time there are updates.

Update: Only the rows that were updated in the streaming data are written to the sink every time there are updates.

Enable Trigger

Trigger defines how frequently a streaming query should be executed.
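
Both fields correspond to standard Spark Structured Streaming options. A minimal sketch, assuming a streaming Dataset<Row> named df and using a console sink as a stand-in for the HBase emitter:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.streaming.Trigger;

    public class OutputModeSketch {
        static StreamingQuery start(Dataset<Row> df) throws Exception {
            return df.writeStream()
                    .outputMode("append")                          // or "complete" / "update"
                    .trigger(Trigger.ProcessingTime("30 seconds")) // how often the query fires
                    .option("checkpointLocation", "/user/hadoop/checkpointingDir")
                    .format("console")                             // stand-in for the HBase sink
                    .start();
        }
    }
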
ADD CONFIGURATION

Enables you to configure additional properties.

Click the NEXT button and enter notes in the space provided.

Click the DONE button to save the configuration.
