Solr Emitter

The Solr emitter allows you to store data in Solr indexes. Indexing increases the speed and performance of search queries.

Solr Emitter Configuration

To add a Solr emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor. The configuration settings of the Solr emitter are as follows:

Connection Name: All Solr connections are listed here. Select a connection for connecting to Solr.
Batch Size: Specify the batch size if you want to index records in batches.
KeySpace: Define a new or existing keyspace and its replication strategy.
Across Field Search Enabled: Specifies whether full-text search is enabled across all fields.
Index Number of Shards: Specifies the number of shards to be created in the index store.
Index Replication Factor: Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.
Index Expression

The MVEL expression is used to evaluate the index name. This can help you leverage field-based partitioning.

For example, consider the expression below:

@{'ns_1_myindex' + Math.round(timestamp/(3600*1000))}

A new index will be created for each one-hour time range, and data will be dynamically indexed based on the field whose alias name is 'timestamp'.
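As a rough illustration of the bucketing, assuming the field aliased 'timestamp' holds epoch milliseconds, two records one hour apart resolve to different index names:

// Illustrative only: mirrors the hourly bucketing done by the MVEL index expression above,
// assuming the 'timestamp' field alias holds epoch milliseconds.
public class IndexExpressionExample {
    public static void main(String[] args) {
        long timestamp = 1_700_000_000_000L;         // some record's timestamp
        long oneHourLater = timestamp + 3_600_000L;  // a record arriving one hour later

        // Same bucketing as the expression: round(timestamp / (3600 * 1000))
        System.out.println("ns_1_myindex" + Math.round(timestamp / (3600.0 * 1000)));    // ns_1_myindex472222
        System.out.println("ns_1_myindex" + Math.round(oneHourLater / (3600.0 * 1000))); // ns_1_myindex472223
    }
}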

Routing Required: Specifies whether custom dynamic routing is to be enabled. If enabled, a JSON routing policy needs to be defined.
ID Generator Type

Enables generation of the ID field.

The following types of ID generators are available:

Key Based:

Key Fields: Select the message field to be used as the key.

Select: Select all/id/sequence_number/File_id.

Note: Add the key 'incremental_fields' and comma-separated column names as values. This will work with a key-based UUID.

UUID: Universally unique identifier.

Custom: In this case, you can write your own custom logic to create the ID field. For example, if you wish to use a UUID key but want to prefix it with "HSBC", you can write that logic in a Java class.

If you select this option, an additional field, "Class Name", will be displayed on the user interface, where you need to provide the fully qualified class name of your Java class.

You can download the sample project from the "Data Pipeline" landing page and refer to the Java class com.yourcompany.custom.keygen.SampleKeyGenerator to write the custom code.
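As an illustration only, here is a minimal sketch of such a class, assuming a hypothetical key-generator contract; the actual interface to implement is the one defined in the sample project (see SampleKeyGenerator above):

package com.yourcompany.custom.keygen;

import java.util.UUID;

// Minimal, illustrative sketch of a custom ID generator that prefixes a random
// UUID with "HSBC". The class name and generateKey() method are hypothetical;
// implement the contract defined by the sample project's SampleKeyGenerator.
public class PrefixedUuidKeyGenerator {

    private static final String PREFIX = "HSBC";

    // Returns the value to be used as the document ID, e.g. "HSBC-4f9a1c2e-...".
    public String generateKey() {
        return PREFIX + "-" + UUID.randomUUID();
    }
}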

Enable TTL

Select TTL to limit the lifetime of the data.

TTL Type: Provide the TTL type as either Static or Field Value.

TTL Value: Provide the TTL value in seconds in case of the Static TTL type, or an integer field in case of Field Value.

Output Fields: Fields of the output message.
Ignore Missing Values: Ignore or persist empty or null values of message fields in the sink.
Connection Retries

Number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries.

If Routing Required = true, then:

Routing Policy - A JSON defining the custom routing policy. Example: {"1":{"company":{"Google":20.0,"Apple":80.0}}}

Here, 1 is the timestamp after which the custom routing policy becomes active, 'company' is the field name, and records with the value 'Google' take 20% of the shards while records with the value 'Apple' take 80% of the shards.

Delay Between Connection Retries: Defines the retry delay intervals for component connection in milliseconds.
Enable TTL: When selected, data will be discarded to the specified TTL exchange.
Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. Connections are listed corresponding to the selected storage location.
Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir

Time-based Checkpoint: Select the checkbox to enable a time-based checkpoint on each pipeline run.
Output Mode

Output mode to be used while writing the data to the streaming sink.

Append: Output Mode in which only the new rows in the streaming data will be written to the sink.

Complete Mode: Output Mode in which all the rows in the streaming data will be written to the sink every time there are some updates.

Update Mode: Output Mode in which only the rows that were updated in the streaming data will be written to the sink every time there are some updates.
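
For orientation only, here is a minimal sketch of how these options map onto Spark Structured Streaming's output modes; the Dataset name and the "solr" format string are illustrative assumptions, and the pipeline configures this for you:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

// Illustrative sketch only: shows where the selected output mode is applied
// when a streaming Dataset is written to a sink.
public class OutputModeExample {
    public static StreamingQuery write(Dataset<Row> streamingData) throws Exception {
        return streamingData.writeStream()
                .outputMode("append")   // or "complete" / "update", matching the options above
                .format("solr")         // illustrative sink format name
                .option("checkpointLocation", "/user/hadoop/checkpointingDir")
                .start();
    }
}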

Enable Trigger: Trigger defines how frequently a streaming query should be executed.
Add Configuration

The user can add further configuration.

Note: Index_field and store_field are supported via Add Configuration.
