Solr Emitter

The Solr emitter allows you to store data in Solr indexes. Indexing increases the speed and performance of search queries.

Solr Emitter Configuration

To add a Solr emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor. The configuration settings of the Solr emitter are as follows:

Connection Name: All Solr connections are listed here. Select a connection for connecting to Solr.
Batch Size: Specify the batch size if you want to index records in batches.
Ignore Missing Values: Ignore or persist empty or null values of message fields in the sink.
Across Field Search Enabled: Specifies whether full-text search is enabled across all fields.
Index Number of Shards: Specifies the number of shards to be created in the index store.
Index Replication Factor: Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.
Index Expression: JavaScript expression used to evaluate the index name. For example, with the expression 'ns_Name', the index will be created as ns_Name. Use the field alias instead of the field name in the expression when you want to perform field-based partitioning.
Routing Required: Specifies whether custom dynamic routing is to be enabled. If enabled, a JSON routing policy needs to be defined. If Routing Required is true, an additional field appears: Routing Policy - a JSON defining the custom routing policy, for example {"1":{"company":{"Google":20.2,"Apple":80.0}}} (pretty-printed below). Here, 1 is the timestamp after which the custom routing policy becomes active, 'company' is the field name, and the value 'Google' takes 20% of the shards while the value 'Apple' takes 80% of the shards.
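For readability, here is the same routing policy pretty-printed; the timestamp, field name, and percentages are simply the illustrative values from the example above, not defaults:

    {
      "1": {
        "company": {
          "Google": 20.2,
          "Apple": 80.0
        }
      }
    }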
ID Generator Type

Enables generation of the ID field.

The following types of ID generators are available:

Key Based:

Key Fields: Select the message field to be used as the key.

Select: Select all/id/sequence_number/File_id.

Note: Add the key 'incremental_fields' with comma-separated column names as its value. This works with a key-based UUID.

UUID: Universally unique identifier.

Custom: In this case, you can write your own custom logic to create the ID field. For example, if you wish to use a UUID key but want to prefix it with "HSBC", you can write that logic in a Java class.

If you select this option, an additional field, "Class Name", is displayed on the user interface, where you need to provide the fully qualified class name of your Java class.

You can download the sample project from the "Data Pipeline" landing page and refer to the Java class com.yourcompany.custom.keygen.SampleKeyGenerator to write the custom code.
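The snippet below is only an illustrative sketch of the "HSBC"-prefixed UUID example above. The actual base class or interface and method signature that a custom key generator must implement are defined by the SampleKeyGenerator class in the sample project, so the class name and the generateKey(...) signature here are assumptions, not the product's real contract.

    package com.yourcompany.custom.keygen;

    import java.util.Map;
    import java.util.UUID;

    // Illustrative sketch only: generateKey(...) is an assumed signature.
    // Follow SampleKeyGenerator in the sample project for the real contract.
    public class PrefixedUuidKeyGenerator {

        // Builds the ID field by prefixing a random UUID with "HSBC".
        public String generateKey(Map<String, Object> message) {
            return "HSBC-" + UUID.randomUUID();
        }
    }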


Emitter Output Fields: Fields of the output message.
Connection Retries: Number of retries for the component connection. Possible values are -1, 0, or any positive number; -1 denotes infinite retries.
Delay Between Connection Retries: Defines the retry delay interval for the component connection, in milliseconds.
Enable TTL: Check this option to limit the lifetime of the data.
TTL Type: Options available are Static and Field Value.
TTL Value: If the TTL Type is Static, provide the TTL value in seconds. If the TTL Type is Field Value, provide a field of integer or long type only; the value of the selected field will be used as the TTL value in seconds.
Priority: Priority defines the execution order for the emitters.
Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. Connections are listed corresponding to the selected storage location.
Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir

Time-based Checkpoint: Select the checkbox to enable a time-based checkpoint on each pipeline run.
Output Mode

Output mode to be used while writing the data to the streaming sink.

Append Mode: Output mode in which only the new rows in the streaming data will be written to the sink.

Complete Mode: Output mode in which all the rows in the streaming data will be written to the sink every time there are updates. Complete mode applies only when an aggregation processor is being used.

Update Mode: Output mode in which only the rows that were updated in the streaming data will be written to the sink every time there are updates.
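These options correspond to Spark Structured Streaming output modes. As a rough, generic illustration of what the setting controls (plain Spark Java code, not pipeline-specific; the "rate" source and console sink are placeholders):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class OutputModeSketch {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("output-mode-sketch")
                    .master("local[*]")
                    .getOrCreate();

            // Placeholder streaming source standing in for the pipeline's stream.
            Dataset<Row> df = spark.readStream().format("rate").load();

            // The emitter's Output Mode setting corresponds to outputMode() here:
            //   "append"   - only new rows are written to the sink,
            //   "update"   - only rows updated since the last trigger are written,
            //   "complete" - the full result table is rewritten (needs aggregation).
            StreamingQuery query = df.writeStream()
                    .outputMode("append")
                    .format("console")   // placeholder sink for illustration
                    .start();

            query.awaitTermination();
        }
    }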

Enable Trigger: Trigger defines how frequently a streaming query should be executed.
Add Configuration

The user can add further configuration.

Note: index_field and store_field are supported through Add Configuration.
