Solr Emitter

The Solr emitter allows you to store data in Solr indexes. Indexing increases the speed and performance of search queries.

Solr Emitter Configuration

To add a Solr emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor. The configuration settings of the Solr emitter are as follows:

Connection Name: All Solr connections are listed here. Select a connection for connecting to Solr.
Batch Size: Specify the batch size if you want to index records in batches.
KeySpace: Define a new or existing keyspace and its replication strategy.
Across Field Search Enabled: Specifies whether full-text search is enabled across all fields.
Index Number of Shards: Specifies the number of shards to be created in the index store.
Index Replication Factor: Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.
Index Expression

The MVEL expression is used to evaluate the index name. This can help you leverage field-based partitioning.

For example, consider the expression below:

@{'ns_1_myindex' + Math.round(timestamp/(3600*1000))}

A new index will be created for each one-hour time range, and data will be dynamically indexed based on the field whose alias name is 'timestamp'.
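As a rough illustration of the bucketing, assuming the field aliased 'timestamp' holds epoch milliseconds, two records one hour apart resolve to different index names:

// Illustrative only: mirrors the hourly bucketing done by the MVEL index expression above,
// assuming the 'timestamp' field alias holds epoch milliseconds.
public class IndexExpressionExample {
    public static void main(String[] args) {
        long timestamp = 1_700_000_000_000L;         // some record's timestamp
        long oneHourLater = timestamp + 3_600_000L;  // a record arriving one hour later

        // Same bucketing as the expression: round(timestamp / (3600 * 1000))
        System.out.println("ns_1_myindex" + Math.round(timestamp / (3600.0 * 1000)));    // ns_1_myindex472222
        System.out.println("ns_1_myindex" + Math.round(oneHourLater / (3600.0 * 1000))); // ns_1_myindex472223
    }
}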

Routing Required: Specifies whether custom dynamic routing is to be enabled. If enabled, a JSON routing policy needs to be defined.
ID Generator Type

Enables generation of the ID field.

The following types of ID generators are available:

Key Based:

Key Fields: Select the message field to be used as the key.

Select: Select all/id/sequence_number/File_id.

Note: Add the key 'incremental_fields' and comma-separated column names as values. This will work with a key-based UUID.

UUID: Universally unique identifier.

Custom: In this case, you can write your own custom logic to create the ID field. For example, if you wish to use a UUID key but want to prefix it with "HSBC", you can write that logic in a Java class.

If you select this option, an additional field, "Class Name", will be displayed on the user interface, where you need to provide the fully qualified class name of your Java class.

You can download the sample project from the "Data Pipeline" landing page and refer to the Java class com.yourcompany.custom.keygen.SampleKeyGenerator to write the custom code.
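As an illustration only, here is a minimal sketch of such a class, assuming a hypothetical key-generator contract; the actual interface to implement is the one defined in the sample project (see SampleKeyGenerator above):

package com.yourcompany.custom.keygen;

import java.util.UUID;

// Minimal, illustrative sketch of a custom ID generator that prefixes a random
// UUID with "HSBC". The class name and generateKey() method are hypothetical;
// implement the contract defined by the sample project's SampleKeyGenerator.
public class PrefixedUuidKeyGenerator {

    private static final String PREFIX = "HSBC";

    // Returns the value to be used as the document ID, e.g. "HSBC-4f9a1c2e-...".
    public String generateKey() {
        return PREFIX + "-" + UUID.randomUUID();
    }
}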

Enable TTL

Select TTL to limit the lifetime of the data.

TTL Type: Provide the TTL type as either Static or Field Value.

TTL Value: Provide the TTL value in seconds in case of the Static TTL type, or an integer field in case of Field Value.

Output Fields: Fields of the output message.
Ignore Missing Values: Ignore or persist empty or null values of message fields in the sink.
Connection Retries

Number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries.

If Routing Required = true, then:

Routing Policy - A JSON defining the custom routing policy. Example: {"1":{"company":{"Google":20.0,"Apple":80.0}}}

Here, 1 is the timestamp after which the custom routing policy becomes active, 'company' is the field name, and records with the value 'Google' take 20% of the shards while records with the value 'Apple' take 80% of the shards.

Delay Between Connection Retries: Defines the retry delay intervals for component connection in milliseconds.
Enable TTL: When selected, data will be discarded to the specified TTL exchange.
Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. Connections are listed corresponding to the selected storage location.
Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir

Time-based Checkpoint: Select the checkbox to enable a time-based checkpoint on each pipeline run.
Output Mode

Output mode to be used while writing the data to the streaming sink.

Append: Output Mode in which only the new rows in the streaming data will be written to the sink.

Complete Mode: Output Mode in which all the rows in the streaming data will be written to the sink every time there are some updates.

Update Mode: Output Mode in which only the rows that were updated in the streaming data will be written to the sink every time there are some updates.
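
For orientation only, here is a minimal sketch of how these options map onto Spark Structured Streaming's output modes; the Dataset name and the "solr" format string are illustrative assumptions, and the pipeline configures this for you:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.StreamingQuery;

// Illustrative sketch only: shows where the selected output mode is applied
// when a streaming Dataset is written to a sink.
public class OutputModeExample {
    public static StreamingQuery write(Dataset<Row> streamingData) throws Exception {
        return streamingData.writeStream()
                .outputMode("append")   // or "complete" / "update", matching the options above
                .format("solr")         // illustrative sink format name
                .option("checkpointLocation", "/user/hadoop/checkpointingDir")
                .start();
    }
}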

Enable Trigger: Trigger defines how frequently a streaming query should be executed.
Add Configuration

The user can add further configuration.

Note: Index_field and store_field are supported via Add Configuration.
