Redis Vector Emitter
Redis Vector Emitter allows you to store vectorized data such as text passages, images, and videos.
Vectorizing means mapping unstructured data to a flat sequence of numbers (a vector).
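As a toy illustration of what "a flat sequence of numbers" means, the sketch below maps text to a fixed-length float vector with simple feature hashing. Real pipelines use an embedding model; the function name and dimension here are illustrative assumptions.

```python
import hashlib

def toy_vectorize(text: str, dim: int = 8) -> list[float]:
    """Toy feature-hashing vectorizer: maps text to a fixed-length
    flat sequence of floats. Real pipelines use an embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Hash each token to a stable bucket and count occurrences.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

toy_vectorize("redis stores vectors as flat float arrays")
```

Whatever model produces the vector, the emitter expects it in this flat numeric form.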
Redis Vector Emitter Configuration
Select the Redis Vector emitter from the components list, add it to your pipeline, and click it to configure.
A vector database requires data in vector form. Ensure that the data passed to the vector database is in the correct vector format.
The configuration settings are as follows:
Connection Name
Select a connection name from the list of saved connections. To create one, go to the Connections page as explained in the topic Redis Connection.
Index
When emitting data, you can choose an existing index or create a new one. If you pick an existing index, your data will be written to it. Otherwise, you can create a new index with a unique name, and your processed data will be written to it during the application run.
Create Index: Select this option to create a new index in Redis. See the Create Index section below for configuration details.
Index Info
Details of the selected index are shown when an existing index is selected.
Column to Ingest
Id Column
Select the ID column used to create the key that identifies the data to be indexed.
Columns New Metadata
This option will appear if an existing index is selected.
Add Metadata Object
Click the option to add metadata object(s).
Output Column
Provide or select the name of the metadata attribute to be parsed.
Index Attribute
Provide or select a column name where the parsed metadata attribute should be captured.
Auto Fill
Auto-map the output column with the index attribute.
Create Index
Index Name
Provide the name of the index to be created.
Index Type
Type of Redis index. The currently supported index type is HASH.
Metric And Dimension
Select the distance metric that will be used to measure the degree of similarity between two vectors. Supported distance metrics are L2 (Euclidean), IP (inner product), and COSINE. Select a vector dimension between 1 and 20000.
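To make the choice of metric concrete, the sketch below computes all three in plain Python. For L2 and cosine distance, smaller values mean more similar vectors; for IP, larger values mean more similar.

```python
import math

def l2(a, b):
    # Euclidean distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ip(a, b):
    # Inner product: larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means identical direction.
    denom = math.sqrt(ip(a, a)) * math.sqrt(ip(b, b))
    return 1.0 - ip(a, b) / denom
```

Cosine compares only direction, so it is a common default for text embeddings whose magnitudes carry little meaning.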
Doc Prefix
Provide the doc prefix for the Redis Vector index. It is prepended to the ID column value to create the key that identifies the data to be indexed.
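A minimal sketch of that key construction, assuming the prefix and ID value are simply concatenated (the exact separator handling inside Gathr is an assumption):

```python
def redis_key(doc_prefix: str, id_value: str) -> str:
    # The doc prefix is joined with the ID column value to form the
    # Redis key under which the document is stored.
    return f"{doc_prefix}{id_value}"

redis_key("doc:", "123")  # "doc:123"
```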
Index New Metadata
This option will appear when creating a new index.
Add Metadata Object
Click the option to add metadata object(s).
Output Column
Provide or select the name of the metadata attribute to be parsed.
Index Attribute
Provide or select a column name where the parsed metadata attribute should be captured.
Datatype
Select the datatype of the metadata attribute to be stored. Available options are: Text, Numeric and Vector.
Auto Fill
Auto-map the output column with the index attribute.
Write Configuration
For a Batch pipeline, the following configurations are available under Write Configuration:
Save Mode
Save mode specifies how to handle any existing data in the target.
Append: Input data will be appended to the existing index.
Truncate and Overwrite: This option is enabled when the Create Index option is selected. It first truncates the given index (if present), recreates it, and then inserts the data into the newly created index.
Batch Size
The number of rows to insert per request.
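Batching amounts to splitting the rows into fixed-size chunks and issuing one insert request per chunk. A minimal sketch of that slicing (the function name is illustrative):

```python
def batches(rows, batch_size):
    # Yield successive slices of at most batch_size rows; each slice
    # becomes one insert request against the target index.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

list(batches(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Larger batches mean fewer round trips but more memory per request; tune the value to your row size.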
Add Configuration: Click the option to add more configuration(s) to the component.
Environment Params: Click the + ADD PARAM option to add more environment configuration(s).
For a Streaming pipeline, the following configurations are available under Write Configuration:
Output Mode
Output mode specifies how data is written to the streaming sink. Available options are Append and Complete.
Append: Only the new rows in the streaming data are written to the sink.
Complete: All the rows in the streaming data are written to the sink every time there are updates.
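The difference between the two modes can be seen in a toy simulation of what each micro-batch sends to the sink (this models the semantics only; it is not Spark itself):

```python
def run_micro_batches(micro_batches, mode):
    """Simulate what each output mode emits to the sink per micro-batch.
    Append: only that batch's new rows. Complete: the full result so far,
    re-emitted on every update."""
    seen, emitted = [], []
    for batch in micro_batches:
        seen.extend(batch)
        emitted.append(list(batch) if mode == "append" else list(seen))
    return emitted

run_micro_batches([["a"], ["b", "c"]], "append")    # [["a"], ["b", "c"]]
run_micro_batches([["a"], ["b", "c"]], "complete")  # [["a"], ["a", "b", "c"]]
```

Complete mode therefore rewrites the whole result on each trigger, which is only practical for small aggregated outputs.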
Checkpoint Storage Location
Select the checkpoint storage location. The available options are HDFS, S3 and GCS.
Checkpoint Connections
Select the connection. The connections are listed corresponding to the selected storage location.
Override Credential
Select the checkbox to override credentials.
Username
Provide the username to override credentials if the HDFS checkpoint storage location is selected.
AWS KeyId
Provide the AWS KeyId if S3/GCS checkpoint storage location is selected.
Secret Access Key
Provide the Secret Access key to authenticate the S3/GCS checkpoint storage location.
Test Connection
Option to test the established connection.
Checkpoint Directory
It is the path where the Spark application stores checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3 and GCS, enter an absolute path such as s3://bucketName/checkpointingDir or gs://bucketName/checkpointDir.
Time-based Check Point
Enable a time-based checkpoint on each pipeline run; the provided checkpoint location will be appended with the current time in milliseconds.
Enable Trigger
Option to enable a trigger to define the frequency at which a streaming query should be executed.
Trigger Type
Select one of the options available from the drop-down: One-Time Micro-Batch or Fixed Interval Micro-Batches.
One-Time Micro-Batch
Trigger that processes only a single batch of data in a streaming query and then terminates the query.
Fixed Interval Micro-Batches
A trigger policy that runs a query periodically based on an interval in processing time.
Processing Time
If the Fixed Interval Micro-Batches option is selected, provide the processing time in minutes/seconds.
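These trigger types correspond to Spark Structured Streaming's DataStreamWriter.trigger() options. A minimal sketch of that mapping, assuming the helper name is illustrative:

```python
def trigger_kwargs(trigger_type, processing_time=None):
    """Map the emitter's trigger options to the keyword arguments
    accepted by Spark's DataStreamWriter.trigger()."""
    if trigger_type == "One-Time Micro-Batch":
        # Process a single micro-batch, then stop the query.
        return {"once": True}
    if trigger_type == "Fixed Interval Micro-Batches":
        # e.g. processing_time="5 minutes" or "30 seconds"
        return {"processingTime": processing_time}
    raise ValueError(f"unknown trigger type: {trigger_type}")

# Usage with pyspark (not executed here):
#   df.writeStream.trigger(**trigger_kwargs("Fixed Interval Micro-Batches", "5 minutes"))
```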
ADD CONFIGURATION: Click the option to add more configuration(s) to the component.
Environment Params: Click the + ADD PARAM option to add more environment configuration(s).