Redis Vector Emitter
Redis Vector Emitter allows you to store vectorized data such as text passages, images, and videos.
Vectorizing means mapping unstructured data to a flat sequence of numbers (a vector).
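As a toy illustration of what "a flat sequence of numbers" means, the sketch below maps text to a fixed-length float vector with simple feature hashing. Real pipelines use an embedding model; the function name and dimension here are illustrative assumptions.

```python
import hashlib

def toy_vectorize(text: str, dim: int = 8) -> list[float]:
    """Toy feature-hashing vectorizer: maps text to a fixed-length
    flat sequence of floats. Real pipelines use an embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Hash each token to a stable bucket and count occurrences.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

toy_vectorize("redis stores vectors as flat float arrays")
```

Whatever model produces the vector, the emitter expects it in this flat numeric form.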
Redis Vector Emitter Configuration
Select the Redis Vector emitter from the components list, add it to your pipeline, and click it to configure.
A vector database requires data in vector form. Ensure that the data passed to the vector database is in the correct vector format.
The configuration settings are as follows:
Connection Name
Select a connection name from the list of saved connections. To create one, go to the Connections page as explained in the topic Redis Connection.
Index
When emitting data, you can choose an existing index or create a new one. If you pick an existing index, your data will be written to it. Otherwise, you can create a new index with a unique name, and your processed data will be written to it during the application run.
Create Index: Select this option to create a new index in Redis. See the Create Index section below for configuration details.
Index Info
Details of the selected index are shown when an existing index is selected.
Column to Ingest
Id Column
Select the ID column used to create the key that identifies the data to be indexed.
Columns New Metadata
This option will appear if an existing index is selected.
Add Metadata Object
Click the option to add metadata object(s).
Output Column
Provide or select the name of the metadata attribute to be parsed.
Index Attribute
Provide or select a column name where the parsed metadata attribute should be captured.
Auto Fill
Auto-map the output column with the index attribute.
Create Index
Index Name
Provide the name of the index to be created.
Index Type
Type of Redis index. The currently supported index type is HASH.
Metric And Dimension
Select the distance metric that will be used to measure the degree of similarity between two vectors. Supported distance metrics are L2 (Euclidean), IP (inner product), and COSINE. Select a vector dimension between 1 and 20000.
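To make the choice of metric concrete, the sketch below computes all three in plain Python. For L2 and cosine distance, smaller values mean more similar vectors; for IP, larger values mean more similar.

```python
import math

def l2(a, b):
    # Euclidean distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ip(a, b):
    # Inner product: larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means identical direction.
    denom = math.sqrt(ip(a, a)) * math.sqrt(ip(b, b))
    return 1.0 - ip(a, b) / denom
```

Cosine compares only direction, so it is a common default for text embeddings whose magnitudes carry little meaning.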
Doc Prefix
Provide the doc prefix for the Redis Vector index. It is prepended to the ID column value to create the key that identifies the data to be indexed.
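A minimal sketch of that key construction, assuming the prefix and ID value are simply concatenated (the exact separator handling inside Gathr is an assumption):

```python
def redis_key(doc_prefix: str, id_value: str) -> str:
    # The doc prefix is joined with the ID column value to form the
    # Redis key under which the document is stored.
    return f"{doc_prefix}{id_value}"

redis_key("doc:", "123")  # "doc:123"
```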
Index New Metadata
This option will appear when creating a new index.
Add Metadata Object
Click the option to add metadata object(s).
Output Column
Provide or select the name of the metadata attribute to be parsed.
Index Attribute
Provide or select a column name where the parsed metadata attribute should be captured.
Datatype
Select the datatype of the metadata attribute to be stored. Available options are: Text, Numeric and Vector.
Auto Fill
Auto-map the output column with the index attribute.
Write Configuration
For a Batch pipeline, the following configurations are available under Write Configuration:
Save Mode
Save mode specifies how to handle any existing data in the target.
Append: Input data will be appended to the existing index.
Truncate and Overwrite: This option is enabled when the Create Index option is selected. It first truncates the given index (if present), recreates it, and then inserts the data into the newly created index.
Batch Size
The number of rows to insert per request.
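Batching amounts to splitting the rows into fixed-size chunks and issuing one insert request per chunk. A minimal sketch of that slicing (the function name is illustrative):

```python
def batches(rows, batch_size):
    # Yield successive slices of at most batch_size rows; each slice
    # becomes one insert request against the target index.
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

list(batches(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Larger batches mean fewer round trips but more memory per request; tune the value to your row size.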
Add Configuration: Click the option to add more configuration(s) to the component.
Environment Params: Click the + ADD PARAM option to add more environment configuration(s).
For a Streaming pipeline, the following configurations are available under Write Configuration:
Output Mode
Output mode specifies how data is written to the streaming sink. Available options are Append and Complete.
Append: Only the new rows in the streaming data are written to the sink.
Complete: All the rows in the streaming data are written to the sink every time there are updates.
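The difference between the two modes can be seen in a toy simulation of what each micro-batch sends to the sink (this models the semantics only; it is not Spark itself):

```python
def run_micro_batches(micro_batches, mode):
    """Simulate what each output mode emits to the sink per micro-batch.
    Append: only that batch's new rows. Complete: the full result so far,
    re-emitted on every update."""
    seen, emitted = [], []
    for batch in micro_batches:
        seen.extend(batch)
        emitted.append(list(batch) if mode == "append" else list(seen))
    return emitted

run_micro_batches([["a"], ["b", "c"]], "append")    # [["a"], ["b", "c"]]
run_micro_batches([["a"], ["b", "c"]], "complete")  # [["a"], ["a", "b", "c"]]
```

Complete mode therefore rewrites the whole result on each trigger, which is only practical for small aggregated outputs.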
Checkpoint Storage Location
Select the checkpoint storage location. The available options are HDFS, S3 and GCS.
Checkpoint Connections
Select the connection. The connections are listed corresponding to the selected storage location.
Override Credential
Select the checkbox to override credentials.
Username
Provide the username to override credentials if the HDFS checkpoint storage location is selected.
AWS KeyId
Provide the AWS KeyId if S3/GCS checkpoint storage location is selected.
Secret Access Key
Provide the Secret Access key to authenticate the S3/GCS checkpoint storage location.
Test Connection
Option to test the established connection.
Checkpoint Directory
It is the path where the Spark application stores checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3 and GCS, enter an absolute path such as s3://bucketName/checkpointingDir or gs://bucketName/checkpointDir.
Time-based Check Point
Enable a time-based checkpoint on each pipeline run; the provided checkpoint location will be appended with the current time in milliseconds.
Enable Trigger
Option to enable a trigger to define the frequency at which a streaming query should be executed.
Trigger Type
Select one of the options available from the drop-down: One-Time Micro-Batch or Fixed Interval Micro-Batches.
One-Time Micro-Batch
Trigger that processes only a single batch of data in a streaming query and then terminates the query.
Fixed Interval Micro-Batches
A trigger policy that runs a query periodically based on an interval in processing time.
Processing Time
If the Fixed Interval Micro-Batches option is selected, provide the processing time in minutes/seconds.
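These trigger types correspond to Spark Structured Streaming's DataStreamWriter.trigger() options. A minimal sketch of that mapping, assuming the helper name is illustrative:

```python
def trigger_kwargs(trigger_type, processing_time=None):
    """Map the emitter's trigger options to the keyword arguments
    accepted by Spark's DataStreamWriter.trigger()."""
    if trigger_type == "One-Time Micro-Batch":
        # Process a single micro-batch, then stop the query.
        return {"once": True}
    if trigger_type == "Fixed Interval Micro-Batches":
        # e.g. processing_time="5 minutes" or "30 seconds"
        return {"processingTime": processing_time}
    raise ValueError(f"unknown trigger type: {trigger_type}")

# Usage with pyspark (not executed here):
#   df.writeStream.trigger(**trigger_kwargs("Fixed Interval Micro-Batches", "5 minutes"))
```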
ADD CONFIGURATION: Click the option to add more configuration(s) to the component.
Environment Params: Click the + ADD PARAM option to add more environment configuration(s).