Pinecone ETL Target

Pinecone ETL Target allows you to emit and manage data from your Gathr application to Pinecone, leveraging the simplicity and performance of Pinecone’s vector database for AI applications.

Target Configuration

Configure the data emitter parameters as explained below.

Connection Name

Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Pinecone. Otherwise, create one as explained in the topic Pinecone Connection.

Use the Test Connection option to ensure that the connection with the Pinecone channel is established successfully.

A success message states that the connection is available. If the connection test fails, edit the connection to resolve the issue before proceeding.

Index

When emitting data, you can choose an existing index or create a new one. If you pick an existing index, your data will be written to it. Otherwise, you can create a new index with a unique name, and your processed data will be written to it during the application run.

Index Info

Additional information about the selected Index is provided here.

For example: Name, Metric, Dimension, Pod Type, Status, Shards, Replicas, Pods, Metadata Config, and Source Collection.

CONFIGURE INDEX

Configure the fields below, which are required to create an index.

Metric and Dimension(s)

The distance metric to be used for similarity search. You can use ‘euclidean’, ‘cosine’, or ‘dotproduct’.

Additionally, specify the dimension of the vectors to be inserted into the index. Enter a dimension size between 1 and 20000.
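
To illustrate how the three metrics differ, here is a minimal sketch that computes each one for a pair of vectors in plain Python. The function names are illustrative and are not part of Gathr or Pinecone. The formulas are the standard textbook definitions, so the exact scores Pinecone reports may be scaled differently.

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Dot product of the vectors after normalizing their lengths.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

a = [1.0, 2.0]
b = [3.0, 4.0]
print("dotproduct:", dot_product(a, b))
print("euclidean:", euclidean_distance(a, b))
print("cosine:", cosine_similarity(a, b))
```

Both vectors must have the same number of elements, which is the dimension configured for the index.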

Pod Type

Pod Type determines the hardware configuration for Pinecone indexes; s1 is storage-optimized, p1 is performance-optimized, and p2 is optimized for query throughput.

Pods

The number of pods required for running your Pinecone service. Generally, more pods mean more storage capacity, lower latency, and higher throughput.

Replicas

The number of times the index should be duplicated. Replicas provide higher availability and throughput.

Pod Size

Pods come in four different sizes: x1, x2, x4, and x8.

Each size doubles the index storage and compute capacity of the previous one. The default size is x1.
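
The doubling rule can be sketched as a simple lookup. This is only an illustration of the arithmetic: the `relative_capacity` helper is hypothetical, and the result is a multiplier relative to x1, not an absolute storage figure.

```python
# Capacity of each pod size relative to the default x1 size.
SIZE_MULTIPLIER = {"x1": 1, "x2": 2, "x4": 4, "x8": 8}

def relative_capacity(pod_size):
    # Returns how many times the x1 capacity a pod of this size provides.
    if pod_size not in SIZE_MULTIPLIER:
        raise ValueError(f"unknown pod size: {pod_size}")
    return SIZE_MULTIPLIER[pod_size]

for size in ("x1", "x2", "x4", "x8"):
    print(size, relative_capacity(size))
```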

Index Metadata

When creating a new index, you can select the metadata fields you want to index.

Indexing metadata fields can help speed up searches while saving memory space.

If no field is selected, then all the metadata is indexed.

COLUMNS TO INGEST

Specify the configuration options for the columns to be ingested into Pinecone.

Values

Select the columns with vectors to be ingested.

Constraints: Select an array (float or double) column or a JSON array string column.
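
As a rough illustration of the two accepted shapes, the sketch below converts either a JSON array string or a list of numbers into a list of floats and checks its length against the index dimension. The `parse_vector` helper is hypothetical; Gathr performs this conversion internally and its exact validation may differ.

```python
import json

def parse_vector(raw, dimension):
    # Accept a JSON array string such as "[0.1, 0.2]" or an already parsed list.
    values = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(values, list):
        raise ValueError("expected an array of numbers")
    vector = [float(v) for v in values]
    # The vector length must match the dimension of the target index.
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return vector

print(parse_vector("[0.1, 0.2, 0.3]", dimension=3))
print(parse_vector([1, 2, 3], dimension=3))
```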

ID

The ID is a unique identifier for each record.

Select the column to ingest IDs or choose to autogenerate them in a serial order.

Supports: String, Integer, or Long columns.

Metadata

The metadata field allows associating additional information with each vector in an index. It uses key-value pairs, where keys are strings, and values can be strings, numbers, booleans, or lists of strings.

Key

The key in metadata is a label or category that identifies a specific piece of information. It’s like a name tag that helps you find what you’re looking for.

Mapping Value

Select the input column to be associated with the Key in a key-value pair within Metadata. The value can be a string, number, boolean, or a list of strings. Constraints: Only Boolean, Integer, Long, String, and Array columns are supported.
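
To show how ID, Values, and Metadata fit together, here is a minimal sketch that assembles one record from an input row. The column names, the `build_record` helper, and the sample row are all hypothetical. The `id`/`values`/`metadata` layout follows the usual shape of a Pinecone vector record, but the payload Gathr actually sends is not described here.

```python
def is_valid_metadata_value(value):
    # Strings, numbers, and booleans are allowed, as are lists of strings.
    if isinstance(value, (str, bool, int, float)):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)

def build_record(row, id_column, values_column, metadata_mapping):
    # metadata_mapping maps each metadata Key to the input column holding its value.
    metadata = {}
    for key, column in metadata_mapping.items():
        value = row[column]
        if not is_valid_metadata_value(value):
            raise TypeError(f"unsupported metadata type for key {key!r}")
        metadata[key] = value
    return {"id": str(row[id_column]), "values": row[values_column], "metadata": metadata}

row = {"doc_id": 42, "embedding": [0.1, 0.2], "lang": "en", "tags": ["etl", "vector"]}
record = build_record(row, "doc_id", "embedding", {"language": "lang", "topics": "tags"})
print(record)
```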

WRITE CONFIGURATION

Configure how to write the data to Pinecone.

Save Mode

Save mode specifies how to handle any existing data in the target.

The options are:

Upsert: Records will be inserted or updated depending on the ID column values.
Overwrite: Existing data in the target Index will be overwritten with the current data.
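
The difference between the two save modes can be sketched with a plain dictionary standing in for the index. This is a simplified model, not Gathr's implementation, and it assumes that Overwrite removes all existing records before writing the current data.

```python
def write(index, records, save_mode):
    # index is a dict mapping record ID to vector values.
    if save_mode == "overwrite":
        # Assumption: existing contents are discarded before the new data is written.
        index.clear()
    elif save_mode != "upsert":
        raise ValueError(f"unknown save mode: {save_mode}")
    for record in records:
        # A matching ID is updated; a new ID is inserted.
        index[record["id"]] = record["values"]
    return index

existing = {"a": [1.0], "b": [2.0]}
incoming = [{"id": "b", "values": [9.0]}, {"id": "c", "values": [3.0]}]

print(write(dict(existing), incoming, "upsert"))
print(write(dict(existing), incoming, "overwrite"))
```

With Upsert, record "a" survives and "b" is updated. With Overwrite, only the incoming records "b" and "c" remain.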

Write Mode

Sync: Updates are applied immediately to maintain real-time consistency.

Async: Updates occur in the background, optimizing system performance.

Batch Size

Batch Size determines the number of rows to insert per request. Enter a batch size between 1 and 100.
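
As an illustration of what batching means in practice, the sketch below splits a list of rows into requests of at most the configured size. The `batches` helper is hypothetical and only models the grouping, not the actual write requests.

```python
def batches(rows, batch_size):
    # The documented range for Batch Size is 1 to 100.
    if not 1 <= batch_size <= 100:
        raise ValueError("batch size must be between 1 and 100")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

rows = list(range(250))
print([len(batch) for batch in batches(rows, 100)])  # prints [100, 100, 50]
```

Here 250 rows with a batch size of 100 result in three requests, the last one carrying the remaining 50 rows.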

Parallelism

The maximum number of simultaneous connections that can be established with Pinecone.
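
One way to picture this setting is a worker pool whose size caps the number of requests in flight at once. The sketch below uses Python's standard `ThreadPoolExecutor`; the `send_batch` function is a hypothetical stand-in for a write request and does not contact Pinecone.

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(batch):
    # Hypothetical stand-in for one write request; returns the number of rows "written".
    return len(batch)

def write_in_parallel(batch_list, parallelism):
    # max_workers limits how many batches are processed at the same time.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(send_batch, batch_list))

batch_list = [[1, 2], [3], [4, 5, 6]]
print(write_in_parallel(batch_list, parallelism=2))  # prints 6
```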

Output Mode

👉 This field appears with a streaming data source.

Output Mode specifies how to write the data.

The options are:

Append: Only the new rows in the streaming data are written to the sink.
Complete: All the rows in the streaming data are written to the sink every time there is an update.
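
The two output modes can be sketched as a choice of which rows are handed to the sink on each update. This is a simplified model based only on the descriptions above; the `rows_to_write` helper and the sample rows are hypothetical.

```python
def rows_to_write(all_rows, new_rows, output_mode):
    # all_rows: every row seen so far, including the new ones.
    # new_rows: only the rows that arrived in the latest update.
    if output_mode == "append":
        return list(new_rows)
    if output_mode == "complete":
        return list(all_rows)
    raise ValueError(f"unknown output mode: {output_mode}")

all_rows = ["r1", "r2", "r3"]
new_rows = ["r3"]
print(rows_to_write(all_rows, new_rows, "append"))    # prints ['r3']
print(rows_to_write(all_rows, new_rows, "complete"))  # prints ['r1', 'r2', 'r3']
```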

Enable Trigger

👉 This field appears with a streaming data source.

Trigger defines how frequently a streaming query should be executed.

If the trigger is enabled, provide the processing time.

Processing Time

The time interval or conditions set to determine when streaming data results are emitted or processed.

Add Configuration: Additional properties can be added using this option as key-value pairs.

Notes

Optionally, enter notes in the Notes tab and save the configuration.

