OpenAI Embeddings

The OpenAI Embeddings Processor in Gathr transforms text into numerical vectors. By converting words and phrases into multi-dimensional vectors, this processor captures semantic relationships, making it easier for your models to understand context and meaning in natural language.

Embeddings measure how related text strings are, which makes them useful for tasks such as:

Search: Find relevant results by ranking them based on their connection to a query string.
Clustering: Group similar text strings together for organized analysis.
Recommendations: Recommend items whose text is related to what a user has already viewed or selected.
Anomaly Detection: Identify outliers with little relatedness to the rest of the data.
Diversity Measurement: Understand the distribution of similarities within your data.
Classification: Classify text strings based on their most similar label.
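Relatedness between two embeddings is commonly scored with cosine similarity. The sketch below illustrates the Search use case by ranking documents against a query. The vectors are made-up three-dimensional values, and the function names are hypothetical; they are not part of Gathr.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors for illustration only; real embeddings have far more dimensions.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, docs)
print([doc_id for doc_id, _ in ranking])  # ['doc_a', 'doc_c', 'doc_b']
```

The same score can drive the other tasks in the list: clustering groups vectors with high mutual similarity, and anomaly detection flags vectors whose similarity to everything else is low.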

Processor Configuration

Configure the processor parameters as explained below.

Connection Name

Select a connection name from the list if you have already created and saved connection details for OpenAI. Otherwise, create one as explained in the topic OpenAI Connection.

Embedding Model

Gathr supports the following models:

text-embedding-3-large
text-embedding-3-small
text-embedding-ada-002

The embedding model converts text into numerical values, which is useful for tasks such as searching, grouping similar text, making recommendations, detecting anomalies, measuring diversity, and classifying text by similarity. The output can also be adjusted to embeddings of different dimensions, so you can tailor it to the requirements of a specific task.
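To make the dimension setting concrete, here is a minimal sketch of a request body in the shape accepted by the public OpenAI embeddings endpoint (fields model, input, and the optional dimensions). How Gathr builds its own requests is not described here, so the helper name and the validation rules are assumptions. The default sizes listed are the models' native output dimensions, and the sketch assumes that only the text-embedding-3 models accept a custom, smaller value.

```python
# Native output sizes of the models listed above.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}

# Assumption: only the text-embedding-3 family accepts a "dimensions" override.
SUPPORTS_CUSTOM_DIMENSIONS = {"text-embedding-3-large", "text-embedding-3-small"}

def build_request_body(model, texts, dimensions=None):
    """Hypothetical helper: assemble the JSON body for an embeddings request."""
    if model not in DEFAULT_DIMENSIONS:
        raise ValueError(f"unsupported model: {model}")
    body = {"model": model, "input": list(texts)}
    if dimensions is not None:
        if model not in SUPPORTS_CUSTOM_DIMENSIONS:
            raise ValueError(f"{model} does not accept a custom dimension")
        # Assumption: a custom size must not exceed the model's native size.
        if not 1 <= dimensions <= DEFAULT_DIMENSIONS[model]:
            raise ValueError("dimensions out of range for this model")
        body["dimensions"] = dimensions
    return body

body = build_request_body("text-embedding-3-small", ["hello world"], dimensions=256)
print(body)
```

Smaller vectors reduce storage and comparison cost, usually at some loss of accuracy, so the right size depends on the task.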

Input Column

Select the column whose text is to be converted into numerical vectors.

Batch Size

Batch Size determines the number of rows to embed in a single request. The maximum value allowed is 1000.

👉 Streaming data sources do not support the Batch Size parameter.
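As a rough illustration of what batching means, the sketch below splits rows into groups no larger than the batch size, with one group per embedding request. The function name is hypothetical and this is not Gathr's actual implementation; only the limit of 1000 comes from the description above.

```python
MAX_BATCH_SIZE = 1000  # maximum Batch Size stated above

def batch_rows(rows, batch_size):
    """Yield consecutive slices of rows, each with at most batch_size items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

rows = [f"row {i}" for i in range(2500)]
batches = list(batch_rows(rows, 1000))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

A larger batch size means fewer requests for the same number of rows, while a smaller one keeps each request lighter.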

Output Column

Select a column to hold the embeddings, or create a new output column by typing a name and pressing Enter.

RETRY CONFIGURATION

Enable retries for embedding requests. Customize the number of retries and the pause duration between attempts for improved handling of temporary issues.

Enable Retry

Enable this option to retry an embedding request if it times out or exceeds the rate limit.

Retry Count

Specify how many times an embedding request is resent after it times out or exceeds the rate limit. A higher retry count improves resilience to temporary issues.

Retry Delay

Specify the time, in seconds, that the system pauses between retry attempts for an embedding request.
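The way Retry Count and Retry Delay interact can be sketched as a simple loop with a fixed pause between attempts. All names below are hypothetical, and TransientError stands in for a timeout or rate-limit failure; Gathr's internal retry logic may differ.

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for a timeout or rate-limit error."""

def call_with_retry(request_fn, retry_count, retry_delay, sleep=time.sleep):
    """Call request_fn, retrying up to retry_count times on TransientError.

    retry_delay is the pause in seconds before each retry. The sleep
    argument is injectable so the example can run without really waiting.
    """
    retries_used = 0
    while True:
        try:
            return request_fn()
        except TransientError:
            if retries_used >= retry_count:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay)

# Simulated request that fails twice, then succeeds.
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("rate limit exceeded")
    return [0.1, 0.2, 0.3]

pauses = []  # records each pause instead of sleeping
result = call_with_retry(flaky_request, retry_count=3, retry_delay=5, sleep=pauses.append)
print(result, pauses)  # [0.1, 0.2, 0.3] [5, 5]
```

With a retry count of 3, the request is attempted at most four times in total: the original attempt plus three retries.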

Add Configuration: Additional properties can be added using this option as key-value pairs.
