Bedrock Embeddings

The Bedrock Embeddings Processor facilitates access to foundation models from providers like Amazon and Cohere.

Models such as Titan Embeddings G1 and Cohere’s Embed English offer capabilities for text summarization, generation, classification, Q&A, information extraction, and embeddings.

Configure the processor parameters as explained below.


Connection Name

Select a connection name from the list if you have already created and saved connection details for Bedrock. Otherwise, create one as explained in the topic - Bedrock Connection →


Select Provider

Amazon Bedrock hosts a variety of foundation models, each with unique strengths and application domains. Pick a provider to access foundation models (FMs) from AI companies:


Amazon

Access text summarization, generation, classification, open-ended Q&A, information extraction, embeddings, and search capabilities.

Titan Embeddings G1 - Text v1.2

  • Text Embedding: Convert text into numerical vector representations.

  • Supported Languages: Supports 25 or more languages.

  • Semantic Similarity: Utilize embeddings for assessing semantic similarity between texts.

  • Clustering for Data Analysis: Employ embeddings for clustering and analysis of textual data.

  • Knowledge Management: Contribute to knowledge organization and management through numerical representations.

  • Max Capacity: Process up to 8,000 tokens, ensuring flexibility in handling diverse text lengths.
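As an illustration of what the processor does behind the scenes, the request body for Titan Embeddings G1 - Text can be sketched as follows. The model ID and the direct boto3 call are assumptions for illustration; the processor manages the connection and invocation for you.

```python
import json

# Model ID for Titan Embeddings G1 - Text (assumed; verify in the Bedrock console).
TITAN_TEXT_MODEL_ID = "amazon.titan-embed-text-v1"

def build_titan_text_request(text: str) -> str:
    """Build the JSON request body that Titan text embedding models expect."""
    return json.dumps({"inputText": text})

# With an AWS connection in place, the call would look roughly like:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(
#       modelId=TITAN_TEXT_MODEL_ID,
#       body=build_titan_text_request("Hello, world"),
#   )
#   embedding = json.loads(response["body"].read())["embedding"]

body = build_titan_text_request("Hello, world")
print(body)  # {"inputText": "Hello, world"}
```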


Titan Multimodal Embeddings G1 v1

  • Multimodal Search: Perform search operations using text and image inputs.

  • Supported Language: Primarily optimized for English language processing.

  • Accurate and Contextually Relevant Search: Enhance search accuracy and relevance through multimodal capabilities.

  • Recommendation Experiences: Facilitate personalized recommendation experiences in e-commerce and digital media.

  • Max Token Capacity: Process up to 128 tokens in the text input.

  • Max Image Size: Handle images with a maximum size of 25 megabytes, enabling diverse image-based applications.


Image Column

Encode the image that you want to convert to embeddings in base64 and enter the string in this field.


Cohere

Benefit from foundational large language models tailored for enterprise applications.

Embed English

  • Semantic Search: Conduct advanced searches based on semantic understanding.

  • Supported Language: Optimized for English language processing.

  • Precise Text Retrieval: Retrieve precise and contextually relevant textual information.

  • Classification in Knowledge Management: Support classification tasks in knowledge management systems.

  • Information Systems: Contribute to effective information retrieval and management.

  • Dimensionality: Represents information in a high-dimensional space with 1024 dimensions, enhancing semantic richness and context.


Embed Multilingual

  • Global Reach: Designed for widespread applicability with support for over 100 languages.

  • Multilingual Applications: Ideal for applications requiring multilingual capabilities.

  • Semantic Search: Enables advanced and meaningful search functionalities.

  • Data Clustering: Facilitates clustering of data for insights and organization.

  • International Business: Supports applications in international business contexts.

  • Research: Suitable for research purposes, aiding in data analysis and exploration.

  • Dimensionality: Represents information in a high-dimensional space with 1024 dimensions, enhancing semantic richness and context.


Input Type

Specifies the type of input and guides the model on how to handle and differentiate various data types.

  • Search Document: Specifically designed for search use cases. Utilize this type when encoding documents for embeddings that will be stored in a vector database. It’s tailored for scenarios where you need to embed and retrieve documents efficiently.

  • Search Query: Intended for search queries in conjunction with a vector database. When querying the vector database to find relevant documents, use the search_query type. This ensures that the model understands the nature of the input and can generate suitable embeddings for search queries.

  • Classification: Tailored for applications involving text classification. When using embeddings as input for a text classifier, designate the input_type as classification. This provides the necessary context for the model to generate embeddings optimized for classification tasks.

  • Clustering: Specifically designed for clustering tasks. When clustering embeddings, use the clustering type as input. It informs the model about the purpose of the input data, facilitating effective clustering of embeddings based on their semantic similarities.
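For the Cohere models, the input type travels as the input_type field of the request body. A minimal sketch of both sides of a search workflow follows; the model ID is an assumption, and the processor normally builds and sends this request for you.

```python
import json

COHERE_MODEL_ID = "cohere.embed-english-v3"  # assumed model ID; verify in the Bedrock console

def build_cohere_request(texts: list[str], input_type: str) -> str:
    """Build the Cohere Embed request body with the given input type."""
    return json.dumps({"texts": texts, "input_type": input_type})

# Documents to be embedded and stored in a vector database:
doc_body = build_cohere_request(["Bedrock hosts foundation models."], "search_document")
# A query run later against that database:
query_body = build_cohere_request(["Which models does Bedrock host?"], "search_query")
print(json.loads(doc_body)["input_type"])  # search_document
```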


Batch Size

Batch Size determines the number of prompts to be sent in a single request.

Streaming data sources do not support the Batch Size parameter.
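Conceptually, a batch size of N means the input rows are chunked into groups of at most N before each request. A minimal sketch:

```python
def chunk(prompts: list[str], batch_size: int) -> list[list[str]]:
    """Split prompts into batches of at most batch_size, preserving order."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

batches = chunk(["a", "b", "c", "d", "e"], batch_size=2)
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```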


Input Column

Select a column to convert text into numerical vectors.


Output Column

Select a column to assign embeddings or create a new output column. Type a name for the new column and press enter to create it.


RATE CONTROL

Choose how to utilize the AI model’s services:

Make Concurrent Requests with Token Limit: You can specify the number of simultaneous requests to be made to the AI model, and each request can use up to the number of tokens you provide.

This option is suitable for scenarios where you need larger text input for fewer simultaneous requests.
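This option can be pictured as a worker pool of fixed size, where each worker enforces the per-request token budget. A rough sketch, using a whitespace tokenizer and a stand-in for the model call purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 4    # simultaneous requests allowed
MAX_TOKENS_PER_REQUEST = 8000  # per-request token budget

def embed(text: str) -> int:
    """Stand-in for a model call; enforces the per-request token budget."""
    tokens = text.split()  # illustrative tokenizer only
    if len(tokens) > MAX_TOKENS_PER_REQUEST:
        raise ValueError("input exceeds token limit")
    return len(tokens)  # a real call would return the embedding

texts = ["alpha beta", "gamma", "delta epsilon zeta"]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
    token_counts = list(pool.map(embed, texts))
print(token_counts)  # [2, 1, 3]
```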

OR

Rate-Limited Requests: Alternatively, you can make a specified total number of requests within a 60-second window.

This option is useful when you require a high volume of requests within a specified time frame, each potentially processing smaller amounts of text.
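This option corresponds to a fixed-window rate limiter: a counter of requests in the current 60-second window, reset when the window rolls over. A sketch with an injectable clock so the behavior is easy to follow:

```python
import time

class FixedWindowLimiter:
    """Allow at most max_requests per window_seconds (fixed window)."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0, clock=None):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock or time.monotonic
        self.window_start = self.clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start, self.count = now, 0  # new window begins
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False  # caller should wait for the next window

# Fake clock: 2 requests allowed per 60-second window.
t = [0.0]
limiter = FixedWindowLimiter(max_requests=2, clock=lambda: t[0])
print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
t[0] = 61.0  # the window rolls over, so requests are allowed again
print(limiter.allow())  # True
```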
