OpenAI Image Processor

OpenAI Image Processor can be used for extracting insights from images.

It can answer questions about images, whether it’s identifying objects or colors. While it excels at general inquiries, detailed questions about object locations may not yield accurate results.

You can provide images as links or base64 encoded data. Plus, with support for multiple image inputs, you can combine information from various sources to enhance your queries.

Processor Configuration

Configure the processor parameters as explained below.

Connection Name

A connection name can be selected from the list if you have created and saved connection details for OpenAI earlier. Or create one as explained in the topic - OpenAI Connection →


Prompt

A prompt is a concise instruction or query in natural language provided to the OpenAI Image Processor to guide its actions or responses.

In the Prompts section, you have the flexibility to:

  • Choose Predefined Sample Prompts: Discover a set of ready-to-use sample prompts that can kickstart your interactions with the OpenAI Image Processor.

    openai_image_prompt_01

  • Configuration Options: Customize prompts to suit your specific needs.

  • Save Prompts: Store your preferred prompts for future use.

  • Delete Prompts: Remove prompts that are no longer necessary.

  • Prompt Reset: To reset the prompt, clear the details in the prompt field, restoring it to its default state.

    openai_image_save_prompt_01.png

System

Provide high-level instructions and context for the AI model, guiding its behavior and setting the overall tone or role it should play in generating responses.

Note: <|endoftext|> is a document separator that the model sees during training, so if a prompt is not specified, the model will generate a response from the beginning of a new document.

The placeholder {some_key} represents a variable that can be replaced with specific column data. You can map this key to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.

User

The user prompt is a specific instruction or question provided by the user to the AI model, directing it to perform a particular task or provide a response based on the user’s request.

Note: <|endoftext|> is a document separator that the model sees during training, so if a prompt is not specified, the model will generate a response from the beginning of a new document.

The placeholder {some_key} represents a variable that can be replaced with specific column data. You can map this key to a column in the next section using “EXTRACT INPUTS FROM PROMPT”.


Input

The placeholders {__} provided in the prompt can be mapped to columns to replace its value with the placeholder keys.

openai_input_from_prompt_01

Input from prompt

All the placeholders {__} provided in the fields above are extracted here to map them with the column.

Input column

Select the column name to replace its value with the placeholder keys.


Input Image Columns

You can add rows as needed to specify the column(s) containing images to process along with additional configurations.

Input column

Select the name of the column containing the input images.

Type

The format of the input images, with options for Base64-encoded images or URLs pointing to image files. For smoother interactions, use image URLs instead of base64.

Analysis Detail Level

Control how the model processes the image and generates its textual understanding:

  • Low: This option provides quicker responses but with less detail. The model receives a smaller, 512px x 512px version of the image and uses a budget of 65 tokens.

  • High: Choose this for more detailed processing. The model first receives a low-res image, then creates detailed crops of the input image as 512px squares. It uses a higher token budget of 129 tokens for each detailed crop.

Drop Column (checkbox)

Option to remove the original input column after processing. This feature helps streamline data inspection and enhances performance by reducing unnecessary columns.


Output

The output can be configured to emit data received from input. This configuration includes utilizing Prompts, which can be interpreted as input columns via placeholders, and allows for emitting output either by specifying a column name or parsing it as JSON.

Process Response

Please provide the response format. Assign to Column/Parse as JSON

Json Key in Response

Add the JSON keys instructed to the model and map them with the corresponding output column names.

Output Column as JSON

Please type the column name for the data corresponding to the JSON keys instructed to the model.

Output Column as TEXT

Please type the column name for the data to be emitted.


Open AI Parameters

The parameters described below are configuration settings that govern the behavior and performance of OpenAI models, influencing how they respond to prompts and generate outputs.

Choose a model

Select an ID of the model to determine the AI’s capabilities and language style.

Gathr supports below models:

  • gpt-4-vision-preview

  • gpt-4-1106-vision-preview

  • gpt-4o

Max Token

The maximum number of tokens to generate in the chat completion.

The total length of input tokens and generated tokens is limited by the model’s context length.

Please note that it also affects the total tokens consumed per minute, potentially limiting the number of requests you can make. For example, if you set Max Tokens per request to 1500 and the model supports 90,000 tokens per minute, you can only make approximately 40 requests per minute.

Temperature

The sampling temperature to be used between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

It is generally recommended to alter this or top_p but not both.

Advanced Configuration

Stop Sequence

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Top P

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So, 0.1 means only the tokens comprising the top 10% probability mass are considered.

It is generally recommended to alter this or temperature, but not both.

Frequency Penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Presence Penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.


RATE CONTROL

Choose how to utilize the OpenAI’s services:

Make Concurrent Requests with Token Limit: You can specify the number of simultaneous requests to be made to OpenAI, and each request can use up to the number of tokens you provide.

This option is suitable for scenarios where you need larger text input for fewer simultaneous requests.

OR

Rate-Limited Requests: Alternatively, you can make a total of specified number of requests within a 60-second window.

This option is useful when you require a high volume of requests within a specified time frame, each potentially processing smaller amounts of text.


Enable retry requests

Enabling retry requests allows for automatic resubmission of failed or incomplete requests, enhancing data processing reliability.

No. of Retries

The number of times a request will be automatically retried if the response from OpenAI is not in JSON format or does not have all the entities.

When

Please select the criteria for retry. Select “Any output key is missing” if all keys are mandatory. Else, select the mandatory keys.

Include previous response

Please mark if a previous incorrect response should be added to the retry request messages prompt. Else, leave it unchecked.

Additional User Prompt

Please type prompt text to be considered while retrying the request.


Limitations

Some known limitations of the OpenAI Image Processor are:

  • Medical images: Not suitable for medical imaging interpretation or advice.

  • Non-English text: Performance may vary with non-Latin alphabets.

  • Small text: Enlarge text for better readability, avoid cropping.

  • Rotation: Misinterpretations possible with rotated or upside-down images/text.

  • Visual elements: Difficulty with varied styles like solid, dashed lines.

  • Spatial reasoning: Challenges with precise localization tasks.

  • Accuracy: Potential for incorrect descriptions or captions.

  • Image shape: Difficulty with panoramic or fisheye images.

  • Metadata and resizing: Original file details aren’t processed, images are resized.

  • Counting: May give approximate counts for objects.

  • CAPTCHAS: Submission of CAPTCHAs is blocked for safety.

Top