Generative AI in Gathr

Generative AI works by using advanced machine learning models, particularly neural networks, to generate new content based on patterns and data it has been trained on.

Gathr’s GenAI offerings can help you increase productivity, create unique experiences, innovate, and transform your business.

  • Generative AI-powered processors and industry-leading models for building modern AI solutions for your data and use cases.

  • Pre-packaged AI models for embeddings generation, Q&A, and summary generation.

  • MLflow-registered models for making inferences in Gathr pipelines.

  • Vector database support: Redis, Milvus, and Pinecone for embedding-based operations.


GenAI Scope in Gathr

GenAI in Gathr is a powerful tool for data processing and code generation, but it has boundaries that are important for users to understand. It operates only within its domain and avoids topics unrelated to the supported components.

  • No Access to External Data: The AI does not have access to external sources of real-time or historical data (for example, sports statistics, financial data, etc.).

  • No General Knowledge Queries: Queries related to general knowledge, such as geographical or historical facts, will not be answered. The assistant is designed specifically for data processing tasks within the platform.

    Example: The SQL processor in Gathr offers GenAI support for writing SQL expressions in natural language. When a user query is unrelated to the provided dataset, the assistant responds as shown below.

    User Query: What is Indian cricket team’s highest score?

    AI Response:

    [Image: AI response to unrelated query]

Other components that offer GenAI assistance respond similarly to unrelated queries.


AI-enabled features and components in Gathr

Leverage AI capabilities in Gathr to efficiently configure operators and design applications. Complete your tasks by providing natural language inputs to Gathr’s AI assistant.

Transform Data using Gathr IQ in ETL

Gathr IQ is your AI assistant that helps you automatically design ETL applications using natural language inputs.

Add a data source to the ETL canvas, configure it, and then provide instructions to the AI assistant to automatically generate the data transformation tasks.

To read more about this feature, please click here →


Generate Metadata in Data Assets

Automatically generate descriptions for data assets and their columns. Different versions of a data asset can have distinct descriptions.

To read more about this feature, please click here →


Data Parsing and Processing

Gathr provides robust data parsing and processing capabilities through the Binary-to-Text Parser and PDF Parser processors.

  • The Binary-to-Text Parser converts binary data into readable text, making it easier to analyze and process.

  • The PDF Parser converts binary PDF data into structured text. Split PDF documents by pages, sections, lines, words, or characters and extract specific information for analysis or processing. You can also extract images from PDFs with the PDF Parser and later perform visual analysis and image recognition with the help of the OpenAI Image Processor.

These processors enhance data accessibility and streamline processing tasks, making Gathr a powerful tool for handling diverse data formats.
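
As a rough illustration of what these processors do conceptually, the sketch below decodes binary data into text and splits it at different granularities. It is plain Python, not Gathr's implementation; the function names `binary_to_text` and `split_text`, and the use of form-feed characters as page separators, are assumptions made for the example.

```python
def binary_to_text(payload: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes into text, replacing bytes that cannot be decoded."""
    return payload.decode(encoding, errors="replace")


def split_text(text: str, unit: str) -> list[str]:
    """Split text by a granularity: pages, lines, words, or characters."""
    if unit == "pages":
        # Assumption: pages are separated by form-feed characters.
        return [page for page in text.split("\f") if page]
    if unit == "lines":
        return text.splitlines()
    if unit == "words":
        return text.split()
    if unit == "characters":
        return list(text)
    raise ValueError(f"unsupported unit: {unit}")


# Hypothetical binary payload with two pages.
raw = b"Invoice 42\nTotal: 10 USD\fPage two"
text = binary_to_text(raw)
pages = split_text(text, "pages")
lines = split_text(pages[0], "lines")
words = split_text(lines[1], "words")
```

Splitting by section is left out because it depends on how a document marks its section boundaries.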


Gen AI Models

Gathr’s GenAI Models feature powerful tools like the OpenAI Processor and OpenAI Embeddings.

  • The OpenAI Text processor leverages advanced AI models to perform a variety of natural language processing tasks, enhancing data analysis and automation.

  • The OpenAI Embeddings processor converts text into numerical vectors, enabling efficient semantic search, clustering, and classification.

  • The OpenAI Image Processor analyzes images to extract insights and answer questions about their content. Identify objects, detect colors, and obtain information about the visual elements present in the images.

  • Bedrock Text provides access to a range of foundation models hosted by leading AI companies such as AI21 Labs, Amazon, Anthropic, Cohere, and Meta. Explore the foundation models along with detailed insights into their parameters.

  • Bedrock Embeddings provides access to foundation models from leading companies such as Amazon and Cohere. With models such as Titan Embeddings G1 and Cohere’s Embed English, perform tasks ranging from text summarization and generation to classification, Q&A, information extraction, and embeddings.

  • The Azure OpenAI Processor executes a diverse range of data-related tasks such as classification, data extraction, summarization, and sentiment analysis. You can accomplish these tasks without complex coding by providing instructions in natural language.

  • The Vertex AI processor provides access to Google’s Gemini models, gemini-1.5-flash-001 and gemini-1.5-pro-001. Through natural language instructions, you can extract text from images, audio, or video, convert image text to JSON, analyze and summarize speech files, transcribe audio, and handle multiple types of input media simultaneously. For more details and configuration steps, please refer to Vertex AI Text Processor →
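
To illustrate how embedding vectors can support classification, the sketch below assigns a new vector to the label with the nearest centroid. The three-dimensional vectors are made-up toy values standing in for real embeddings, which typically have hundreds or thousands of dimensions, and `nearest_label` is a hypothetical helper, not a Gathr or OpenAI API.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def nearest_label(vector: list[float], labeled_examples: dict) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    centroids = {label: centroid(vs) for label, vs in labeled_examples.items()}
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Toy "embeddings" for two categories of support tickets (invented values).
examples = {
    "billing": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "shipping": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
label = nearest_label([0.7, 0.3, 0.1], examples)
```

In practice the vectors would come from an embeddings model, and a trained classifier may work better than nearest-centroid matching.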


Gen AI Powered Processors

Gathr’s GenAI-powered processors for data processing and analysis include the Expression Evaluator, Expression Filter, Python, Scala, and SQL processors.

  • The Expression Evaluator performs various transformation operations on datasets using SparkSQL functions, such as formatting, trimming, and case conversion.

  • The Expression Filter allows for filtering datasets based on criteria like equality, ranges, and pattern matching.

  • The Python Processor enables custom Python code execution for data transformations and processing.

  • The Scala Processor supports custom Scala code for advanced data manipulation.

  • The SQL Processor allows for SQL-based data querying and transformation, leveraging the power of SQL within Gathr’s environment.

Generate code for these AI-powered processors using natural language instructions, making complex data tasks more accessible and intuitive.
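
The kinds of expressions these processors work with can be sketched in SQL. The example below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL, so function availability and syntax may differ in Gathr; the `customers` table and its columns are invented for illustration.

```python
import sqlite3

# In-memory database with a small, hypothetical dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("  alice ", 34, "alice@example.com"),
        ("BOB", 17, "bob@example.org"),
        (" carol", 52, "carol@example.com"),
    ],
)

# Transformation: trimming and case conversion on the name column.
# Filter: a range condition on age and a pattern match on email.
rows = conn.execute(
    """
    SELECT upper(trim(name)) AS name, age
    FROM customers
    WHERE age BETWEEN 18 AND 60
      AND email LIKE '%@example.com'
    ORDER BY age
    """
).fetchall()
conn.close()
```

A natural language instruction such as "keep adult customers with an example.com address and normalize their names" would be expected to yield a query of roughly this shape, though the generated code depends on the processor and the dataset.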


MLflow in Gathr

MLflow is an open-source platform designed to manage the entire machine learning lifecycle.

Integrating MLflow with Gathr allows you to connect your MLflow instance to Gathr’s interface. By setting up a connection to the MLflow Tracking Server, you can access and utilize your MLflow registered models directly within Gathr. Read more about MLflow integration within Gathr, here.
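
For orientation, here is a minimal sketch of loading a registered model with the MLflow Python client outside Gathr. The tracking server URL, model name, and version are placeholders, and the helper function names are hypothetical; `models:/<name>/<version>` is MLflow's URI scheme for registered models. It assumes the `mlflow` package is installed and the tracking server is reachable, and it does not show how Gathr performs this step internally.

```python
def registered_model_uri(name: str, version: int) -> str:
    """Build an MLflow Model Registry URI for a specific model version."""
    return f"models:/{name}/{version}"


def predict_with_registered_model(tracking_uri: str, name: str, version: int, data):
    """Load a registered model from a tracking server and run inference."""
    # Imported lazily so the URI helper above works without mlflow installed.
    import mlflow.pyfunc

    mlflow.set_tracking_uri(tracking_uri)
    model = mlflow.pyfunc.load_model(registered_model_uri(name, version))
    return model.predict(data)


# Placeholder model name and version, for illustration only.
uri = registered_model_uri("churn-classifier", 3)
```

Calling `predict_with_registered_model("http://localhost:5000", "churn-classifier", 3, data)` would then return predictions for `data`, provided a model with that name and version exists on the server.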


Vector Lookup and Databases

Gathr supports advanced vector lookup and database operations with its integration of Redis, Milvus, and Pinecone vector databases.

  • Redis supports vector similarity search, making it ideal for real-time applications. For more details and configuration steps, please refer to Redis Lookup Processor →

  • Milvus excels in handling large-scale vector data, providing robust and scalable solutions for AI and machine learning applications. For more details and configuration steps, please refer to Milvus Lookup Processor →

  • Accelerate similarity search tasks with Pinecone Lookup Processor. This processor leverages Pinecone’s vector database for efficient similarity lookups, enhancing the performance of your search operations. For more details and configuration steps, please refer to Pinecone Lookup Processor →
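
Conceptually, a vector lookup returns the stored items whose embeddings are most similar to a query embedding. The brute-force sketch below shows this with cosine similarity over an in-memory dictionary; vector databases such as Redis, Milvus, and Pinecone rely on indexing rather than scanning every vector. The document IDs and vectors are toy values, and `top_k` is a hypothetical helper, not part of any of these products' APIs.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict, k: int) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy index mapping document IDs to invented embedding vectors.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
matches = top_k([1.0, 0.02, 0.0], index, k=2)
```

Here `matches` lists the two closest documents in descending order of similarity.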
