Cosmos ETL Target

Azure Cosmos is a globally distributed, multi-model database service. It is designed to provide high availability, scalability, and low-latency access to data for modern applications.

Target Configuration

Configure the data emitter parameters as explained below.


Connection Name

Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Cosmos. Otherwise, create one as explained in the topic Cosmos Connection →

Use the Test Connection option to ensure that the connection with the Cosmos channel is established successfully.

A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.


Database

Specify the name of the target database in Azure Cosmos.


Container

Define the container within the specified database to store the emitted data.


Upsert

Set to ‘True’ to enable upsert functionality, updating existing records if found.

Set to ‘False’ to perform insert only.
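The difference between the two settings can be sketched in a few lines of Python. This is only an illustration: the container is modeled as an in-memory dictionary keyed by a hypothetical `id` field, and the sketch assumes that an insert-only write rejects a record whose id already exists. The actual emitter's behavior on such a conflict is not described here.

```python
class DuplicateIdError(Exception):
    """Raised when an insert-only write meets an id that already exists."""


def write_record(container: dict, record: dict, upsert: bool) -> None:
    """Write one record into `container`, a dict keyed by record id.

    `container` and the `id` field are illustrative stand-ins, not Gathr
    or Azure Cosmos APIs.
    """
    key = record["id"]
    if key in container and not upsert:
        # Insert only: an existing record is never modified (assumed behavior).
        raise DuplicateIdError(f"record {key!r} already exists")
    # Upsert, or a new id: store the record, replacing any previous version.
    container[key] = record


container = {}
write_record(container, {"id": "42", "city": "Pune"}, upsert=False)   # new record
write_record(container, {"id": "42", "city": "Delhi"}, upsert=True)   # updates it
print(container["42"]["city"])  # Delhi
```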


Connection Retries

Specify the number of times the emitter will attempt to establish a connection with Azure Cosmos in case of failures.


Delay Between Connection Retries

Define the time interval (in milliseconds) between consecutive connection retry attempts.
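Connection Retries and Delay Between Connection Retries work together as a fixed-delay retry loop. The following Python sketch shows the idea. The function name, the `connect` callable, and the choice to count one initial attempt plus up to `retries` further attempts are assumptions made for illustration, not the emitter's actual implementation.

```python
import time


def connect_with_retries(connect, retries, delay_ms, sleep=time.sleep):
    """Call `connect` until it succeeds or the retries are used up.

    Assumption: one initial attempt, then up to `retries` further attempts,
    waiting `delay_ms` milliseconds before each retry.
    """
    if retries < 0:
        raise ValueError("retries must be zero or greater")
    last_error = None
    for attempt in range(retries + 1):
        if attempt > 0:
            sleep(delay_ms / 1000.0)  # time.sleep expects seconds
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
    raise last_error


# Usage: a stand-in connection that fails twice, then succeeds.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("endpoint not reachable")
    return "connected"


waits = []  # record the delays instead of actually sleeping
result = connect_with_retries(flaky_connect, retries=3, delay_ms=500, sleep=waits.append)
print(result, len(attempts), waits)  # connected 3 [0.5, 0.5]
```

With a larger delay, transient failures have more time to clear, but a pipeline that cannot connect at all takes longer to report the error.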


Save Mode

Choose the save mode for batch data processing:

  • Error if Exists: Choose it to raise an error on duplicate data.

  • Append: Choose it to add new data.

  • Overwrite: Choose it to replace existing data.

  • Ignore: Choose it to skip duplicates.


Output Mode

Specify the output mode for streaming data:

  • Append: Choose it to add new data.

  • Complete: Choose it to write the entire result table on every update.

  • Update: Choose it for incremental updates.


Writing Batch Size

Define the size of batches when writing data to Azure Cosmos, optimizing for performance.


Enable Trigger

Activate streaming triggers to respond to changes in the data source in real-time.

Processing Time

Specify the processing time duration for streaming data, influencing the frequency of data processing intervals.

Choose between Minutes or Seconds to determine the time unit.
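As an illustration, the processing time value and its unit can be combined into a single interval. The helper names below are hypothetical. The interval string follows the form that Spark Structured Streaming accepts in `DataStreamWriter.trigger(processingTime=...)`; whether the emitter passes the value on in exactly this way is an assumption.

```python
SECONDS_PER_UNIT = {"Seconds": 1, "Minutes": 60}


def trigger_interval(value: int, unit: str) -> str:
    """Build an interval string such as '30 seconds' from a value and a unit."""
    if unit not in SECONDS_PER_UNIT:
        raise ValueError("unit must be 'Minutes' or 'Seconds'")
    if value <= 0:
        raise ValueError("processing time must be a positive number")
    return f"{value} {unit.lower()}"


def interval_in_seconds(value: int, unit: str) -> int:
    """Return how often a micro-batch would be processed, in seconds."""
    trigger_interval(value, unit)  # reuse the validation above
    return value * SECONDS_PER_UNIT[unit]


print(trigger_interval(30, "Seconds"))    # 30 seconds
print(interval_in_seconds(5, "Minutes"))  # 300
```

A shorter interval processes data more often at the cost of more frequent writes to the target.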


Add Configuration: Additional properties can be added using this option as key-value pairs.


Schema Mapping

In this section, you can define the source and target column mappings.


The actions available for the schema mapping section are explained below:

  • Search: Search the Column Name values to get a specific target column.

  • Refresh Schema: Use this option to refresh the entire schema mapping section.

  • Auto Fill: Use this option to match the source and target column names, and automatically fill the source column mapping values to the corresponding target columns.

  • Auto Fill Sequentially: Use this option to sequentially fill the incoming source column mapping values to the corresponding target columns.

  • Download Mapping: Use this option to download a sample schema file. Update mapping values in the downloaded file.

    If the Gathr application does not have access to a target table, you can use the Download Mapping option to map the target table columns to the source columns at design time and confirm the data type for each column.

    In such cases, you can run the application in a registered environment that has access to all the required resources. At run time, the application runs on the registered cluster of your choice and picks up the configuration values provided during application design.

  • Upload Mapping: Use this option to upload the sample schema file with updated mapping values to provide the schema mapping.
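The difference between Auto Fill and Auto Fill Sequentially can be sketched as follows. This is a minimal Python illustration, not the application's implementation: the function names are hypothetical, name matching is assumed to be exact, and unmatched target columns are represented as `None`.

```python
def auto_fill(target_columns, source_columns):
    """Map each target column to the source column with the same name."""
    available = set(source_columns)
    return {t: (t if t in available else None) for t in target_columns}


def auto_fill_sequentially(target_columns, source_columns):
    """Map target columns to source columns by position, ignoring names."""
    mapping = {}
    for index, target in enumerate(target_columns):
        # Targets beyond the last source column stay unmapped.
        mapping[target] = source_columns[index] if index < len(source_columns) else None
    return mapping


targets = ["id", "name", "city"]
sources = ["name", "id", "country"]

print(auto_fill(targets, sources))
# {'id': 'id', 'name': 'name', 'city': None}
print(auto_fill_sequentially(targets, sources))
# {'id': 'name', 'name': 'id', 'city': 'country'}
```

Name-based filling is the safer choice when source and target share column names. Sequential filling helps when the names differ but the column order is known to match.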

The fields visible in the schema mapping section are explained below:

  • Column Name: The column names of the selected target entity are populated in the Schema Mapping section.

  • Mapping Value: Map a source column to each target column listed in the Schema Mapping section. You can do this individually or in bulk using the auto fill action.

  • Data Type: The data type of each target column is listed, for example, INT, TIMESTAMP, BIT, VARCHAR and so on.

  • Is Autogenerated: Specifies whether a target column is autogenerated. For example, an ID column may have autogenerated values.

  • Ignore None/All/Unmapped: The target columns selected here will be ignored while emitting the data. There are bulk actions available to ignore none of the columns, all the columns or only the unmapped columns.
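The three bulk ignore actions can be illustrated with a short Python sketch. The function name and the representation of a mapping as a dictionary (target column to source column, with `None` for an unmapped column) are assumptions made for this example.

```python
def columns_to_ignore(mapping: dict, bulk_action: str) -> list:
    """Return the target columns that a bulk ignore action would select.

    `mapping` maps each target column to its source column, or to None
    when the target column is unmapped.
    """
    if bulk_action == "none":
        return []
    if bulk_action == "all":
        return list(mapping)
    if bulk_action == "unmapped":
        return [target for target, source in mapping.items() if source is None]
    raise ValueError("bulk_action must be 'none', 'all' or 'unmapped'")


mapping = {"id": "id", "name": "full_name", "city": None}

print(columns_to_ignore(mapping, "none"))      # []
print(columns_to_ignore(mapping, "all"))       # ['id', 'name', 'city']
print(columns_to_ignore(mapping, "unmapped"))  # ['city']
```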


Post Action

To understand how to provide SQL queries or stored procedures that are executed during the pipeline run, see Post-Actions →


Notes

Optionally, enter notes in the Notes → tab and save the configuration.
