Cosmos Emitter
A Cosmos Emitter lets you emit data into the containers of a selected Cosmos database.
The Cosmos emitter exposes different properties depending on whether it is used with a Streaming Cosmos Channel or a Batch Cosmos Channel, as explained below:
Cosmos Emitter Configuration
To add a Cosmos Emitter to your pipeline, drag the Emitter onto the canvas, connect it to a Data Source or processor, and click on it to configure.
Field | Description |
---|---|
Connection Name | Select the connection name from the available list of connections to which you would like to emit the data. |
Override Credentials | Unchecked by default. Select the checkbox to override the connection credentials for user-specific actions. |
Key | Provide the Azure Cosmos DB key. Click TEST Connection to validate the connection. |
Database | Select the Cosmos Database from the drop-down list. |
Container | Select the Cosmos container from the drop-down list. |
Write Strategy | Select the Cosmos DB write strategy from the drop-down list. Item Overwrite: when persisting data, if the data/table already exists, the existing data is overwritten with the incoming contents using an upsert. Item Append: when persisting data, if the data/table already exists, the incoming contents are appended to the existing data. Item Delete: deletes the data. See the batch write sketch after this table. |
Output Fields | Fields in the message that need to be part of the output data. |
Upsert | If set to true, items with existing ids are updated, and items that do not exist are created. This option is unavailable if you choose a write strategy, and it is only available for Spark 2.4 (see the Spark 2.4 sketch after this table). |
Connection Retries | Number of retries for component connection. |
Delay Between Connection Retries | Defines the delay between connection retries for the component, in milliseconds. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | The path where the Spark application stores checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system adds a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is suffixed with the current time in milliseconds (see the streaming sketch after this table). |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select one of the three options: Append: only the new rows in the streaming data are written to the sink. Complete: the entire updated result is written to the sink on every trigger. Update: only the rows that changed since the last trigger are written to the sink. |
Save Mode | Save Mode specifies the expected behavior of saving data to a data sink. Append: when persisting data, if the data/table already exists, the incoming contents are appended to the existing data. |
Writing Batch Size | Define the batch size for writes to Cosmos. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger interval, in minutes or seconds. |
ADD CONFIGURATION | Additional properties can be added. |
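The write strategy values above (Item Overwrite, Item Append, Item Delete) match the `ItemOverwrite`/`ItemAppend`/`ItemDelete` strategies of the open-source azure-cosmos-spark connector for Spark 3. Whether Gathr uses this exact connector internally is an assumption; the sketch below only illustrates what a batch write with these settings looks like at the Spark level, with the connector jar on the classpath and all credentials as placeholders.

```python
# Minimal sketch of a batch write with the azure-cosmos-spark (Spark 3)
# connector; all endpoint/key/database/container values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-emitter-sketch").getOrCreate()

# Cosmos DB items need an "id" field; this toy DataFrame stands in for
# the pipeline's Output Fields.
df = spark.createDataFrame([("1", "alice"), ("2", "bob")], ["id", "name"])

(df.write
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", "<cosmos-db-key>")
    .option("spark.cosmos.database", "<database>")
    .option("spark.cosmos.container", "<container>")
    # Write Strategy: ItemOverwrite (upsert), ItemAppend (insert only),
    # or ItemDelete (delete items by id).
    .option("spark.cosmos.write.strategy", "ItemOverwrite")
    .mode("append")
    .save())
```

With this connector, the write strategy rather than the save mode controls overwrite and delete behavior.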
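Similarly, the Upsert and Writing Batch Size fields correspond naturally to the `Upsert` and `WritingBatchSize` options of the older azure-cosmosdb-spark connector for Spark 2.4, which fits the note above that Upsert is only available for Spark 2.4. Again a hedged sketch, not Gathr's confirmed internals:

```python
# Minimal sketch of the Upsert and Writing Batch Size fields with the
# older azure-cosmosdb-spark connector (Spark 2.4); values are placeholders
# and df is the DataFrame from the previous sketch.
write_config = {
    "Endpoint": "https://<account>.documents.azure.com:443/",
    "Masterkey": "<cosmos-db-key>",
    "Database": "<database>",
    "Collection": "<container>",
    "Upsert": "true",           # update items with existing ids, create the rest
    "WritingBatchSize": "500",  # documents written to Cosmos per batch
}

(df.write
    .format("com.microsoft.azure.cosmosdb.spark")
    .mode("append")
    .options(**write_config)
    .save())
```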
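For the streaming channel, the checkpoint directory, output mode, and processing-time trigger map onto standard Spark Structured Streaming options. A minimal sketch under the same assumptions, where streaming_df stands for a streaming DataFrame from an upstream channel; the time-based checkpoint suffix illustrates the behavior described above, not Gathr's exact implementation.

```python
# Minimal sketch of the streaming options: checkpoint location, output
# mode, and a processing-time trigger. streaming_df is assumed to be a
# streaming DataFrame from an upstream channel; paths are placeholders.
import time

checkpoint_dir = "/user/hadoop/checkpointingDir"
# Time-Based Check Point: suffix the location with the current time in
# milliseconds so each pipeline run checkpoints to a fresh directory.
checkpoint_dir = f"{checkpoint_dir}/{int(time.time() * 1000)}"

(streaming_df.writeStream
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", "<cosmos-db-key>")
    .option("spark.cosmos.database", "<database>")
    .option("spark.cosmos.container", "<container>")
    .option("checkpointLocation", checkpoint_dir)
    .outputMode("append")                  # Append: only new rows reach the sink
    .trigger(processingTime="30 seconds")  # Enable Trigger + Processing Time
    .start())
```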