Azure Blob Emitter

With the Azure Blob emitter, you can write data in different formats (JSON, CSV, ORC, Parquet, and more) to blob containers by specifying a directory path.

Azure Blob Emitter Configuration

To add an Azure Blob emitter to your pipeline, drag the emitter to the canvas and connect it to a data source or processor.

The configuration settings are as follows:

Field: Description
Connection Name: Lists all available connections. Select the connection to use for Azure Blob.
Container: Name of the Azure Blob container.
Path: Sub-directories of the above container to which the data is written.
Output Type: Format in which the output data is written.
Delimiter: Message field separator.
Output Fields: Fields to include in the output data.
Partitioning Required: If checked, the data is partitioned.
Save Mode: Specifies how existing data is handled.
Output Mode: Output mode used when writing data to a streaming emitter. Select one of the following three options:

Append: Only the new rows in the streaming data are written to the sink.

Complete Mode: All the rows in the streaming data are written to the sink every time there is an update.

Update Mode: Only the rows that were updated in the streaming data are written to the sink every time there is an update.
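
The difference between the three output modes can be illustrated with a small, simplified sketch. It is not the emitter's implementation and does not use any Spark API; the function rows_to_emit and the dictionaries standing in for the streaming result before and after a trigger are hypothetical names chosen for illustration.

```python
def rows_to_emit(mode, previous, current):
    """Return the rows a sink would receive for one trigger.

    previous and current are hypothetical in-memory stand-ins that map a
    row key to its value before and after the trigger.
    """
    if mode == "append":
        # Only rows whose key did not exist before this trigger.
        return {k: v for k, v in current.items() if k not in previous}
    if mode == "complete":
        # Every row of the result, whether or not it changed.
        return dict(current)
    if mode == "update":
        # New rows plus rows whose value changed since the last trigger.
        return {
            k: v
            for k, v in current.items()
            if k not in previous or previous[k] != v
        }
    raise ValueError(f"unknown output mode: {mode}")


previous = {"a": 1, "b": 2}
current = {"a": 1, "b": 5, "c": 7}

print(rows_to_emit("append", previous, current))    # {'c': 7}
print(rows_to_emit("complete", previous, current))  # {'a': 1, 'b': 5, 'c': 7}
print(rows_to_emit("update", previous, current))    # {'b': 5, 'c': 7}
```

In this sketch the unchanged row "a" is written only in Complete Mode, while the modified row "b" is written in both Complete Mode and Update Mode.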

Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. The connections listed correspond to the selected storage location.
Checkpoint Directory: The path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir. The system adds a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir

Time-Based Check Point: Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds.
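
The checkpoint path rules and the time-based option above can be sketched as follows. This is only an illustration under stated assumptions: the helper names validate_checkpoint_dir and time_based_checkpoint are hypothetical, the validation rules are a guess at what "relative" and "absolute" mean here, and appending the timestamp as a sub-directory is an assumption, since the exact form in which the product appends it is not documented in this section.

```python
import re
import time

# Assumed shape of an absolute S3 path: S3://<bucket>/<key>
S3_PATTERN = re.compile(r"^s3://[^/]+/.+", re.IGNORECASE)


def validate_checkpoint_dir(storage, path):
    """Check a checkpoint directory against the rules described above."""
    storage = storage.upper()
    if storage in ("HDFS", "EFS"):
        # Relative path: no scheme, because the system adds the prefix.
        return path.startswith("/") and "://" not in path
    if storage == "S3":
        # Absolute path including the bucket name.
        return bool(S3_PATTERN.match(path))
    raise ValueError(f"unknown storage location: {storage}")


def time_based_checkpoint(path, now_millis=None):
    """Append the current time in milliseconds to the checkpoint path.

    Assumption: the timestamp is added as a sub-directory.
    """
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    return f"{path.rstrip('/')}/{now_millis}"


print(validate_checkpoint_dir("HDFS", "/user/hadoop/checkpointingDir"))   # True
print(validate_checkpoint_dir("S3", "S3://BucketName/checkpointingDir"))  # True
print(time_based_checkpoint("S3://BucketName/checkpointingDir", 1700000000000))
# S3://BucketName/checkpointingDir/1700000000000
```

Because each run gets its own location in this sketch, a run started with the time-based option would not resume from the checkpoint data of an earlier run.
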
Enable Trigger: A trigger defines how frequently a streaming query is executed.
Processing Time: Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger interval, in minutes or seconds.
Add Configuration: Enables you to configure additional properties.

Example: To perform imputation, click the ADD CONFIGURATION button and add a property such as nullValue = 123. The output then replaces all null values with 123.
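
The effect of the nullValue property, together with the Delimiter and Output Fields settings, can be illustrated with a small standalone sketch. It uses only the Python standard library and is not the emitter's code; the function write_delimited and its parameters are hypothetical, and it assumes that nullValue simply substitutes the configured text wherever a field is null.

```python
import csv
import io


def write_delimited(rows, output_fields, delimiter=",", null_value=""):
    """Render rows as delimited text, replacing nulls with null_value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(output_fields)
    for row in rows:
        # Keep only the selected output fields; substitute missing values.
        writer.writerow(
            [null_value if row.get(f) is None else row[f] for f in output_fields]
        )
    return buffer.getvalue()


rows = [
    {"id": 1, "score": None, "internal": "x"},
    {"id": 2, "score": 9, "internal": "y"},
]

print(write_delimited(rows, ["id", "score"], delimiter=",", null_value="123"))
# id,score
# 1,123
# 2,9
```

The field "internal" is absent from the output because it is not among the selected output fields, and the null score in the first row is written as 123.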

Click the Next button and enter notes in the space provided.

Click the DONE button to save the configuration.
