Aggregation Processor

The aggregation processor is used for performing the operations like min, max, average, sum and count over the streaming data.

Example: An IOT device generates an event comprising three fields: id, time, and action.

To calculate how many actions of each type happened in a window of 15 seconds duration. The Aggregation (COUNT) operation can be used for calculating the number of actions of each type from total events received in a window of 15 seconds duration.

The following table lists the functions supported by Aggregation processor.

FunctionsDescription
AverageCalculates average of selected fields and also assigns it to provided output field.
SumSum function totals all the selected fields and also assigns it to provided output field.
CountCount values of selected input fields and also assigns it to provided output field.
MinimumCalculates the minimum value of selected fields and also assigns it to provided output field.
MaximumCalculates the maximum value of selected fields and also assigns it to provided output field.
FirstReturns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
LastReturns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Median
PercentileReturns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be positive integral.
Standard DeviationReturns the sample standard deviation calculated from values of a group.
VarianceReturns the sample variance calculated from values of a group.
Collect ListCollects and returns a list of non-unique elements.
Collect SetCollects and returns a set of unique elements.

Aggregation Processor Configuration

To add an Aggregation processor to your pipeline, drag the processor to the canvas and right-click on it to configure.

FieldDescription
FunctionsAll Aggregation functions are listed here. Select the function from the list which is to be applied over Input fields.
Input FieldsSelect the fields from the list on which Aggregation has to be applied.
Output FieldOutcome of the Aggregation function is stored in this field.
Time Window

Type of time window that you want to apply.

Time window options:

- Fixed Length: When selected, configured aggregation functions get applied over a fixed window duration. 

- Sliding Window: When selected, configured aggregation functions get applied over the moving window.

Window DurationTime window duration in seconds.
Slide DurationWindow slide duration in seconds.
Event ColumnTime stamp column of incoming records.
Watermarking

Watermark handles the data which arrives late. The data is considered to be late when it arrives to the system after the end of the window.

The water mark mechanism keeps the window open for specified watermark duration in addition to initial defined window duration.

- Yes: Enables watermarking.

- No: Disables watermarking.

Watermark duration

It keeps the window open for the specified watermark duration in addition to initial defined window duration.

- Yes: when selected Yes, watermarking will be applied.

- No: When selected No, watermarking will not be applied.

Group By

Applies grouping based on the selected fields.

- Yes: Enables selection of grouping fields.

- No: Disables selection of grouping fields. This is the default option.

Grouping FieldsSelect fields on which grouping is to be applied.

Click on the NEXT button. Enter the notes in the space provided.

Click SAVE for saving the configuration details.

Top