Aggregation Processor
The aggregation operator performs operations like min, max, average, sum, and count.
Example: Consider an IoT device that generates events with three fields: id, time, and action.
To find out how many actions of each type occurred within a window of 15 seconds, apply the Aggregation (COUNT) operation over that window. The operator then counts the events of each action type among all events received in each 15-second window.
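The counting described in this example can be sketched in plain Python. This is only an illustration of the idea, not how the operator is implemented: the sample events, the function name count_actions_per_window, and the alignment of windows to time zero are all assumptions.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 15  # window length taken from the example above

# Hypothetical sample events: (id, time in seconds, action)
events = [
    ("dev-1", 1, "open"),
    ("dev-1", 4, "close"),
    ("dev-2", 9, "open"),
    ("dev-1", 16, "open"),
    ("dev-2", 22, "close"),
    ("dev-2", 29, "close"),
]

def count_actions_per_window(events, window_seconds=WINDOW_SECONDS):
    """Count events per action type within each fixed window."""
    counts = defaultdict(Counter)
    for _id, time, action in events:
        # Assumption: windows are aligned to time zero, e.g. [0, 15), [15, 30)
        window_start = (time // window_seconds) * window_seconds
        counts[window_start][action] += 1
    return {start: dict(c) for start, c in sorted(counts.items())}

result = count_actions_per_window(events)
for start, per_action in result.items():
    print(f"[{start}, {start + WINDOW_SECONDS}): {per_action}")
```

Each key of the result is a window start time, and each value holds the count per action type for that window.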
Processor Configuration
To add an Aggregation operator to your application, select it and add it to the canvas. Connect it to a Data Source or another operator and right-click on it to configure.
Transform Fields
Required aggregation functions can be applied to the selected input fields. The transformation results will be stored in the specified output fields.
Functions
Select the function to be applied over the input fields from the list.
The Aggregation operator supports the following functions.
Average: Calculates the average of selected fields and assigns it to the provided output field.
Minimum: Calculates the minimum value of selected fields and assigns it to the provided output field.
Maximum: Calculates the maximum value of selected fields and assigns it to the provided output field.
Sum: Totals the values of the selected fields and assigns the result to the provided output field.
Count: Counts the values of the selected input fields and assigns the result to the provided output field.
First: Operates on the set of values from the rows that rank first in a given sort specification.
Last: Operates on the set of values from the rows that rank last in a given sort specification.
Median: Calculates the median (middle value) of the selected fields' values.
Percentile: Calculates the value at or below which a given percentage of the observed values fall.
Standard Deviation: Estimates the standard deviation of a sample data set.
Variance: Calculates the variance (division by n) of a set of numbers.
Collect as List: This function aggregates the values into an ArrayType, typically after group by and window partition. It collects all values, including duplicates.
Collect as Set: This function aggregates the values into an ArrayType, typically after group by and window partition. It removes duplicates and returns only unique values.
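To make the function list concrete, the sketch below computes each aggregate over one small set of values using only the Python standard library. It is an approximation for illustration: the sample values are made up, arrival order stands in for the sort specification of First and Last, and the operator's own percentile interpolation may differ from the method used here.

```python
import statistics

# Hypothetical values of one input field within a single group or window
values = [4, 8, 15, 16, 23, 42, 8]

results = {
    "average": statistics.mean(values),
    "minimum": min(values),
    "maximum": max(values),
    "sum": sum(values),
    "count": len(values),
    "first": values[0],   # assumption: arrival order is the sort specification
    "last": values[-1],
    "median": statistics.median(values),
    # 90th percentile: quantiles(n=10) returns the nine cut points 10%..90%
    "percentile_90": statistics.quantiles(values, n=10, method="inclusive")[-1],
    "standard_deviation": statistics.stdev(values),  # sample estimate (n - 1)
    "variance": statistics.pvariance(values),        # division by n
    "collect_as_list": list(values),                 # keeps duplicates
    "collect_as_set": sorted(set(values)),           # unique values, sorted for display
}

for name, value in results.items():
    print(f"{name}: {value}")
```

Note that the value 8 appears twice in collect_as_list but only once in collect_as_set.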
Input Fields
Select the fields from the list on which Aggregation must be applied.
Output Field
The outcome of the Aggregation function is stored in this field.
Add Field
To apply aggregation functions on more than one field, additional rows can be added using this option.
Time Window
Type of time window you want to apply.
None
When selected, no time window will be applied.
Fixed Length
When selected, the configured aggregation functions are applied over consecutive windows of a fixed duration.
Sliding Window
When selected, the configured aggregation functions are applied over a moving window that advances by the slide duration.
Window Duration
Time window duration in seconds.
Slide Duration
Window slide duration in seconds.
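The difference between the two window types can be sketched by computing which windows a single event falls into. The function names and the alignment of windows to time zero are assumptions made for this illustration; the operator itself may align windows differently.

```python
def fixed_windows(event_time, window_duration):
    """Return the single fixed window [start, end) containing event_time."""
    start = (event_time // window_duration) * window_duration
    return [(start, start + window_duration)]

def sliding_windows(event_time, window_duration, slide_duration):
    """Return every sliding window [start, end) containing event_time."""
    windows = []
    # Latest window start that is a multiple of the slide and <= event_time
    start = (event_time // slide_duration) * slide_duration
    # Walk back one slide at a time while the window still covers the event
    while start > event_time - window_duration:
        windows.append((start, start + window_duration))
        start -= slide_duration
    return sorted(windows)

# An event at second 17, window duration 15 s, slide duration 5 s
print(fixed_windows(17, 15))
print(sliding_windows(17, 15, 5))
```

With a fixed-length window the event belongs to exactly one window. With a sliding window whose slide duration is shorter than the window duration, the same event contributes to several overlapping windows.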
Event Column
Timestamp column of incoming records.
Watermarking
A watermark handles data that arrives late.
Data is considered late when it reaches the system after the end of its window.
The watermark mechanism keeps the window open for the specified watermark duration in addition to the initially defined window duration.
- Yes: Watermarking is applied.
- No: Watermarking is not applied.
Watermark Duration
The additional time for which the window is kept open after the initially defined window duration has elapsed.
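The effect of the watermark duration can be sketched with a simplified model that follows the description above: a record is accepted if it arrives before the end of its window plus the watermark duration. This is an assumption-laden illustration, not the engine's actual mechanism. The sample records, the 5-second watermark, the arrival-time comparison, and the function name accept_record are all hypothetical.

```python
WINDOW_DURATION = 15     # seconds
WATERMARK_DURATION = 5   # seconds, hypothetical value

# Hypothetical records: (event_time, arrival_time) in seconds
records = [(3, 4), (12, 14), (14, 18), (10, 21), (16, 22)]

def accept_record(event_time, arrival_time, watermarking=True):
    """Return True if the record's window is still open when it arrives."""
    # End of the fixed window that contains event_time (aligned to zero)
    window_end = (event_time // WINDOW_DURATION + 1) * WINDOW_DURATION
    # With watermarking, the window stays open for an extra grace period
    grace = WATERMARK_DURATION if watermarking else 0
    return arrival_time < window_end + grace

with_watermark = [r for r in records if accept_record(*r)]
without_watermark = [r for r in records if accept_record(*r, watermarking=False)]

print("accepted with watermark:   ", with_watermark)
print("accepted without watermark:", without_watermark)
```

In this model the record (14, 18) arrives 3 seconds after its window ends at second 15, so it is kept only when watermarking is enabled. The record (10, 21) arrives after the extended deadline of second 20 and is dropped in both cases.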
Group By
Applies grouping based on the selected fields.
- Yes: Enables selection of grouping fields.
- No: Disables selection of grouping fields. This is the default option.
Grouping Fields
Select fields on which grouping is to be applied.
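How grouping changes the result of an aggregation can be sketched as follows. The record layout, the field names, and the helper aggregate are assumptions for illustration and do not reflect the operator's internals.

```python
from collections import defaultdict

# Hypothetical records with fields id, action, and value
records = [
    {"id": "dev-1", "action": "open", "value": 10},
    {"id": "dev-1", "action": "open", "value": 30},
    {"id": "dev-2", "action": "open", "value": 5},
    {"id": "dev-1", "action": "close", "value": 7},
]

def aggregate(records, input_field, function, grouping_fields=()):
    """Apply function to input_field, once per distinct grouping key."""
    groups = defaultdict(list)
    for record in records:
        # With no grouping fields every record shares the empty key ()
        key = tuple(record[f] for f in grouping_fields)
        groups[key].append(record[input_field])
    return {key: function(vals) for key, vals in groups.items()}

# Group By = No: one result over all records
print(aggregate(records, "value", sum))

# Group By = Yes, with grouping fields id and action: one result per group
print(aggregate(records, "value", sum, grouping_fields=("id", "action")))
```

Without grouping, the function produces a single output value. With grouping fields selected, it produces one output value per distinct combination of those fields.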