Aggregation Processor
The Aggregation processor performs operations such as minimum, maximum, average, sum, and count over streaming data.
Example: An IoT device generates events, each comprising three fields: id, time, and action.
To calculate how many actions of each type occurred within a 15-second window, apply the Count function over a window duration of 15 seconds, grouped by the action field. The result is the number of actions of each type among all events received in that window.
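The counting example above can be sketched in plain Python. This is a conceptual illustration only, not how Gathr implements the processor: the sample events, the helper name `count_actions_per_window`, and the use of whole seconds for the `time` field are all assumptions made for the example.

```python
from collections import Counter

WINDOW_SECONDS = 15  # window duration from the example above


def count_actions_per_window(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window start, action) pair.

    Hypothetical helper for illustration. Each event is a dict with the
    fields id, time (whole seconds), and action.
    """
    counts = Counter()
    for event in events:
        # Align the event time to the start of its fixed-length window.
        window_start = (event["time"] // window_seconds) * window_seconds
        counts[(window_start, event["action"])] += 1
    return dict(counts)


# Made-up sample events; not taken from any real device.
events = [
    {"id": 1, "time": 2, "action": "open"},
    {"id": 2, "time": 7, "action": "close"},
    {"id": 3, "time": 14, "action": "open"},
    {"id": 4, "time": 16, "action": "open"},
]

result = count_actions_per_window(events)
print(result)
```

With this sample data, the first window (0 to 15 seconds) holds two `open` actions and one `close` action, and the second window holds one `open` action.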
The following table lists the functions supported by the Aggregation processor.
Functions | Description |
---|---|
Average | Calculates the average of the selected fields and assigns the result to the provided output field. |
Sum | Totals the selected fields and assigns the result to the provided output field. |
Count | Counts the values of the selected input fields and assigns the result to the provided output field. |
Minimum | Calculates the minimum value of the selected fields and assigns the result to the provided output field. |
Maximum | Calculates the maximum value of the selected fields and assigns the result to the provided output field. |
First | Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. |
Last | Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values. |
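Median | Calculates the median value of the selected fields and assigns the result to the provided output field. |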
Percentile | Returns the exact percentile value array of numeric column col at the given percentage(s). Each value in the percentage array must be between 0.0 and 1.0. The frequency value must be a positive integer. |
Standard Deviation | Returns the sample standard deviation calculated from values of a group. |
Variance | Returns the sample variance calculated from values of a group. |
Collect List | Collects and returns a list of non-unique elements. |
Collect Set | Collects and returns a set of unique elements. |
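To make the table concrete, here is a minimal sketch that computes most of these aggregates over one group of values using only the Python standard library. It is an illustration of what each function returns, not the processor's actual implementation. The sample values, the dictionary keys, and the `first` helper are hypothetical, and Percentile is left out because its exact interpolation rules are not described here.

```python
import statistics


def first(values, ignore_null=False):
    """Return the first value; skip None entries when ignore_null is True."""
    for value in values:
        if value is not None or not ignore_null:
            return value
    return None


# Made-up values of one input field within a single group.
values = [4, 8, 8, 15, 16, 23]

aggregates = {
    "average": statistics.mean(values),
    "sum": sum(values),
    "count": len(values),
    "minimum": min(values),
    "maximum": max(values),
    "first": first(values),
    "last": values[-1],
    "median": statistics.median(values),
    # stdev and variance compute the *sample* statistics (n - 1 divisor).
    "standard_deviation": statistics.stdev(values),
    "variance": statistics.variance(values),
    "collect_list": list(values),  # keeps duplicates
    "collect_set": set(values),    # unique elements only
}

for name, value in aggregates.items():
    print(name, value)
```

Note how `collect_list` keeps both occurrences of 8 while `collect_set` keeps only one.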
Aggregation Processor Configuration
To add an Aggregation processor to your pipeline, drag the processor to the canvas and right-click on it to configure.
Field | Description |
---|---|
Functions | Lists all aggregation functions. Select the function to apply to the input fields. |
Input Fields | Select the fields on which the aggregation is to be applied. |
Output Field | Field in which the result of the aggregation function is stored. |
Time Window | Type of time window to apply. Time window options: - Fixed Length: Applies the configured aggregation functions over a fixed window duration. - Sliding Window: Applies the configured aggregation functions over a moving window. |
Window Duration | Time window duration in seconds. |
Slide Duration | Window slide duration in seconds. |
Event Column | Timestamp column of the incoming records. |
Watermarking | Handles late-arriving data. Data is considered late when it arrives in the system after the end of its window. The watermark mechanism keeps the window open for the specified watermark duration in addition to the initially defined window duration. - Yes: Enables watermarking. - No: Disables watermarking. |
Watermark duration | Duration for which the window is kept open beyond the initially defined window duration. Applies when Watermarking is set to Yes. |
Group By | Applies grouping based on the selected fields. - Yes: Enables selection of grouping fields. - No: Disables selection of grouping fields. This is the default option. |
Grouping Fields | Select fields on which grouping is to be applied. |
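The Time Window, Window Duration, Slide Duration, and Watermarking options in the table above can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not the processor's internal logic: the function names are hypothetical, times are whole seconds, no window starts before time 0, and the watermark is modeled exactly as described in the table (the window stays open for the watermark duration after its end).

```python
def window_starts(event_time, window_seconds, slide_seconds=None):
    """Return the start times of every window that contains event_time.

    With slide_seconds=None the windows are fixed length, so each event
    falls into exactly one window. With a slide shorter than the window,
    the windows overlap and an event can fall into several of them.
    """
    slide = slide_seconds or window_seconds
    # Latest slide-aligned window start that is <= event_time.
    start = (event_time // slide) * slide
    starts = []
    # Walk backwards while the window [start, start + window) still
    # covers the event. Assumption: windows never start before time 0.
    while start > event_time - window_seconds and start >= 0:
        starts.append(start)
        start -= slide
    return sorted(starts)


def accepts_late_event(window_start, window_seconds, arrival_time,
                       watermark_seconds=0):
    """Return True if an event arriving at arrival_time can still join
    the window, given how long the watermark keeps the window open."""
    window_end = window_start + window_seconds
    return arrival_time < window_end + watermark_seconds


# Fixed-length window of 15 seconds: one window per event.
print(window_starts(17, 15))
# Sliding window of 15 seconds that slides every 5 seconds.
print(window_starts(17, 15, 5))
# An event for window [0, 15) that arrives at second 18.
print(accepts_late_event(0, 15, 18))                       # no watermark
print(accepts_late_event(0, 15, 18, watermark_seconds=5))  # 5 s watermark
```

In this model, an event at second 17 belongs to the single fixed window starting at 15, but to three sliding windows starting at 5, 10, and 15. The event arriving at second 18 is dropped without a watermark and accepted with a 5-second watermark.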
Click NEXT and enter notes in the space provided.
Click SAVE to save the configuration details.