Data Quality Management Processor

DQM stands for Data Quality Management. DQM processor defines the quality rules for data and performs necessary action on the faulty data.

It performs a series of validation on individual data fields along with actions on functions, in case applied validation fails. Varieties of validations are available for each Data Type such as Not Null, Contains, and Starts With and Ends With.

DQM processor has seven categories of Validation functions, one for each Data Type.

Data Types

  1. ​Array
  1. ​String

    processor_string_image.png

  2. ​Number

    number_validation

  3. ​Decimal

    decimal_validation

  4. ​Date

    date_validation

  5. ​Boolean

    boolean-validation

  6. ​Timestamp

    timestamp_validation

  7. ​Opening Braces and Closing Braces

    Use these braces for adding or separating two or more expressions.

Action Functions

DQM processor has three Action functions that are triggered when Validation fails:

  1. ​Send To Error: It persists the invalid records in configured error handler target and discards the records for further processing in the pipeline. Right-click the Send To Error button to open the error handler configuration. When the data validation fails, a JSON message is sent to the configured error handler target. This JSON has the key errorMessage in the error handler target, which contains the expressions due to which validation fails.
  1. ​Assign Value: It gives the option to assign static or dynamic values to any message field.

    a. Static Values:

    Select any message field and apply validation on it. If the validation criteria are not met, assign static values to that field.

    Example:

    Applying validation on field id (id equals 10).

    Click on the Save button for saving the validation criteria.

    Whenever the value of the field id is equal to “10”, the record will be processed as it is.

    If the value of the field id is not equal to “10”, static value 100 will be assigned to that id.

    Click on the Save button for saving the configuration settings.

    Click Next and write the notes. The changes will be reflected in Auto Inspect window.

    Wherever the id value is not equal to “10”, the value will get replaced with the static value 100.

    b. Dynamic values:

    • Dynamic String Values

    • Dynamic Numeric values

    Dynamic String Values

    Dynamic values can be assigned using @ (the rate operator).

    Select any message field and apply validation on it. If the validation criteria is not met, assign dynamic values to that field.

    Example:

    Applying validation on field id (id equals 10).

    Wherever the value of the field id is equal to 10, the record will be processed as it is. If this validation criterion is not met, the dynamic value should be assigned to that id.

    You can dynamically assign the values of any other field like crop, soil, climate, pest, cost_per_hectare (message fields) to the field id wherever validation criteria are not met.

    Write @ (the rate operator) in the values text box of Configuration Settings-Assign Value.

    All the message fields will be displayed in the drop-down list. Select the field whose value is to be dynamically assigned to field id.

    Example: let us select field “soil” from the drop down list.

    Write the expression as: @{soil}

    The selected field value has to be enclosed within the curly braces.

    After writing the expression, click on the Save button.

    Wherever the value of the field id is 10, the record will be processed as it is.

    Otherwise, the value of the field soil will be assigned to the id value in that column.

    Click on the Save button for saving the configuration settings.

    Click Next and write the notes. The changes will be reflected in Auto Inspect window.

    Wherever the field id value is not 10, field soil will get assigned to the id value.

    Dynamic Numeric value: For numeric fields, mathematical operations can be used for assigning the values.

    Example:

    Let us apply validation on field “id” (id equals to 10).

    Wherever the value of the field id is equal to 10, the record will be processed as it is. If this validation criteria is not met, assign dynamic value to the field id using arithmetic operators.

    Let us apply simple mathematical operation on the field “cost_per_hect”.

    Write the expression as @{cost_per_hect+100}

    The expression will be evaluated as cost per hectare value will be added to 100. The output will be assigned to the field id wherever its value is not 10.

    Click on the Save button for saving the configuration settings.

    Click Next and write the notes. The changes will be reflected in Auto Inspect window.

    Let us evaluate the columns “id and cost_per_hect”.

    The value of the cost_per_hect in the second row is 7193, if 100 is added to it, the output is 7293 and this value gets assigned to the field id.

    The value of the cost_per_hect in the third row is 7403, if 100 is added to tithe output is 7503, this value gets assigned to the field id.

    Accordingly, all id values get updated except where the id is 10.

  2. ​Discard: It discards the record for further processing in the pipeline.

    Discard_Custom_Processor

    If a record is eligible for more than one action then Actions will be executed as per the below order:

    • ​Send To Error (Highest priority)

    • ​Discard (Lowest priority)

DQM Processor Configuration

To add a DQM processor to your pipeline, drag the DQM processor to the canvas and connect it to a Data Source or processor. Right-click on it to configure.

  1. On the configuration canvas of Data Quality processor, select fields, then drag and drop the Validation and Action functions.

  2. Choose a field in message panel from Configuration, respective data type validations will be expanded in function panel.

  3. After selecting the message, drag and drop the validation functions and connect them.

    Connecting lines show the Logical operator, AND/OR over it.

    By default, the AND operator is selected, but it can be toggled to OR operator via right click on it.

    Most validation functions in Data Quality processor does not require user input such as Not Null, Is Empty, Upper Case etc, hence no error icon gets visible on top of them. There are some validation functions which requires user input and shows the error (red) icon for configuration.

  4. Right click on the field (message) to configure the required input for validation function.

    Configuration panel also shows the Negate option so that same validation function can be utilized with reverse condition.

    Example: For a schema field with name start date if a function is selected having a similar field name, then the value in the respective function’s field will get filled according to the schema field.

    In such a scenario, for an expression to be successful, the schema field name must be changed to not match the function field name.

  5. Once all of the validations are configured, you can add an Action function at the end. This action function gets triggered if validation fails.

    You can select multiple fields and define the validations.

    “yyyy-MM-dd HH:mm:ss”

    For Date field, use below format:

    “yyyy-MM-dd”

Top