Schema and Rules

The Schema and Rules tab gives you a comprehensive view of data quality, completeness, and uniqueness for the columns of a data asset, and lets you manage the rules applied to them.

Actions Available

The following actions are available on every tab of the data asset view, as well as on the listing page.

common_headers

Edit Data Asset Name: Modify the name of the data asset to better suit your needs.

Additional Options: Access a range of actions, including deleting the data asset, using it in Ingestion or ETL applications, marking it as a favorite, and configuring it.

Start Profiling: Initiate data profiling to gain insights into your data’s characteristics and quality.

Back to Data Assets Listing: Return to the list of all data assets for an overview of your data.

common_options


Overview

This tab empowers you to gain deeper insights into the attributes of your data and maintain its integrity through rule application and version control.

Schema&Rules

Columns of Data

Each column has the following sections, and you can perform the required operations on it.

Columns_Of_Data

Data Type

Supported data types are:

  • Date and Timestamp

  • Numeric

  • String

  • Boolean

For example, in the image shown above, the String data type is denoted by AZ.


Sort

Columns can be sorted to make data exploration easier.

You can sort data alphabetically, from A to Z or from Z to A.


Operations

This option is located at the top right of each column.

Click the gear icon to open a slide-out window and perform the following operations on that column of the schema.

  • Filter: Use the available filters, either by applying preset options such as “Equals” or by specifying a range with the “Range” filter, among others. For unique scenarios, you can use a custom filter tailored to your specific requirements.

  • Transform: Apply the transformation filter, which includes options such as “Replace” for substituting values and “Format” for adjusting the appearance of data.

  • Missing Value Replacement: Replace the missing or null values with either Literal or Expression value.

    • Literal: Replaces Null and/or empty string values with a specified string literal.

    • Expression: Replaces Null and/or empty string values with a specified expression value.

  • Pivot: Pivot the column. PIVOT is a relational operator that converts row-level data into column-level data.

  • Group By: Group the columns together.

  • Rename Column: Rename the column.

  • Create New Column: Create a new column.

  • Remove Column: Remove the selected column.
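To make two of these operations more concrete, here is a minimal sketch in plain Python of what Literal missing value replacement and a pivot do to a small set of rows. It assumes rows are simple dictionaries and that the pivot sums values landing in the same cell (as a SQL PIVOT with SUM() would); the function names are hypothetical and this is not Gathr's implementation.

```python
from collections import defaultdict

# Illustrative sketch only: rows are plain Python dicts, and the function
# names below are hypothetical, not part of Gathr.


def replace_missing_literal(rows, column, literal, treat_empty_as_missing=True):
    """Return copies of `rows` with null (and optionally empty-string)
    values in `column` replaced by a string literal."""
    result = []
    for row in rows:
        value = row.get(column)
        missing = value is None or (treat_empty_as_missing and value == "")
        new_row = dict(row)  # copy so the input rows are not modified
        if missing:
            new_row[column] = literal
        result.append(new_row)
    return result


def pivot_sum(rows, index, pivot_column, value_column):
    """Turn each distinct value of `pivot_column` into its own column,
    producing one output row per `index` value. Values that land in the
    same cell are summed, as a SQL PIVOT with SUM() would do."""
    table = defaultdict(dict)
    for row in rows:
        cells = table[row[index]]
        key = row[pivot_column]
        cells[key] = cells.get(key, 0) + row[value_column]
    return [{index: idx, **cells} for idx, cells in table.items()]


sales = [
    {"region": "East", "quarter": "Q1", "amount": 10},
    {"region": "East", "quarter": "Q2", "amount": 5},
    {"region": "West", "quarter": "Q1", "amount": 7},
    {"region": "East", "quarter": "Q1", "amount": 3},
]
print(pivot_sum(sales, "region", "quarter", "amount"))
# [{'region': 'East', 'Q1': 13, 'Q2': 5}, {'region': 'West', 'Q1': 7}]
```

The row-level records (one per region and quarter) become one row per region with a column per quarter, which is the row-to-column conversion the Pivot operation describes.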


Actions on Columns

The actions that can be performed on all the columns together are as follows:

  • Create Column: Create a column using this icon.

  • Keep/Remove Columns: Keep or remove columns in the schema.

  • Display Columns: Display the selected columns in the schema.

  • Search Value: Search for a value in the schema.


PII Masking

Gathr’s PII Masking feature strengthens data security by automatically recognizing personally identifiable information (PII) in your data assets.

Auto Detect PII

Gathr intelligently identifies various PII types, including email addresses, credit card numbers, phone numbers, IP addresses, and similar types.

PII-Masking-01

The feature automatically identifies and marks PII in your data with a PII Shield icon for easy recognition and enhanced security.

The shield indicates that PII Masking can be applied to the marked data fields.
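Gathr does not document its detection rules here, so the following is only a rough sketch of how pattern-based PII detection can work in general: each value is tested against regular expressions for emails, IPv4 addresses, and phone numbers, and candidate card numbers are checked with the Luhn checksum. The patterns, the 80% column threshold, and all function names are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; not Gathr's actual detection rules.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "ipv4": re.compile(r"^(25[0-5]|2[0-4]\d|1?\d?\d)(\.(25[0-5]|2[0-4]\d|1?\d?\d)){3}$"),
    "phone": re.compile(r"^\+?\d[\d\s-]{7,14}\d$"),
}


def luhn_valid(number):
    """Luhn checksum used by payment card numbers (13 to 19 digits)."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii_type(value):
    """Return the PII type a single value looks like, or None."""
    v = value.strip()
    # Check card numbers first so they are not mistaken for phone numbers.
    if re.fullmatch(r"[\d\s-]{13,23}", v) and luhn_valid(v):
        return "credit_card"
    for name, pattern in PII_PATTERNS.items():
        if pattern.match(v):
            return name
    return None


def detect_column_pii(values, threshold=0.8):
    """Flag a column when most non-empty values share one PII type."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    counts = Counter(detect_pii_type(v) for v in non_empty)
    pii_type, hits = counts.most_common(1)[0]
    if pii_type is not None and hits / len(non_empty) >= threshold:
        return pii_type
    return None


print(detect_column_pii(["a@b.com", "c@d.org", None]))  # email
```

Classifying a whole column by majority vote, rather than flagging it on a single match, is one way to keep occasional look-alike values from triggering a false PII shield.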


PII Masking Steps

To implement PII masking for your data assets, follow these steps:

  1. Navigate to Schema and Rules: After creating your data asset, open the Schema and Rules tab to configure PII settings.

    PII-Masking-02

  2. Enable PII for Columns: From the settings icon, enable PII for the desired columns and enter the required masking input values to tailor the masking to your needs.

    PII-Masking-03

  3. PII Masked Columns: Masked fields are displayed in masked form, and their PII Shield icon turns red.

    PII-Masking-04

    Applying PII to any column in your data asset triggers the addition of a rule.

  4. Profile Run on Masked Data: To verify that masking is applied correctly, we recommend running profiling after enabling PII on the selected fields.
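The masking input values you enter in step 2 are specific to Gathr and are not listed here. As a rough illustration of what a masking rule does, this minimal sketch assumes two common strategies (show only the last few characters, or keep only the first character of an email's local part) and treats each enabled column as a rule mapping the column to a masking function. All names and parameters are hypothetical.

```python
# Hypothetical masking strategies for illustration; not Gathr's masking types.


def mask_value(value, mask_char="*", keep_last=4):
    """Mask all but the last `keep_last` characters. Values no longer than
    `keep_last` are masked completely so that nothing is revealed."""
    if value is None:
        return None
    if keep_last <= 0 or len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]


def mask_email(value, mask_char="*"):
    """Keep the first character of the local part and the whole domain."""
    if value is None:
        return None
    local, sep, domain = value.partition("@")
    if not sep:  # not an email address: mask everything
        return mask_value(value, mask_char, keep_last=0)
    return local[:1] + mask_char * max(len(local) - 1, 0) + "@" + domain


def apply_masking_rules(rows, rules):
    """`rules` maps a column name to a masking function, mimicking the rule
    that is added when PII masking is enabled on a column."""
    return [
        {col: (rules[col](val) if col in rules else val) for col, val in row.items()}
        for row in rows
    ]


rows = [{"name": "Jane", "email": "jane.doe@example.com", "card": "4111111111111111"}]
print(apply_masking_rules(rows, {"email": mask_email, "card": mask_value}))
# [{'name': 'Jane', 'email': 'j*******@example.com', 'card': '************1111'}]
```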


PII Feature Notes

Gathr’s PII Detection feature provides robust identification and customizable configuration:

  • Users can turn PII detection on or off based on their specific requirements.

Use in Ingestion and ETL with PII-Enabled Data Assets:

  • During inspection or schema detection, masked data is displayed according to the specified masking type.

  • During Ingestion or ETL, the masked data is emitted to the target, ensuring secure data transmission.

For a seamless and secure data processing experience, Gathr’s PII Masking Feature protects sensitive information while preserving data integrity and usability.

Explore and configure PII masking settings in your data assets to align with your specific data security requirements.


Column-Level Data Quality

Upon selecting a specific column from the displayed schema, you can see a detailed data quality assessment for that individual column.

data_asset_rules_1

This assessment includes:

  • Completeness: The extent to which the column values are populated. This metric indicates the ratio of non-null values to the total number of records, offering insights into data availability.

  • Uniqueness: A measure of the distinctness of values within the selected column. This metric provides an understanding of the level of redundancy or variation in the data.
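The completeness metric is defined above as the ratio of non-null values to the total number of records. Uniqueness is not given an exact formula, so the sketch below assumes it is the number of distinct non-null values divided by the number of non-null values. Treat it as an illustration of the idea, not the exact calculation Gathr performs.

```python
def completeness(values):
    """Ratio of non-null values to the total number of records."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values):
    """Assumed definition: distinct non-null values / non-null values.
    1.0 means every value is different; lower values mean more repetition."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return len(set(non_null)) / len(non_null)


column = ["IN", "US", "IN", None]
print(completeness(column))  # 0.75
print(uniqueness(column))
```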


Unique Values

In the Unique Values section, you will find a list of distinct values within the column.

data_asset_rules_unique_values_2

Move your cursor over any bar to see the corresponding count of that unique value.

You can also sort the unique values by count in descending or ascending order, or alphabetically.
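The bars in the Unique Values section correspond to value frequencies. A minimal Python sketch of how such counts can be computed and ordered, assuming nulls are excluded and ties are broken alphabetically (both assumptions, not documented Gathr behavior):

```python
from collections import Counter


def unique_value_counts(values, order="desc"):
    """Count each distinct non-null value and sort the (value, count) pairs.
    order: "desc" or "asc" by count, or "alpha" by value."""
    counts = Counter(v for v in values if v is not None)
    if order == "desc":
        return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    if order == "asc":
        return sorted(counts.items(), key=lambda kv: (kv[1], str(kv[0])))
    if order == "alpha":
        return sorted(counts.items(), key=lambda kv: str(kv[0]))
    raise ValueError(f"unknown order: {order!r}")


status = ["paid", "due", "paid", "void", "paid", "due"]
print(unique_value_counts(status))  # [('paid', 3), ('due', 2), ('void', 1)]
```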


Rules Applied

Rules are conditions or actions applied to columns to modify the data asset according to your specific requirements.

You can review the applied rules in the left navigation panel.

SaveRules

Save Rules

After applying rules, click SAVE RULES to preserve the modifications.

SaveRules_1

When a data asset is first created, its version is 0.

Saving changed rules creates a new version of the data asset.

The Analyze and Unique Values windows also update their values to reflect the rules applied on the columns.

Each time you save modifications to a saved data asset, its version is incremented by one, from n to n+1.

Delete Rules

You can delete individual rules by clicking the Delete icon next to each rule, or you can delete all rules at once using the option at the top of the rules section.

Reset Rules

You can use the RESET option, positioned next to the “Save Rules” function, to revert the changes made on a previously saved data asset. This option becomes visible as soon as any alterations are applied to the data asset.
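The save, delete, and reset behavior described above can be modeled as a small state machine. This is a hypothetical sketch, not Gathr code: the class and method names are invented, rules are plain strings, and it assumes that saving with no pending changes does not create a new version.

```python
class DataAssetRules:
    """Hypothetical model of rule editing and data asset versioning."""

    def __init__(self):
        self.version = 0                 # a newly created data asset is version 0
        self.saved_rules = []
        self.pending_rules = []
        self.history = {0: []}

    def apply_rule(self, rule):
        self.pending_rules.append(rule)

    def delete_rule(self, rule):
        self.pending_rules.remove(rule)  # Delete icon next to a single rule

    def delete_all_rules(self):
        self.pending_rules.clear()       # delete-all option at the top

    def has_unsaved_changes(self):
        # RESET would be offered only while this is True.
        return self.pending_rules != self.saved_rules

    def reset(self):
        self.pending_rules = list(self.saved_rules)

    def save(self):
        """SAVE RULES: version n becomes n+1 (assumed no-op if unchanged)."""
        if self.has_unsaved_changes():
            self.version += 1
            self.saved_rules = list(self.pending_rules)
            self.history[self.version] = list(self.saved_rules)
        return self.version


asset = DataAssetRules()
asset.apply_rule("mask email")
print(asset.save())  # 1
```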


Statistics

The Statistics section provides an overview of summary statistics for the selected column.

Schema_and_Rules_Stats

The statistical values for a relevant column are presented as follows:

  • Minimum
  • Maximum
  • Mean
  • Median
  • Standard Deviation
  • Mode
  • Distinct
  • Sum
  • Range
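For reference, the listed statistics can be computed for a numeric column with Python's standard `statistics` module, as in the sketch below. Which variant Gathr uses is not stated, so the choice of sample (rather than population) standard deviation, and the exclusion of nulls, are assumptions.

```python
import statistics


def column_statistics(values):
    """Summary statistics for a numeric column, ignoring nulls."""
    nums = [v for v in values if v is not None]
    if not nums:
        raise ValueError("column has no non-null numeric values")
    return {
        "min": min(nums),
        "max": max(nums),
        "mean": statistics.mean(nums),
        "median": statistics.median(nums),
        # Assumption: sample standard deviation; 0.0 for a single value.
        "stddev": statistics.stdev(nums) if len(nums) > 1 else 0.0,
        "mode": statistics.mode(nums),   # first mode on ties (Python 3.8+)
        "distinct": len(set(nums)),
        "sum": sum(nums),
        "range": max(nums) - min(nums),
    }


print(column_statistics([2, 4, 4, 4, 5, 5, 7, 9, None]))
```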