PII Masking Processor

Personally Identifiable Information (PII) is any detail that could be used to identify or impersonate a particular person. Common examples of PII are dates of birth, IP addresses, credit card details, passport numbers, and full names. gathr users can use the PII auto-detection and masking functionality to mask sensitive data.

In gathr, PII in incoming batch and streaming data is identified using regular expressions defined in the PII XML file. The identified data can then be masked.
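As a rough illustration of regex-based detection, the Python sketch below flags a column as PII when most of its sample values match a pattern. The pattern names, the regular expressions, the function names, and the 0.8 threshold are assumptions made for illustration only. They are not taken from gathr's PII XML file, whose format is not shown here.

```python
import re

# Hypothetical patterns for illustration; gathr's actual PII XML definitions may differ.
PII_PATTERNS = {
    "ipv4_address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "date_of_birth": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "credit_card": re.compile(r"(?:\d{4}[- ]?){3}\d{4}"),
}


def detect_pii_type(values, threshold=0.8):
    """Return the name of the first pattern that fully matches at least
    `threshold` of the non-empty values, or None if no pattern qualifies."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return None
    for name, pattern in PII_PATTERNS.items():
        matched = sum(1 for s in samples if pattern.fullmatch(s))
        if matched / len(samples) >= threshold:
            return name
    return None


def detect_pii_columns(rows):
    """Map each column name in a list of dict rows to its detected PII type."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    detected = {}
    for name, values in columns.items():
        pii_type = detect_pii_type(values)
        if pii_type is not None:
            detected[name] = pii_type
    return detected
```

Matching against a share of the sampled values, rather than requiring every value to match, is one way to tolerate a few malformed or placeholder entries in an otherwise sensitive column.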

In gathr, you can enable the PII Masking functionality under the Schema Type tab in any data source.

PII_1

Once PII Masking is enabled, a PII Masking processor is automatically added to the pipeline flow on the canvas. The processor lists the columns selected for PII Masking from the incoming data of the source file.

PII_2

Notes:

  • Columns detected for PII Masking are highlighted in blue under the detected current schema tab.

  • The option to enable or disable PII Masking is available under the gear icon of each PII Masking column.

    PII_3

  • Supported file formats for PII Masking are CSV, Parquet, JSON, Avro, and XML.

To configure the PII Masking processor, select the processor and connect it to the pipeline.

The configuration of the PII Masking processor is explained in detail below.

In this example pipeline, the option for PII Masking has been enabled using the Expression Evaluator processor.

PII_4

Now, as shown in the image below, columns in the current schema can be enabled for PII Masking.

Once the option ‘Enable PII Masking’ is selected by clicking the gear icon, a PII Masking processor is automatically connected to the pipeline, as shown below:

PII_05

PII Masking Processor Configuration

PII_6

Under Select Output Field, the columns that have been enabled for PII Masking in the schema are available in the drop-down list.

Select an output field and provide a character under the Add Masking Character column to mask the values of that field. The Mask Type options are described below:

Field                 Description
All                   Selects all the characters for masking.
Alternate Character   Selects alternate characters for masking.
Head Characters       Selects characters from the beginning of the data in the selected column for masking. The user needs to provide the number of characters to be masked from the beginning.
Trailing Characters   Selects characters from the end of the string (the rightmost part) of the data in the selected column for masking. The user needs to provide the number of characters to be masked from the end of the string.
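
The four mask types described above can be sketched in Python as follows. This is an illustrative sketch, not gathr's implementation. The function name `mask_value`, its parameters, and the mask type names are hypothetical. The Alternate Character behavior shown here (masking every other character, starting with the first) is an assumption, since the table does not say which positions are masked.

```python
def mask_value(value, mask_type, mask_char="*", count=0):
    """Mask a string according to one of the four mask types.

    mask_type: "all", "alternate", "head", or "trailing" (hypothetical names).
    count: number of characters to mask for "head" and "trailing".
    """
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    if count < 0:
        raise ValueError("count must not be negative")

    if mask_type == "all":
        return mask_char * len(value)
    if mask_type == "alternate":
        # Assumption: positions 0, 2, 4, ... are masked.
        return "".join(mask_char if i % 2 == 0 else ch for i, ch in enumerate(value))

    # Never mask more characters than the value contains.
    n = min(count, len(value))
    if mask_type == "head":
        return mask_char * n + value[n:]
    if mask_type == "trailing":
        return value[:len(value) - n] + mask_char * n
    raise ValueError(f"unknown mask type: {mask_type}")
```

For example, `mask_value("4111111111111111", "head", "X", 12)` masks the first 12 digits of a card number and leaves the last four visible.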