PII Masking Processor
PII, or Personally Identifiable Information, is any real detail that could be used to identify or impersonate a particular person. Common examples of PII are dates of birth, IP addresses, credit card details, passport numbers, and full names. Gathr users can use the PII auto-detection and masking functionality to mask sensitive data.
In Gathr, PII in incoming batch and streaming data is identified using regular expressions defined in the PII XML file, and the identified data can then be masked.
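As a rough illustration of regex-based detection, the sketch below flags a column as PII when most of its values match a pattern. It is not Gathr's implementation: the pattern set, the `detect_pii_columns` helper, and the `min_match_ratio` threshold are all assumptions made for this example, and the actual expressions come from Gathr's PII XML file.

```python
import re

# Hypothetical patterns for illustration only; Gathr reads its own
# expressions from the PII XML file, whose contents are not shown here.
PII_PATTERNS = {
    "ipv4": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def detect_pii_columns(rows, min_match_ratio=0.8):
    """Return {column: pii_type} for columns whose non-empty values
    mostly match one of the patterns (threshold is an assumption)."""
    detected = {}
    if not rows:
        return detected
    for column in rows[0]:
        values = [str(r[column]) for r in rows if r.get(column) not in (None, "")]
        if not values:
            continue
        for pii_type, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits / len(values) >= min_match_ratio:
                detected[column] = pii_type
                break  # first matching pattern wins
    return detected


sample = [
    {"name": "x", "ip": "10.0.0.1"},
    {"name": "y", "ip": "192.168.1.5"},
]
print(detect_pii_columns(sample))  # {'ip': 'ipv4'}
```

Matching whole values with `fullmatch` keeps the sketch conservative: a free-text column that merely contains an IP address somewhere is not flagged.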
You can enable the PII Masking functionality under the Schema Type tab of any data source.
When it is enabled, a PII Masking processor is automatically added to the pipeline flow on the canvas. The processor lists the columns selected for PII Masking from the incoming data of the source.
Notes:
Columns that have been detected as PII appear highlighted in blue under the detected current schema tab.
The option to enable or disable PII Masking is available under the gear icon of each PII Masking column.
Supported file formats for PII Masking are CSV, Parquet, JSON, Avro, and XML.
To configure the PII Masking Processor, select the processor and connect it to the pipeline.
The configuration of the PII Masking Processor is explained in detail below.
In the example pipeline here, the PII Masking option has been enabled using the Expression Evaluator processor.
As shown in the image below, the columns in the current schema can be enabled for PII Masking.
Once the ‘Enable PII Masking’ option is selected from the gear icon, a PII Masking processor is automatically connected to the pipeline, as shown below:
PII Masking Processor Configuration
Under Select Output Field, the columns that have been enabled for PII Masking in the schema are available in the drop-down list.
Select an output field and provide a character under the Add Masking Character column to mask the values in that field. The Mask Type options are described below:
Mask Type | Description |
---|---|
All | Selects all the characters for masking. |
Alternate Character | Selects alternate characters for masking. |
Head Characters | Selects characters from the beginning of the data in the selected column for masking. You need to provide the number of characters to mask from the beginning. |
Trailing Characters | Selects characters from the end (rightmost part) of the data in the selected column for masking. You need to provide the number of characters to mask from the end. |
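The four mask types can be sketched in a few lines of Python. This is only an approximation of the behavior described in the table, not the processor's actual code: the `mask_value` function and its parameters are hypothetical, and for Alternate Character it assumes that every other character is masked, starting with the first.

```python
def mask_value(value, mask_type, mask_char="*", count=0):
    """Mask a single value according to the chosen mask type.

    `count` is used only by the Head and Trailing mask types.
    """
    text = str(value)
    if mask_type == "All":
        return mask_char * len(text)
    if mask_type == "Alternate Character":
        # Assumption: positions 0, 2, 4, ... are masked.
        return "".join(mask_char if i % 2 == 0 else ch
                       for i, ch in enumerate(text))
    # Clamp the count so it never exceeds the value's length.
    n = min(max(count, 0), len(text))
    if mask_type == "Head Characters":
        return mask_char * n + text[n:]
    if mask_type == "Trailing Characters":
        return text[:len(text) - n] + mask_char * n
    raise ValueError(f"Unknown mask type: {mask_type}")


print(mask_value("1234567890", "All"))                         # **********
print(mask_value("1234567890", "Alternate Character"))         # *2*4*6*8*0
print(mask_value("1234567890", "Head Characters", "*", 4))     # ****567890
print(mask_value("1234567890", "Trailing Characters", "*", 4)) # 123456****
```

Clamping `count` to the length of the value means a short value is fully masked rather than raising an error, which is one reasonable choice for sensitive data.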