Hashing Processor
Hashes fields using a choice of hashing algorithms to hide sensitive information.
Hashing is a technique in which an algorithm (called a hash function) is applied to data to produce a fixed-size value, called the hash value or digest, that acts as a digital “fingerprint” of that data. If the data changes by even a single bit, a well-designed hash function produces a different hash value, so a recipient who compares the values can tell that the data has been altered. Hashing therefore helps verify data integrity. Combined with a secret key (as in HMAC) or a digital signature, it can also support authentication.
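A minimal sketch using Python's standard `hashlib` module (not Gathr itself) illustrates both properties: the digest length is fixed regardless of input size, and changing a single character of the input produces a different digest. The input strings are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
# The digest is always 32 bytes (64 hex characters), whatever the input size.
print(len(short_digest), len(long_digest))  # 64 64

original = sha256_hex("Account 1234")
tampered = sha256_hex("Account 1235")  # one character changed
# A change to the input yields a different digest, so a mismatch reveals tampering.
print(original == tampered)  # False
```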
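A cryptographic hash function is designed so that it cannot be “reverse-engineered”: you cannot compute the original data from its hash value, which is why such algorithms are called one-way hashes. Because hash values have a fixed size while the possible inputs are unlimited, two different inputs can share the same hash value (a collision). A good cryptographic hash function makes collisions so hard to find that each hash value is, in practice, unique. One-way does not mean unguessable, however: if the original values come from a small set, such as PINs or phone numbers, an attacker can hash every candidate and compare the results.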
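The one-way property does not protect values drawn from a small set. The sketch below, using only the Python standard library, recovers a hashed four-digit PIN by hashing all 10,000 candidates; the PIN, the key, and the `recover_pin` helper are hypothetical. A keyed hash such as HMAC (shown with the standard `hmac` module) defeats this particular attack for anyone who lacks the secret key. This article does not state whether the Hashing processor supports a key or salt.

```python
import hashlib
import hmac
from typing import Optional


def md5_hex(text: str) -> str:
    """Return the unsalted MD5 digest of a UTF-8 string as hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def recover_pin(target_digest: str) -> Optional[str]:
    """Hypothetical helper: brute-force a 4-digit PIN from its unsalted MD5 digest."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if md5_hex(candidate) == target_digest:
            return candidate
    return None


stored = md5_hex("4821")    # what a hashed column might contain
print(recover_pin(stored))  # 4821, found by exhaustive search

# With a secret key, candidate digests cannot be computed without the key.
SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key
keyed = hmac.new(SECRET_KEY, b"4821", hashlib.sha256).hexdigest()
print(recover_pin(keyed))   # None: unkeyed MD5 guesses never match
```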
Hashing Processor Configuration
To add a Hashing processor to your pipeline, drag the processor onto the canvas, right-click it, and configure the fields described below:
Field | Description |
---|---|
Output Field | The column in which the hashed value of the selected input column is stored. Select a column from the list, or enter a new column name; a new column is added to the dataset. |
Input Field | The column whose values you want to hash. Select it from the list of columns. |
Hashing Type | The hashing algorithm to apply. The options are: MURMUR3_128, MURMUR3_32, MD5, SHA1, SHA256, SHA512, ADLER_32, and CRC_32. |
Add Field | Adds another set of fields so that you can hash multiple columns at once, each with its own hashing type. |
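To make the Hashing Type options concrete, the sketch below maps six of them to Python standard-library functions and applies a list of Input Field / Output Field / Hashing Type rules (one per configured row) to a record. It is an illustration, not Gathr's implementation: the `HashRule` type, the `apply_rules` helper, the UTF-8 encoding, and the lowercase hex output format are all assumptions, and the processor's actual output may differ. MURMUR3_32 and MURMUR3_128 have no standard-library equivalent and are omitted; the third-party `mmh3` package is one option for those.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List


def _hex32(value: int) -> str:
    """Format a 32-bit checksum as 8 zero-padded hex characters."""
    return f"{value & 0xFFFFFFFF:08x}"


# Assumed mapping from Hashing Type option to a function returning a hex string.
HASHERS: Dict[str, Callable[[bytes], str]] = {
    "MD5": lambda b: hashlib.md5(b).hexdigest(),
    "SHA1": lambda b: hashlib.sha1(b).hexdigest(),
    "SHA256": lambda b: hashlib.sha256(b).hexdigest(),
    "SHA512": lambda b: hashlib.sha512(b).hexdigest(),
    "ADLER_32": lambda b: _hex32(zlib.adler32(b)),
    "CRC_32": lambda b: _hex32(zlib.crc32(b)),
}


@dataclass
class HashRule:
    """Hypothetical model of one configured row."""
    input_field: str
    output_field: str
    hashing_type: str


def apply_rules(record: Dict[str, str], rules: List[HashRule]) -> Dict[str, str]:
    """Return a copy of the record with one hashed output column per rule."""
    out = dict(record)
    for rule in rules:
        raw = str(record[rule.input_field]).encode("utf-8")
        out[rule.output_field] = HASHERS[rule.hashing_type](raw)
    return out


rules = [
    HashRule("email", "email_sha256", "SHA256"),
    HashRule("phone", "phone_md5", "MD5"),
]
row = {"email": "user@example.com", "phone": "555-0100"}
print(apply_rules(row, rules))
```

Note that ADLER_32 and CRC_32 are checksums designed to catch accidental corruption, and MURMUR3 is a fast non-cryptographic hash; none of them is one-way in the sense described above, so they are poor choices for hiding sensitive values. MD5 and SHA1 also have known collision attacks, which leaves SHA256 and SHA512 as the stronger options in this list.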
After configuring all the fields, click Next. The schema is detected, after which you can verify it and save the configuration.
If you have any feedback on Gathr documentation, please email us!