Hashing Processor
Hash fields with a choice of hashing algorithms to hide sensitive information.
Hashing is a technique in which an algorithm (also called a hash function) is applied to a portion of data to create a fixed-size digital “fingerprint” of that data.
If anyone changes the data by so much as one binary digit, the hash function will produce a different output (called the hash value), and a recipient who compares it with the original hash value will know that the data has been changed.
Hashing can therefore be used to verify integrity and, when combined with a secret key, to support authentication as well.
A cryptographic hash function cannot be “reverse-engineered”; that is, you can’t feasibly use the hash value to discover the original data that was hashed. Thus, these hashing algorithms are referred to as one-way hashes.
A good hash function makes it extremely unlikely that two different inputs return the same result (called a collision); because the output has a fixed size, collisions cannot be ruled out entirely, but in practice each result should be unique.
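The properties described above can be illustrated with a minimal sketch using Python's standard `hashlib` module. This is only an illustration of hashing in general, not the processor's own implementation; the sample value and the helper name `sha256_hex` are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash value of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"account=12345"
# Flip the lowest bit of the last byte: '5' (0x35) becomes '4' (0x34).
tampered = original[:-1] + bytes([original[-1] ^ 0x01])

digest_original = sha256_hex(original)
digest_tampered = sha256_hex(tampered)

# The two hash values differ even though only one bit of input changed,
# and both have the same fixed size (64 hex characters = 256 bits).
print(digest_original)
print(digest_tampered)
print(len(digest_original), len(digest_tampered))
```

Comparing a stored hash value with a freshly computed one is how a recipient detects that the data has changed.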
Processor Configuration
To configure the Hashing processor, use the options described below.
Output Field
The list of columns from which you select where the hashed value of the selected input column is stored.
A new column name can also be entered in this field. (The new column will be added to the dataset.)
Input Field
The list of columns from which you select the field to hash.
Hashing Type
The list of available hashing types.
The options are: MURMUR3_128, MURMUR3_32, MD5, SHA1, SHA256, SHA512, ADLER_32, and CRC_32.
Each of these hashing functions serves a specific purpose in cryptography, data integrity verification, or data fingerprinting.
Here’s a brief explanation of each:
MURMUR3_128 and MURMUR3_32: These are non-cryptographic hash functions known for their speed and efficiency. They produce a fixed-size hash value (128 bits or 32 bits) from an input data stream, often used in applications like hash tables or data partitioning.
MD5 (Message Digest Algorithm 5): MD5 is a widely used hash function that produces a 128-bit (16-byte) hash value. It is still used to detect accidental data corruption, but it is not recommended for security-sensitive applications because it is vulnerable to collision attacks.
SHA-1 (Secure Hash Algorithm 1): SHA-1 is a cryptographic hash function that produces a 160-bit hash value. It was once widely used but is now considered weak and vulnerable to collision attacks. It’s not recommended for secure applications.
SHA-256 (Secure Hash Algorithm 256): SHA-256 is part of the SHA-2 family of cryptographic hash functions. It produces a 256-bit hash value and is considered secure for most applications, including digital signatures and certificate authorities.
SHA-512 (Secure Hash Algorithm 512): Like SHA-256, SHA-512 is a member of the SHA-2 family, but it produces a 512-bit hash value. The longer hash value gives a larger security margin, so it is used in applications where higher security is required.
ADLER_32: Adler-32 is a checksum algorithm that produces a 32-bit checksum value. It’s primarily used for error-checking in data transmission and is relatively simple and fast.
CRC_32 (Cyclic Redundancy Check): CRC-32 is another checksum algorithm that produces a 32-bit checksum value. It’s commonly used in network communications and file storage for data integrity verification.
The choice of which hashing function to use depends on the specific requirements of your application.
Cryptographic hash functions like SHA-256 and SHA-512 are recommended for security-critical applications, while non-cryptographic hashes like MURMUR3 or Adler-32 are suitable for tasks where speed and efficiency are more important than cryptographic security.
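To compare the output sizes listed above, the following sketch computes each hash with Python's standard library. It is an illustration only and makes no claim about which library the processor uses internally. The MURMUR3 options are omitted because the Python standard library has no MurmurHash3 implementation, and the sample input is an assumption.

```python
import hashlib
import zlib

data = b"sensitive value"

# Cryptographic digests from hashlib; digest_size is reported in bytes.
digest_bits = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    h = hashlib.new(name, data)
    digest_bits[name] = h.digest_size * 8
    print(name, digest_bits[name], "bits:", h.hexdigest())

# 32-bit checksums from zlib, masked so the value is an unsigned 32-bit integer.
adler = zlib.adler32(data) & 0xFFFFFFFF
crc = zlib.crc32(data) & 0xFFFFFFFF
print("adler32 32 bits:", format(adler, "08x"))
print("crc32 32 bits:", format(crc, "08x"))
```

The checksums are much shorter than the cryptographic digests, which is why they are suited to detecting accidental errors rather than to hiding or protecting sensitive values.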
Add Field
Add more columns to hash in the same processor, each with its own hashing type.
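As a rough mental model of the configuration, the sketch below applies several (input field, output field, hashing type) rules to a single row. Everything here is hypothetical: the `HASHERS` mapping, the `hash_fields` function, the rule format, and the sample row are assumptions for illustration and do not describe the processor's actual implementation. In particular, keeping the input columns unchanged is a choice made for this sketch.

```python
import hashlib
import zlib

# Hypothetical mapping from hashing type names to Python standard library calls.
# The MURMUR3 options are left out: the standard library has no MurmurHash3.
HASHERS = {
    "MD5": lambda b: hashlib.md5(b).hexdigest(),
    "SHA1": lambda b: hashlib.sha1(b).hexdigest(),
    "SHA256": lambda b: hashlib.sha256(b).hexdigest(),
    "SHA512": lambda b: hashlib.sha512(b).hexdigest(),
    "ADLER_32": lambda b: format(zlib.adler32(b) & 0xFFFFFFFF, "08x"),
    "CRC_32": lambda b: format(zlib.crc32(b) & 0xFFFFFFFF, "08x"),
}


def hash_fields(row, rules):
    """Return a copy of row with one hashed output column per rule.

    rules is a list of (input_field, output_field, hashing_type) tuples.
    """
    result = dict(row)  # leave the caller's row untouched
    for input_field, output_field, hashing_type in rules:
        value = str(row[input_field]).encode("utf-8")
        result[output_field] = HASHERS[hashing_type](value)
    return result


row = {"name": "Ada", "email": "ada@example.com", "card": "4111111111111111"}
rules = [
    ("email", "email_hash", "SHA256"),  # new output column
    ("card", "card_hash", "SHA512"),    # a different hashing type per column
]
hashed = hash_fields(row, rules)
print(hashed)
```

Each rule corresponds to one entry added with Add Field: an input column, an output column, and the hashing type applied to it.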