Data Cleansing Processor
The Data Cleansing Processor cleanses a dataset using metadata that is read from an RDS or S3 connection.
Processor Configuration
Columns included while Extract Schema: Column names mentioned in the processor are used in the data cleansing process.
Connection Type: Select the connection type from which the metadata files are to be read. The available connection types are RDS and S3.
Connection Name: Select the name of the connection used to fetch the metadata file.
S3 Connection Configuration
Bucket Name: Provide the bucket name. This field applies when the S3 connection type is selected.
Path: Provide the path or sub-directories within the bucket mentioned above to which the data is to be written. This field applies when the S3 connection type is selected.
RDS Connection Configuration
Schema Name: Select the schema name from the drop-down list. This field applies when the RDS connection type is selected.
Table Name: Select the table name from the drop-down list. This field applies when the RDS connection type is selected. The metadata should be in tabular form.
Additional Configurations
Feed ID: Provide the feed ID to be filtered out from the metadata.
Remove Duplicate: Select this checkbox to remove duplicate records.
Include Extra Input Columns: Select this checkbox to include extra input columns.
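The following Python sketch illustrates one way the additional configurations could interact. It is not Gathr's implementation. The function name cleanse, the metadata keys feed_id and column_name, and the reading of Feed ID as "use only the metadata rows for this feed" are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def cleanse(
    rows: List[Row],
    metadata: List[Dict[str, str]],
    feed_id: str,
    remove_duplicates: bool = False,
    include_extra_columns: bool = False,
) -> List[Row]:
    """Hypothetical metadata-driven cleansing step (illustration only)."""
    # Assumption: each metadata row names one column of one feed.
    wanted = [m["column_name"] for m in metadata if m["feed_id"] == feed_id]

    cleaned: List[Row] = []
    seen = set()
    for row in rows:
        if include_extra_columns:
            # Keep every input column, including ones absent from the metadata.
            out = dict(row)
        else:
            # Keep only the columns listed in the metadata for this feed.
            out = {col: row.get(col) for col in wanted}

        if remove_duplicates:
            # Rows count as duplicates when all kept columns match.
            # Values must be hashable for this check to work.
            key = tuple(sorted(out.items(), key=lambda kv: kv[0]))
            if key in seen:
                continue
            seen.add(key)
        cleaned.append(out)
    return cleaned


metadata = [
    {"feed_id": "f1", "column_name": "id"},
    {"feed_id": "f1", "column_name": "name"},
    {"feed_id": "f2", "column_name": "price"},
]
rows = [
    {"id": 1, "name": "a", "note": "x"},
    {"id": 1, "name": "a", "note": "y"},
]

print(cleanse(rows, metadata, "f1", remove_duplicates=True))
# prints [{'id': 1, 'name': 'a'}]
```

In this sketch, enabling include_extra_columns keeps the note column, so the two sample rows no longer count as duplicates and both are retained.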