Data Cleansing Processor

The Data Cleansing Processor cleanses a dataset using metadata.

Processor Configuration

Columns included while Extract Schema The column names specified in the processor are used in the data cleansing process.

Connection Type Select the connection type from which the metadata files are to be read. The available connection types are RDS and S3.

Connection Name Select the connection name to fetch the metadata file.

S3 Connection Configuration

Bucket Name Provide the bucket name if the S3 connection type is selected.

Path Provide the path or sub-directories within the bucket mentioned above to which the data is to be written, if the S3 connection type is selected.

RDS Connection Configuration

Schema Name Select the schema name from the drop-down list if the RDS connection type is selected.

Table Name Select the table name from the drop-down list if the RDS connection type is selected. The metadata must be in tabular form.
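
The connection fields above depend on the selected connection type. The following is a minimal Python sketch of that dependency, not Gathr code: the `MetadataSource` class, its field names, and the `missing_fields` helper are hypothetical, and treating every listed field as mandatory for its connection type is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataSource:
    """Hypothetical container for the connection fields described above."""

    connection_type: str                 # "S3" or "RDS"
    connection_name: str
    bucket_name: Optional[str] = None    # used with S3 only
    path: Optional[str] = None           # used with S3 only
    schema_name: Optional[str] = None    # used with RDS only
    table_name: Optional[str] = None     # used with RDS only

    def missing_fields(self) -> List[str]:
        """Return the fields that are still empty for the chosen connection type."""
        # Assumption: each field listed for a connection type is required.
        required = {
            "S3": ["bucket_name", "path"],
            "RDS": ["schema_name", "table_name"],
        }
        if self.connection_type not in required:
            raise ValueError(
                f"Unsupported connection type: {self.connection_type!r}"
            )
        return [
            name
            for name in required[self.connection_type]
            if not getattr(self, name)
        ]
```

For example, an S3 source created with a bucket name but no path would report `path` as the only missing field.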

Additional Configurations

Feed ID Provide the name of the feed ID to be filtered out from the metadata.

Remove Duplicate Select this checkbox to remove duplicate records.

Include Extra Input Columns Select this checkbox to include extra input columns.

Further configurations can be added by clicking the ADD CONFIGURATION button.
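
To illustrate how the Remove Duplicate and Include Extra Input Columns options could interact, here is a small Python sketch. It is not the processor's implementation: the `cleanse` function and its parameters are hypothetical, and it assumes that "extra input columns" means input columns not listed in the schema columns, and that duplicates are records whose retained columns all match.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def cleanse(
    records: Iterable[Record],
    schema_columns: List[str],
    remove_duplicate: bool = False,
    include_extra_input_columns: bool = False,
) -> List[Record]:
    """Hypothetical illustration of the two checkbox options."""
    cleansed: List[Record] = []
    seen = set()
    for record in records:
        if include_extra_input_columns:
            # Assumption: keep every input column, listed or not.
            row = dict(record)
        else:
            # Keep only the columns named in the processor configuration.
            row = {col: record.get(col) for col in schema_columns}
        if remove_duplicate:
            # Build a hashable key from the retained columns and values.
            key = tuple(sorted((k, repr(v)) for k, v in row.items()))
            if key in seen:
                continue  # skip a record already emitted
            seen.add(key)
        cleansed.append(row)
    return cleansed
```

Under these assumptions, two records that differ only in an unlisted column collapse into one when Remove Duplicate is selected alone, but both are kept when Include Extra Input Columns is also selected.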
