XML Parser Processor
To add an XML Parser processor to your pipeline, drag the processor onto the canvas and click on it to configure.
Field | Description |
---|---|
Evaluation | Select how the XML is parsed and validated. Choose one of the following options: XPATH, XSD Validation, or XSD Validation And Drop Invalid XML Tags. |
XPATH: If the user selects the XPATH option, fill in the following fields:
XPATH | Provide the XPath expression to evaluate. |
XML Data Column | Provide the name of the column that contains the XML data. |
Include Input XML Column | The user can opt to include the input XML data in an extra output column. Select one of the following options from the drop-down list as required: Always, Only with Invalid XML, Only with Valid XML, Never. |
Conflicted Array Columns | Complete path of each conflicted column that needs to be changed from string to array type. Example: parent.child.grandchild |
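To illustrate what the XPATH option does conceptually, the following minimal Python sketch evaluates an XPath expression against XML held in a data column. It is not Gathr's implementation: the column name `xml_data`, the sample record, and the helper `evaluate_xpath` are hypothetical, and Python's standard `xml.etree.ElementTree` module supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def evaluate_xpath(row, xml_column, xpath):
    """Return the text of every element matched by `xpath` in row[xml_column]."""
    root = ET.fromstring(row[xml_column])
    return [element.text for element in root.findall(xpath)]


# Hypothetical input record; the XML is stored in the `xml_data` column.
row = {
    "id": 1,
    "xml_data": "<order><item><name>pen</name></item>"
                "<item><name>ink</name></item></order>",
}

# The expression is relative to the root element (<order>).
names = evaluate_xpath(row, "xml_data", "./item/name")
print(names)
```

In this sketch the XML Data Column corresponds to `xml_column` and the XPATH field corresponds to `xpath`.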
XSD Validation: If the user selects the XSD Validation option, fill in the following fields:
XSD Source | Provide the XSD source file by selecting either HDFS or Upload XSD from the drop-down list. |
If the user selects the HDFS option, fill in the following fields:
Connection Name | Provide the name of the connection to use. The user can select the default connection. |
HDFS Path | Provide the HDFS file path. Note: The HDFS path should include the XSD file name. If the XSD file has an import statement, the imported file must be at a parallel path and have the .xsd extension. |
Error Column | The user can optionally add a new column that will contain error/null values. |
Input XML Data Column | Select the Input XML Data Column from the drop-down list. |
Output XML Data Column | Select the Output XML Data Column from the drop-down list. Note: The user can add further configurations by clicking the ADD CONFIGURATION button. |
If the user selects the Upload XSD option, upload the XSD file and provide the Error Column, Input XML Data Column, and Output XML Data Column details. The user can also add further configurations by clicking the ADD CONFIGURATION button.
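The note on imported XSD files can be checked before a pipeline is configured. The sketch below, written under stated assumptions, lists the `xs:import` locations in an XSD and flags any that do not resolve to a `.xsd` file next to the main schema. It interprets "parallel path" as "same directory as the main XSD", which is an assumption, and the paths, namespaces, and helper names are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def imported_schema_paths(xsd_text, xsd_path):
    """Resolve each xs:import schemaLocation relative to the main XSD's directory."""
    root = ET.fromstring(xsd_text)
    base_dir = posixpath.dirname(xsd_path)
    paths = []
    for imp in root.iter(XS + "import"):
        location = imp.get("schemaLocation")
        if location:
            paths.append(posixpath.normpath(posixpath.join(base_dir, location)))
    return paths


def misplaced_imports(xsd_text, xsd_path):
    """Return imports that are not .xsd files in the main XSD's directory."""
    base_dir = posixpath.dirname(xsd_path)
    return [
        path
        for path in imported_schema_paths(xsd_text, xsd_path)
        if posixpath.dirname(path) != base_dir or not path.endswith(".xsd")
    ]


# Hypothetical main schema with one sibling import and one import elsewhere.
main_xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:example:common" schemaLocation="common.xsd"/>
  <xs:import namespace="urn:example:other" schemaLocation="../shared/other.xsd"/>
</xs:schema>"""

resolved = imported_schema_paths(main_xsd, "/schemas/orders/order.xsd")
problems = misplaced_imports(main_xsd, "/schemas/orders/order.xsd")
print(resolved)
print(problems)
```

Only the second import is reported, because it resolves outside the directory that holds the main XSD.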
XSD Validation And Drop Invalid XML Tags: If the user selects the XSD Validation And Drop Invalid XML Tags option, fill in the following fields:
XSD Source | Provide the XSD source file by selecting either HDFS or Upload XSD from the drop-down list. |
If the user selects the HDFS option, fill in the following fields:
Connection Name | Provide the name of the connection to use. The user can select the default connection. |
HDFS Path | Provide the HDFS file path. Note: The HDFS path should include the XSD file name. If the XSD file has an import statement, the imported file must be at a parallel path and have the .xsd extension. |
Error Column | The user can optionally add a new column that will contain error/null values. |
Input XML Data Column | Select the Input XML Data Column from the drop-down list. |
Output XML Data Column | Select the Output XML Data Column from the drop-down list. |
Note: The user can add further configurations by clicking the ADD CONFIGURATION button.
If the user selects the Upload XSD option, upload the XSD file and provide the Error Column, Input XML Data Column, and Output XML Data Column details. The user can also add further configurations by clicking the ADD CONFIGURATION button.
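The source does not describe how invalid tags are dropped, so the following is only a rough illustration of the idea, not Gathr's behavior. It collects the element names declared in an XSD, removes any element from the input that is not declared, and returns an output value plus an error value (null when processing succeeds), mirroring the Output XML Data Column and Error Column. This is a deliberate simplification: it ignores types, ordering, cardinality, and namespaces, so it is not real XSD validation, which needs a schema-aware library. All names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def declared_element_names(xsd_text):
    """Collect the name of every xs:element declared in the XSD."""
    root = ET.fromstring(xsd_text)
    return {el.get("name") for el in root.iter(XS + "element") if el.get("name")}


def drop_undeclared_tags(xml_text, allowed):
    """Remove elements whose tag is not declared; return (output_xml, error)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed XML: no output, error text goes to the error column.
        return None, str(exc)
    if root.tag not in allowed:
        return None, "root element <%s> is not declared" % root.tag

    def prune(parent):
        # Iterate over a copy because children are removed during the loop.
        for child in list(parent):
            if child.tag in allowed:
                prune(child)
            else:
                parent.remove(child)

    prune(root)
    return ET.tostring(root, encoding="unicode"), None


# Hypothetical schema declaring <order> with a single <id> child.
xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

allowed = declared_element_names(xsd)
output, error = drop_undeclared_tags("<order><id>42</id><note>rush</note></order>", allowed)
bad_output, bad_error = drop_undeclared_tags("<order><id>42</order>", allowed)
print(output, error)
print(bad_output, bad_error)
```

The undeclared `<note>` element is removed from the first record, while the malformed second record yields no output and a parser message in the error position.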