XML Parser Processor

To add an XML Parser processor to your pipeline, drag the processor onto the canvas and click on it to configure.

Field: Description
Evaluation: Select the option for XML parsing and validation. Choose one of the following options:

- XPATH

- XSD Validation

- XSD Validation And Drop Invalid XML Tags

XPATH: If the user selects the XPATH option, fill in the fields below:

XPATH: Provide the XPATH value.
XML Data Column: Provide the column that contains the XML data.
Include Input XML Column: The user can opt to include the input XML data in an extra output column.
Conflicted Array Columns: Complete path of the conflicted columns that need to be changed from string to array type. Example: parent.child.grandchild

Select one of the following options from the drop-down list, as required:

  • Always

  • Only with Invalid XML

  • Only with Valid XML

  • Never
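
The XPATH and Conflicted Array Columns fields can be illustrated with a small Python sketch. It is not the processor's implementation: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and the names xpath_values, xml_to_record and array_paths are hypothetical. It shows why a path such as parent.child.grandchild can conflict: an element that occurs once is read as a string, the same element occurring several times is read as an array, and listing the path forces the array type in both cases.

```python
import xml.etree.ElementTree as ET


def xpath_values(xml_text, xpath):
    """Return the text of every element matched by `xpath` (ElementTree's XPath subset)."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(xpath)]


def _to_record(elem, array_paths, path):
    """Convert an element to nested dicts; `path` is the dotted path of `elem`."""
    children = list(elem)
    if not children:
        return elem.text
    record = {}
    for child in children:
        child_path = f"{path}.{child.tag}"
        value = _to_record(child, array_paths, child_path)
        if child.tag in record:
            # Repeated element: collect the values in a list.
            if not isinstance(record[child.tag], list):
                record[child.tag] = [record[child.tag]]
            record[child.tag].append(value)
        elif child_path in array_paths:
            # Hypothetical "conflicted array column": always emit a list.
            record[child.tag] = [value]
        else:
            record[child.tag] = value
    return record


def xml_to_record(xml_text, array_paths=frozenset()):
    """Parse one XML string; dotted paths start at the root element's tag."""
    root = ET.fromstring(xml_text)
    return _to_record(root, array_paths, root.tag)


one = "<parent><child><grandchild>a</grandchild></child></parent>"
two = ("<parent><child><grandchild>a</grandchild>"
       "<grandchild>b</grandchild></child></parent>")

print(xpath_values(two, "./child/grandchild"))  # ['a', 'b']
print(xml_to_record(one))  # {'child': {'grandchild': 'a'}}    (string)
print(xml_to_record(two))  # {'child': {'grandchild': ['a', 'b']}}    (array)
print(xml_to_record(one, {"parent.child.grandchild"}))  # {'child': {'grandchild': ['a']}}
```

Without the forced path, the first two records disagree on the type of grandchild, which is the kind of conflict this field is meant to resolve.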

XSD Validation: If the user selects the XSD Validation option, fill in the fields below:

XSD Source: The user can provide the XSD source file by selecting either HDFS or Upload XSD from the drop-down list.

If the user selects the HDFS option, fill in the fields below:

Connection Name: Provide the connection name for creating the connection. The user can select the default connection.
HDFS Path: Provide the HDFS file path.

Note: The HDFS path should include the XSD file name.

If the XSD file has an import statement, the imported file must be at a parallel path and have the extension .xsd.
Error Column: The user can optionally add a new column that will contain error/null values.
Input XML Data Column: Select the Input XML Data Column from the drop-down list.
Output XML Data Column: Select the Output XML Data Column from the drop-down list.

Note: The user can add further configurations by clicking the ADD CONFIGURATION button.
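
The requirement on imported XSD files can be checked before the schema is placed on HDFS. The Python sketch below is only an illustration and makes two assumptions: that "parallel path" means the same directory as the main XSD file, and that imports are declared with xs:import and a schemaLocation attribute. The function names and the sample paths are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

XS_IMPORT = "{http://www.w3.org/2001/XMLSchema}import"


def imported_schema_paths(xsd_text, xsd_path):
    """Resolve each xs:import schemaLocation against the directory of `xsd_path`."""
    base_dir = posixpath.normpath(posixpath.dirname(xsd_path))
    root = ET.fromstring(xsd_text)
    paths = []
    for node in root.iter(XS_IMPORT):
        location = node.get("schemaLocation")
        if location:
            paths.append(posixpath.normpath(posixpath.join(base_dir, location)))
    return paths


def misplaced_imports(xsd_text, xsd_path):
    """Return imported paths that are not .xsd files next to the main XSD (assumed rule)."""
    base_dir = posixpath.normpath(posixpath.dirname(xsd_path))
    return [
        path
        for path in imported_schema_paths(xsd_text, xsd_path)
        if posixpath.dirname(path) != base_dir or not path.endswith(".xsd")
    ]


xsd = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:import namespace="urn:common" schemaLocation="common.xsd"/>'
    '<xs:import namespace="urn:other" schemaLocation="../shared/other.xsd"/>'
    "</xs:schema>"
)

print(imported_schema_paths(xsd, "/schemas/orders/order.xsd"))
# ['/schemas/orders/common.xsd', '/schemas/shared/other.xsd']
print(misplaced_imports(xsd, "/schemas/orders/order.xsd"))
# ['/schemas/shared/other.xsd']
```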

If the user selects the Upload XSD option, upload the XSD file and provide the Error Column, Input XML Data Column, and Output XML Data Column details. The user can also add further configurations by clicking the ADD CONFIGURATION button.
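
The role of the Error Column can be pictured with the Python sketch below. It assumes that the column holds an error message for rows that fail validation and null for rows that pass. The standard library has no XSD validator, so a plain well-formedness check stands in for XSD validation here; the function names and column names are hypothetical, and any function that returns an error string or None can be passed as validate.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Stand-in validator: return a parse error message, or None if the XML is well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


def add_error_column(rows, input_col, error_col, validate=well_formed_error):
    """Return copies of `rows` with `error_col` set to the validator's result."""
    return [{**row, error_col: validate(row[input_col])} for row in rows]


rows = [
    {"id": 1, "xml": "<order><id>1</id></order>"},
    {"id": 2, "xml": "<order><id>2</id>"},  # missing closing tag
]

for row in add_error_column(rows, input_col="xml", error_col="error"):
    print(row["id"], row["error"])
# Row 1 prints None; row 2 prints the parser's error message.
```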

XSD Validation And Drop Invalid XML Tags: If the user selects the XSD Validation And Drop Invalid XML Tags option, fill in the fields below:

XSD Source: The user can provide the XSD source file by selecting either HDFS or Upload XSD from the drop-down list.

If the user selects the HDFS option, fill in the fields below:

Connection Name: Provide the connection name for creating the connection. The user can select the default connection.
HDFS Path: Provide the HDFS file path.

Note: The HDFS path should include the XSD file name.

If the XSD file has an import statement, the imported file must be at a parallel path and have the extension .xsd.
Error Column: The user can optionally add a new column that will contain error/null values.
Input XML Data Column: Select the Input XML Data Column from the drop-down list.
Output XML Data Column: Select the Output XML Data Column from the drop-down list.

Note: The user can add further configurations by clicking the ADD CONFIGURATION button.

If the user selects the Upload XSD option, upload the XSD file and provide the Error Column, Input XML Data Column, and Output XML Data Column details. The user can also add further configurations by clicking the ADD CONFIGURATION button.
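
What dropping invalid tags might look like can be sketched in Python. This is a rough illustration under stated assumptions, not the processor's behavior: it assumes that elements the schema does not allow are removed from the output XML, and it replaces the XSD with a plain set of allowed tag names because the standard library cannot validate against an XSD. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_unknown_tags(xml_text, allowed_tags):
    """Remove every element (and its subtree) whose tag is not in `allowed_tags`."""
    root = ET.fromstring(xml_text)

    def prune(elem):
        # Iterate over a copy because children are removed while looping.
        for child in list(elem):
            if child.tag in allowed_tags:
                prune(child)
            else:
                elem.remove(child)

    prune(root)
    return ET.tostring(root, encoding="unicode")


xml_in = "<order><id>1</id><bogus>x</bogus><item><sku>A</sku><junk/></item></order>"
allowed = {"order", "id", "item", "sku"}  # stand-in for what the XSD permits

print(drop_unknown_tags(xml_in, allowed))
# <order><id>1</id><item><sku>A</sku></item></order>
```

In this picture the cleaned string would be written to the Output XML Data Column, while the Input XML Data Column keeps the original value.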
