Join Processor

A join processor can create a relation between two incoming messages (DataSets).

Join Processor Configuration

To add a Join Processor to your pipeline, drag the processor onto the canvas and right-click it to configure.

Join Condition

The first tab is Join Condition. It lets you define the Join Type between two DataSets and apply a Join condition to columns from both DataSets.

The Broadcast Table tab directs Spark to broadcast the specified table when joining it with another table.
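To illustrate what broadcasting a table means, here is a minimal pure-Python sketch of a broadcast hash join. It is not Gathr or Spark code: the function name `broadcast_hash_join` and the sample data are hypothetical. The idea is that the smaller table is copied in full to every worker as a lookup map, so each partition of the larger table can be joined locally.

```python
from collections import defaultdict

def broadcast_hash_join(partitions, small_table, key):
    """Inner-join each partition of a large table against a small table.

    Hypothetical illustration: `small_table` plays the role of the
    broadcast table, and each element of `partitions` stands in for the
    slice of the large table held by one worker.
    """
    # Build the lookup map once; this is the data that would be broadcast.
    lookup = defaultdict(list)
    for row in small_table:
        lookup[row[key]].append(row)

    joined = []
    for partition in partitions:
        # Each partition probes the same map locally, so no rows of the
        # large table need to move between workers.
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined

orders = [  # large table, split into two partitions
    [{"country": "IN", "order_id": 1}, {"country": "US", "order_id": 2}],
    [{"country": "IN", "order_id": 3}, {"country": "FR", "order_id": 4}],
]
countries = [  # small table, a good candidate for broadcasting
    {"country": "IN", "name": "India"},
    {"country": "US", "name": "United States"},
]

result = broadcast_hash_join(orders, countries, "country")
```

Broadcasting pays off when one table is small enough to fit in memory on every worker; for two large tables the copy itself becomes the bottleneck.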

The available Join Types are:

Field | Description
Equi | The Equi Join performs a Join against equality or matching column(s) values of the associated tables. In Gathr, if an Equi Join is applied, it gets rendered in Filter Condition. If you want to apply a Join on two columns of the same table, you can use an Equi Join.
Full Outer | The Full Outer join is used when you need to match all corresponding data and also include the rows without matches from both DataSets. In other words, it contains all data, merged where a match exists.
Left Outer | In a Left Outer join, all data from the left dataset is contained in the joined dataset. Rows that have matches in the right dataset are enriched with the corresponding information, while rows without matches have this information set to null.
Right Outer | The Right Outer join is a variation of the Left Outer join. It keeps all data from the right dataset and matches it to entries from the left dataset, even if some matches are missing.
Left Semi | When the Left Semi join is used, all rows from the left dataset that have a match in the right dataset are returned in the final result. However, unlike the Left Outer join, the result doesn’t contain merged data from both datasets. Instead, it contains only the information (columns) brought by the left dataset.
Left Anti | The Left Anti join takes all rows from the left dataset that don’t have a match in the right dataset.
Inner | It joins rows only if they have a match in both DataSets.
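The differences between these Join Types are easier to see on a small example. The following is a minimal pure-Python sketch, not Gathr's implementation: the helper `join`, the `how` values, and the sample datasets are all hypothetical, and the sketch ignores null join keys.

```python
def join(left, right, key, how="inner"):
    """Equi-join two lists of dicts on `key` (hypothetical helper)."""
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    right_keys = {row[key] for row in right}

    if how == "left_semi":
        # Left rows that have a match; only left columns are returned.
        return [dict(l) for l in left if l[key] in right_keys]
    if how == "left_anti":
        # Left rows that have no match in the right dataset.
        return [dict(l) for l in left if l[key] not in right_keys]

    out = []
    matched_right = set()  # indexes of right rows that found a partner
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        matched_right.update(hits)
        for i in hits:
            out.append({**right[i], **l})
        if not hits and how in ("left_outer", "full_outer"):
            # Keep the unmatched left row; right-only columns become None.
            out.append({**dict.fromkeys(right_cols - left_cols), **l})
    if how in ("right_outer", "full_outer"):
        for i, r in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row; left-only columns become None.
                out.append({**dict.fromkeys(left_cols - right_cols), **r})
    return out

employees = [
    {"dept_id": 1, "name": "Asha"},
    {"dept_id": 2, "name": "Ben"},
    {"dept_id": 3, "name": "Chen"},   # no matching department
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
    {"dept_id": 4, "dept": "Legal"},  # no matching employee
]

for how in ("inner", "left_outer", "right_outer", "full_outer",
            "left_semi", "left_anti"):
    print(how, join(employees, departments, "dept_id", how))
```

Running the loop shows, for instance, that the left outer result keeps Chen with `dept` set to `None`, while the left anti result contains only Chen.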

Filter Condition

The second tab is Filter Condition. It lets you apply a filter condition to the DataSets. You can combine two filter conditions with AND and OR operations.

The filter conditions that can be applied are as follows:

  • contains

  • not contains

  • begins_with

  • ends_with

  • equal

  • not_begins_with

  • not_ends_with

  • is_null

  • is_not_null

  • in

  • not_in

  • matches

  • custom

  • not_equal
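As a rough illustration of how two such conditions combine with AND or OR, here is a minimal pure-Python sketch. The operator semantics are assumptions inferred from the names alone (for example, `matches` is treated as a regular-expression search), `custom` is left out because its behaviour cannot be inferred, and the helpers `check` and `apply_filter` are hypothetical rather than part of Gathr.

```python
import re

# Assumed semantics, inferred from the operator names only.
OPERATORS = {
    "contains": lambda v, arg: arg in v,
    "not contains": lambda v, arg: arg not in v,
    "begins_with": lambda v, arg: v.startswith(arg),
    "not_begins_with": lambda v, arg: not v.startswith(arg),
    "ends_with": lambda v, arg: v.endswith(arg),
    "not_ends_with": lambda v, arg: not v.endswith(arg),
    "equal": lambda v, arg: v == arg,
    "not_equal": lambda v, arg: v != arg,
    "in": lambda v, arg: v in arg,
    "not_in": lambda v, arg: v not in arg,
    "matches": lambda v, arg: re.search(arg, v) is not None,
    "is_null": lambda v, arg: v is None,
    "is_not_null": lambda v, arg: v is not None,
}

def check(row, condition):
    """Evaluate one (column, operator, argument) condition on a row."""
    column, operator, arg = condition
    value = row.get(column)
    if value is None and operator not in ("is_null", "is_not_null"):
        return False  # assumption: null values fail every other test
    return OPERATORS[operator](value, arg)

def apply_filter(rows, first, second=None, combine="AND"):
    """Keep rows that satisfy one condition, or two joined by AND/OR."""
    kept = []
    for row in rows:
        ok = check(row, first)
        if second is not None:
            other = check(row, second)
            ok = (ok and other) if combine == "AND" else (ok or other)
        if ok:
            kept.append(row)
    return kept

rows = [
    {"name": "Asha", "city": "Pune"},
    {"name": "Ben", "city": None},
    {"name": "Anil", "city": "Paris"},
]

both = apply_filter(rows, ("name", "begins_with", "A"),
                    ("city", "equal", "Pune"), "AND")
either = apply_filter(rows, ("name", "begins_with", "B"),
                      ("city", "matches", "^Par"), "OR")
```

With AND, only rows passing both conditions survive; with OR, a row passing either one is kept.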

Join Projection

The third tab is Join Projection, where you can apply a query and expressions to the columns.

Field | Description
Add Column/Expression | Select a column or expression that you want to apply the query to.
Value | The selected columns are reflected here.
View Query | View the query in the box shown.
ADD ALL COLUMNS/REMOVE COLUMNS | Add or remove the columns.