Join Processor

The Join processor relates two incoming messages (DataSets) by joining them.

Join Processor Configuration

To add a Join Processor to your pipeline, drag the processor onto the canvas and right-click it to configure it.

Join Condition

The first tab is Join Condition. It lets you define the Join Type between two DataSets and apply a join condition to a column from each DataSet.

The Broadcast Table tab guides Spark to broadcast the specified table when it is joined with another table.

The available Join Types are:

Join Type	Description
Equi	Joins rows based on equality of matching column values in the associated tables. In Gathr, an applied Equi Join is rendered in the Filter Condition. To join on two columns of the same table, use an Equi Join.
Full Outer	Matches all corresponding data and also includes the rows without matches from both DataSets. In other words, the result contains all data from both DataSets, merged where a match exists.
Left Outer	All data from the left DataSet is contained in the joined DataSet. Rows that have matches in the right DataSet are enriched with the appropriate information, while rows without matches have this information set to null.
Right Outer	A variation of the left outer join. It keeps all data from the right DataSet and matches it to entries from the left DataSet, even if some matches are missing.
Left Semi	Returns all rows from the left DataSet that have a corresponding row in the right DataSet. Unlike the left outer join, the result does not contain merged data from both DataSets; it contains only the columns of the left DataSet.
Left Anti	Returns all rows from the left DataSet that have no match in the right DataSet.
Inner	Joins rows only if they have a match in both DataSets.
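
The difference between the join types is easiest to see on a small example. The following is a minimal pure-Python sketch, not Gathr or Spark code: the `join` helper, the `how` names, and the sample columns (`id`, `name`, `city`) are all hypothetical and chosen only for illustration. It joins on equality of a single key column.

```python
# Toy DataSets; the column names and values are made up for illustration.
left = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cy"},
]
right = [
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Pune"},
    {"id": 4, "city": "Rome"},
]


def join(left, right, key, how):
    """Join two lists of dicts on equality of one key column (hypothetical helper)."""
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}

    # Semi and anti joins return left columns only, never merged rows.
    if how == "left_semi":
        return [dict(l) for l in left if l[key] in right_keys]
    if how == "left_anti":
        return [dict(l) for l in left if l[key] not in right_keys]

    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        for r in matches:
            out.append({**l, **r})  # merged row for every match
        if not matches and how in ("left_outer", "full_outer"):
            # Keep the unmatched left row; right-side columns become None (null).
            out.append({**{c: None for c in right_cols}, **l})
    if how in ("right_outer", "full_outer"):
        for r in right:
            if r[key] not in left_keys:
                # Keep the unmatched right row; left-side columns become None (null).
                out.append({**{c: None for c in left_cols}, **r})
    return out


for how in ("inner", "left_outer", "right_outer", "full_outer", "left_semi", "left_anti"):
    print(how, join(left, right, "id", how))
```

With this data, only ids 2 and 3 exist on both sides, so the inner join yields two merged rows, the full outer join yields four rows, and the left anti join yields only the row with id 1.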

Filter Condition

The second tab is Filter Condition. It lets you apply filter conditions to the DataSets. You can combine two filter conditions with AND and OR operations.

The following Filter Conditions can be applied:

contains
not contains
begins_with
ends_with
equal
not_begins_with
not_ends_with
is_null
is_not_null
in
not_in
matches
custom
not_equal
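
As a rough illustration of how such conditions behave and how two of them combine with AND and OR, here is a small pure-Python sketch. The semantics given to each operator name are assumptions (for example, `matches` is treated as a regular-expression search, and null handling is simplified), the `custom` operator is omitted, and the helpers `condition`, `all_of`, and `any_of` are hypothetical names, not part of Gathr.

```python
import re

# Assumed meaning of each operator name; Gathr's actual behavior may differ.
OPERATORS = {
    "contains": lambda v, arg: v is not None and arg in v,
    "not contains": lambda v, arg: v is not None and arg not in v,
    "begins_with": lambda v, arg: v is not None and v.startswith(arg),
    "not_begins_with": lambda v, arg: v is not None and not v.startswith(arg),
    "ends_with": lambda v, arg: v is not None and v.endswith(arg),
    "not_ends_with": lambda v, arg: v is not None and not v.endswith(arg),
    "equal": lambda v, arg: v == arg,
    "not_equal": lambda v, arg: v is not None and v != arg,
    "is_null": lambda v, arg: v is None,
    "is_not_null": lambda v, arg: v is not None,
    "in": lambda v, arg: v in arg,
    "not_in": lambda v, arg: v is not None and v not in arg,
    "matches": lambda v, arg: v is not None and re.search(arg, v) is not None,
}


def condition(column, op, arg=None):
    """Build a row predicate from a column name, an operator name, and an argument."""
    test = OPERATORS[op]
    return lambda row: test(row.get(column), arg)


def all_of(a, b):
    """AND of two filter conditions."""
    return lambda row: a(row) and b(row)


def any_of(a, b):
    """OR of two filter conditions."""
    return lambda row: a(row) or b(row)


# Made-up sample rows.
rows = [
    {"name": "Ana", "city": "Oslo"},
    {"name": "Ben", "city": None},
    {"name": "Cy", "city": "Pune"},
    {"name": "Alma", "city": "Rome"},
]

starts_with_a = condition("name", "begins_with", "A")
in_cities = condition("city", "in", ["Pune", "Rome"])

and_names = [r["name"] for r in rows if all_of(starts_with_a, in_cities)(r)]
or_names = [r["name"] for r in rows if any_of(starts_with_a, in_cities)(r)]
print(and_names)
print(or_names)
```

The AND combination keeps only rows that satisfy both conditions, while the OR combination keeps rows that satisfy at least one.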

Join Projection

The third tab is Join Projection, where you can apply a query and expressions to the columns.

Field	Description
Add Column/Expression	Select a column or expression that you want to apply the query to.
Value	The selected columns are shown here.
View Query	View the query in the box shown.
ADD ALL COLUMNS/REMOVE COLUMNS	Add or Remove the columns.
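
Conceptually, a projection keeps a chosen subset of the joined columns and can add columns computed by expressions. The sketch below shows that idea in plain Python; the `project` helper, the sample rows, and the `name_upper` expression are hypothetical and do not reflect how Gathr builds its query.

```python
def project(rows, columns, expressions=None):
    """Keep the selected columns and add computed columns (hypothetical helper).

    `expressions` maps an output column name to a function of the input row.
    """
    expressions = expressions or {}
    out = []
    for row in rows:
        projected = {c: row.get(c) for c in columns}
        for alias, fn in expressions.items():
            projected[alias] = fn(row)
        out.append(projected)
    return out


# Made-up rows standing in for the output of a join.
joined = [
    {"id": 2, "name": "Ben", "city": "Oslo"},
    {"id": 3, "name": "Cy", "city": "Pune"},
]

result = project(
    joined,
    columns=["id", "city"],
    expressions={"name_upper": lambda r: r["name"].upper()},
)
print(result)
```

Here the `name` column is dropped from the output, and its upper-cased value is exposed under a new column name instead.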
