Join Processor

The Join Processor creates a relation between two incoming messages (Datasets) by joining them.

Processor Configuration

The Join Processor is configured in three steps, one per tab: Join Condition, Filter Condition, and Projection.

Join Condition

On the Join Condition tab, the first field is Join Condition. It lets you define the Join Type between two Datasets and choose the column from each Dataset to which the join condition is applied.

The second field is Broadcast Table. It instructs Spark to broadcast the specified table when joining it with another table. Broadcasting sends a full copy of the table to every executor, so it is best suited to the smaller of the two tables.
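To make the idea of broadcasting concrete, here is a minimal plain-Python sketch, not Gathr or Spark code. It assumes the large table is split into partitions and that join keys in the small table are unique; the function name and sample data are hypothetical.

```python
# Illustrative sketch only: a plain-Python model of a broadcast join.
# The function name, tables, and columns are hypothetical.

def broadcast_join(big_partitions, small_table, key):
    # Build the lookup once. In Spark, a copy like this is what each
    # executor receives when a table is broadcast.
    lookup = {row[key]: row for row in small_table}
    joined_partitions = []
    for partition in big_partitions:
        # Each partition joins locally against the full copy of the small
        # table, so rows of the large table never move between partitions.
        joined = [
            {**row, **lookup[row[key]]}
            for row in partition
            if row[key] in lookup
        ]
        joined_partitions.append(joined)
    return joined_partitions

orders = [
    [{"order": 10, "country": "NO"}, {"order": 11, "country": "IT"}],
    [{"order": 12, "country": "PE"}, {"order": 13, "country": "NO"}],
]
countries = [
    {"country": "NO", "name": "Norway"},
    {"country": "IT", "name": "Italy"},
]

result = broadcast_join(orders, countries, "country")
```

The small table is the natural choice to broadcast because every partition holds a complete copy of it.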

The available Join Types are:

Equi: The Equi Join matches rows on equal values in the chosen column(s) of the associated tables. In Gathr, an applied Equi Join is rendered in the Filter Condition. You can also use an Equi Join to join on two columns of the same table.

Inner: Joins rows only when they have a match in both Datasets.

Full Outer: Returns all matching rows and also includes the rows without a match from both Datasets. In other words, the result contains all data from both Datasets, merged where a match exists.

Left Anti: Returns all rows from the left Dataset that have no match in the right Dataset.

Left Outer: All rows from the left Dataset are included in the joined Dataset. Rows that have a match in the right Dataset are enriched with the corresponding information, while rows without a match have this information set to null.

Right Outer: The mirror image of the Left Outer join. All rows from the right Dataset are included in the joined Dataset; where no match exists in the left Dataset, the left-side information is set to null.

Left Semi: Returns all rows from the left Dataset that have a match in the right Dataset. Unlike the Left Outer join, the result does not contain merged data from both Datasets. It contains only the columns of the left Dataset.
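The differences between the join types are easiest to see on a small example. The following is a minimal plain-Python sketch of the semantics described above, not Gathr's implementation. The `join` helper, the `how` names, and the sample Datasets are hypothetical, and the join uses a single equality key.

```python
# Illustrative sketch only: models the join types on lists of dicts.
# The helper, the "how" names, and the sample data are hypothetical.

left = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
right = [
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Rome"},
    {"id": 4, "city": "Lima"},
]

def join(left, right, key, how):
    """Join two non-empty lists of dicts on one equality key."""
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    # Index the right side by key so each left row finds its matches quickly.
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)

    out, matched = [], set()
    for l in left:
        matches = right_by_key.get(l[key], [])
        if how == "left_semi":
            if matches:
                out.append(dict(l))      # left columns only
        elif how == "left_anti":
            if not matches:
                out.append(dict(l))      # left rows with no match
        elif matches:
            matched.add(l[key])
            out.extend({**l, **r} for r in matches)  # merged rows
        elif how in ("left_outer", "full_outer"):
            # Unmatched left row: right-side columns become None (null).
            out.append({**l, **dict.fromkeys(right_cols)})
    if how in ("right_outer", "full_outer"):
        for r in right:
            if r[key] not in matched:
                # Unmatched right row: left-side columns become None (null).
                out.append({**dict.fromkeys(left_cols), **r})
    return out

inner = join(left, right, "id", "inner")
left_outer = join(left, right, "id", "left_outer")
right_outer = join(left, right, "id", "right_outer")
full_outer = join(left, right, "id", "full_outer")
left_semi = join(left, right, "id", "left_semi")
left_anti = join(left, right, "id", "left_anti")
```

With this sample data, only ids 2 and 3 exist on both sides, so the inner join returns two merged rows, the full outer join returns four rows, and the left anti join returns only the row for id 1.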

Join Filter Condition

The second tab in the Join Processor is Filter Condition.

It lets you apply a filter condition to the Datasets. You can combine two filter conditions with AND or OR.

The following Filter Conditions can be applied:

  • Equal

  • Not Equal

  • Less

  • Greater

  • Less or Equal

  • Greater or Equal

  • Between

  • Is Null

  • Is Not Null

  • In

  • Not In

  • Custom
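As a rough illustration of how these conditions behave and how two of them combine with AND or OR, here is a minimal plain-Python sketch. It is not Gathr code: the `FILTERS` table, the helper functions, and the sample rows are hypothetical. Between is assumed to be inclusive, as in SQL, and Custom is left out because it takes a free-form expression.

```python
# Illustrative sketch only: the names and sample rows are hypothetical.
# "Between" is assumed inclusive; "Custom" is omitted (free-form expression).

FILTERS = {
    "Equal": lambda v, x: v == x,
    "Not Equal": lambda v, x: v != x,
    "Less": lambda v, x: v < x,
    "Greater": lambda v, x: v > x,
    "Less or Equal": lambda v, x: v <= x,
    "Greater or Equal": lambda v, x: v >= x,
    "Between": lambda v, low, high: low <= v <= high,
    "Is Null": lambda v: v is None,
    "Is Not Null": lambda v: v is not None,
    "In": lambda v, options: v in options,
    "Not In": lambda v, options: v not in options,
}

def condition(column, name, *args):
    """Build a row predicate from a column, a filter name, and its arguments."""
    test = FILTERS[name]
    return lambda row: test(row[column], *args)

def both(a, b):
    """Combine two conditions with AND."""
    return lambda row: a(row) and b(row)

def either(a, b):
    """Combine two conditions with OR."""
    return lambda row: a(row) or b(row)

rows = [
    {"name": "Ann", "age": 17, "city": "Oslo"},
    {"name": "Bob", "age": 25, "city": "Rome"},
    {"name": "Cy", "age": 40, "city": None},
]

adult_in_listed_city = both(
    condition("age", "Greater or Equal", 18),
    condition("city", "In", ["Oslo", "Rome"]),
)
minor_or_unknown_city = either(
    condition("age", "Less", 18),
    condition("city", "Is Null"),
)

kept_and = [r["name"] for r in rows if adult_in_listed_city(r)]
kept_or = [r["name"] for r in rows if minor_or_unknown_city(r)]
```

Here `kept_and` keeps only rows that satisfy both conditions, while `kept_or` keeps rows that satisfy at least one.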

Join Projection

The third tab in the Join Processor is Projection. Here, you can apply a query and expressions to the columns.

Add Column/Expression: Select a column or expression that you want to apply the query to.

Value: The selected columns are shown here.

View Query: Displays the query in the box shown.

ADD ALL COLUMNS/REMOVE COLUMNS: Add all columns or remove columns.
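Conceptually, a projection picks the columns and computed expressions that appear in the output. The following plain-Python sketch models that idea under stated assumptions. It is not Gathr code, and the helper names, aliases, and sample rows are hypothetical.

```python
# Illustrative sketch only: the helpers, aliases, and rows are hypothetical.

joined = [
    {"id": 2, "name": "Bob", "price": 10.0, "qty": 3},
    {"id": 3, "name": "Cy", "price": 4.5, "qty": 2},
]

# Each entry pairs an output column name with a column or an expression.
projection = [
    ("id", lambda row: row["id"]),                    # plain column
    ("name_upper", lambda row: row["name"].upper()),  # expression
    ("total", lambda row: row["price"] * row["qty"]), # expression
]

def project(rows, projection):
    """Evaluate every projected column or expression for each row."""
    return [{alias: expr(row) for alias, expr in projection} for row in rows]

def all_columns(rows):
    """Model of adding all columns: one pass-through entry per column."""
    return [(col, lambda row, col=col: row[col]) for col in rows[0]]

projected = project(joined, projection)
unchanged = project(joined, all_columns(joined))
```

Removing a column corresponds to dropping its entry from the `projection` list.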
