Join Processor
A Join processor combines two incoming messages (DataSets) by relating their rows through a join condition.
Join Processor Configuration
To add a Join Processor to your pipeline, drag the processor onto the canvas and right-click it to configure it.
Join Condition
The first tab is Join Condition. It lets you define the join type between the two DataSets and a join condition between a column from each DataSet.
The Broadcast Table option instructs Spark to broadcast the specified table when joining it with another table.
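Broadcasting means Spark copies the smaller table to every executor and joins against an in-memory hash table built from it, so the larger table does not have to be shuffled. The plain-Python sketch below illustrates that build-and-probe idea; the `broadcast_hash_join` function, table names, and columns are hypothetical, and Gathr's actual join runs inside Spark.

```python
# A minimal sketch of a broadcast hash join. Plain Python lists of dicts stand
# in for DataSets; this illustrates the idea and is not Gathr's implementation.

def broadcast_hash_join(large, small, key):
    """Inner-join `large` with `small` on `key` by hashing the small side once."""
    # Build phase: index the small ("broadcast") table by the join key.
    index = {}
    for row in small:
        index.setdefault(row[key], []).append(row)
    # Probe phase: stream the large table and look up matches in the index.
    joined = []
    for row in large:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical example data: a large "orders" table and a small lookup table.
orders = [
    {"order_id": 1, "country_code": "US", "amount": 30},
    {"order_id": 2, "country_code": "DE", "amount": 12},
    {"order_id": 3, "country_code": "FR", "amount": 7},
]
countries = [
    {"country_code": "US", "country": "United States"},
    {"country_code": "DE", "country": "Germany"},
]

result = broadcast_hash_join(orders, countries, "country_code")
for r in result:
    print(r)
```

Broadcasting pays off only when the broadcast side is small enough to fit comfortably in each executor's memory; the lookup table here plays that role.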
The available Join Types are:
Join Type | Description |
---|---|
Equi | An Equi join matches rows on equal values in the selected column(s) of the associated tables. In Gathr, an applied Equi join is rendered in the Filter Condition tab. To join two columns of the same table, use an Equi join. |
Full Outer | A full outer join matches all corresponding rows and also keeps the rows without matches from both DataSets. In other words, it contains all of the data, merged where matches exist. |
Left Outer | In a left outer join, all rows from the left dataset are kept in the joined dataset. Rows that have a match in the right dataset are enriched with the corresponding right-side columns; rows without a match have those columns set to null. |
Right Outer | A right outer join mirrors the left outer join: all rows from the right dataset are kept and matched to entries from the left dataset, even when some matches are missing. The left-side columns of unmatched rows are set to null. |
Left Semi | A left semi join returns all rows from the left dataset that have a match in the right dataset. Unlike a left outer join, the result does not contain merged data from both datasets; it contains only the columns of the left dataset. |
Left Anti | A left anti join returns all rows from the left dataset that have no match in the right dataset. |
Inner | An inner join returns only the rows that have a match in both DataSets. |
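To make the differences between the join types concrete, the following sketch applies each of them to two small in-memory datasets. It is a plain-Python illustration of the semantics described in the table, not Gathr's implementation; the `join` function, the join-type strings, and the employee/department data are hypothetical. Equi is treated here as an inner join on equal key values, which is an assumption.

```python
# A minimal sketch of the join types above, applied to two small in-memory
# "DataSets" (lists of dicts). Columns from the side without a match are
# filled with None (null). Hypothetical data and names.

JOIN_TYPES = {"equi", "inner", "left_outer", "right_outer",
              "full_outer", "left_semi", "left_anti"}


def join(left, right, key, how):
    """Join two lists of dicts on equal values of `key` using join type `how`."""
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how}")
    # Non-key columns of each side, used to pad rows that have no match.
    left_cols = {c for row in left for c in row} - {key}
    right_cols = {c for row in right for c in row} - {key}
    matched_right = set()  # indices of right rows that found a partner
    out = []
    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        if how == "left_semi":  # matching left rows, left columns only
            if matches:
                out.append(dict(l_row))
            continue
        if how == "left_anti":  # non-matching left rows, left columns only
            if not matches:
                out.append(dict(l_row))
            continue
        for i in matches:  # one merged row per match ("equi" behaves like "inner")
            matched_right.add(i)
            out.append({**l_row, **right[i]})
        if not matches and how in ("left_outer", "full_outer"):
            out.append({**l_row, **{c: None for c in right_cols}})
    if how in ("right_outer", "full_outer"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                out.append({**{c: None for c in left_cols}, **r_row})
    return out


employees = [
    {"dept_id": 1, "name": "Ana"},
    {"dept_id": 2, "name": "Raj"},
    {"dept_id": 4, "name": "Li"},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
    {"dept_id": 3, "dept": "HR"},
]

for how in sorted(JOIN_TYPES):
    print(how, join(employees, departments, "dept_id", how))
```

Li (department 4) appears only in the left outer, full outer, and left anti results, while HR (department 3) appears only in the right outer and full outer results, matching the descriptions in the table.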
Filter Condition
The second tab is Filter Condition. It lets you apply a filter condition to the DataSets, and you can combine two filter conditions with AND or OR.
The following filter conditions are available:
contains
not contains
begins_with
ends_with
equal
not_begins_with
not_ends_with
is_null
is_not_null
in
not_in
matches
custom
not_equal
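The sketch below shows one way the listed operators and the AND/OR combination could be evaluated on individual rows. The operator semantics are assumptions inferred from the operator names: nulls never satisfy the string, equality, or membership operators, `matches` is treated as a regular-expression search, and `custom` is omitted because it takes a user-supplied expression. The `condition` and `combine` helpers and the sample rows are hypothetical.

```python
import re

# Hypothetical mapping of filter operators to Python predicates.
# Semantics are assumed from the operator names; "custom" is not modeled.
OPERATORS = {
    "contains": lambda v, arg: v is not None and arg in v,
    "not contains": lambda v, arg: v is not None and arg not in v,
    "begins_with": lambda v, arg: v is not None and v.startswith(arg),
    "ends_with": lambda v, arg: v is not None and v.endswith(arg),
    "not_begins_with": lambda v, arg: v is not None and not v.startswith(arg),
    "not_ends_with": lambda v, arg: v is not None and not v.endswith(arg),
    "equal": lambda v, arg: v is not None and v == arg,
    "not_equal": lambda v, arg: v is not None and v != arg,
    "is_null": lambda v, arg: v is None,
    "is_not_null": lambda v, arg: v is not None,
    "in": lambda v, arg: v is not None and v in arg,
    "not_in": lambda v, arg: v is not None and v not in arg,
    "matches": lambda v, arg: v is not None and re.search(arg, v) is not None,
}


def condition(column, op, arg=None):
    """Build a row predicate for a single filter condition."""
    test = OPERATORS[op]
    return lambda row: test(row.get(column), arg)


def combine(first, second, logic):
    """Combine two filter conditions with AND or OR."""
    if logic == "AND":
        return lambda row: first(row) and second(row)
    if logic == "OR":
        return lambda row: first(row) or second(row)
    raise ValueError(f"unsupported logic: {logic}")


rows = [
    {"name": "Ana", "city": "Austin"},
    {"name": "Raj", "city": None},
    {"name": "Li", "city": "Boston"},
]

both = combine(condition("city", "is_not_null"),
               condition("name", "begins_with", "A"), "AND")
either = combine(condition("city", "ends_with", "ton"),
                 condition("name", "equal", "Raj"), "OR")

print([r["name"] for r in rows if both(r)])
print([r["name"] for r in rows if either(r)])
```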
Join Projection
The third tab is Join Projection, where you can apply a query and expressions to the columns of the joined output.
Field | Description |
---|---|
Add Column/Expression | Select a column or expression that you want to apply the query to. |
Value | Displays the selected columns. |
View Query | Displays the resulting query in a text box. |
ADD ALL COLUMNS/REMOVE COLUMNS | Adds all columns to the projection, or removes columns from it. |
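A projection keeps only the selected columns and evaluates any expressions for each joined row. The sketch below models this with a hypothetical `Expr` type that pairs an output alias with the SQL text shown to the user and a Python function that computes the value. The `SELECT ... FROM joined` string produced by `view_query` is an illustrative format only, not the exact query Gathr displays.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Expr:
    alias: str                  # output column name
    sql: str                    # text shown in the query, e.g. "upper(dept)"
    fn: Callable[[dict], Any]   # Python equivalent used to compute the value


def project(rows, selection):
    """Keep only the selected columns and evaluate expressions for each row."""
    result = []
    for row in rows:
        out = {}
        for item in selection:
            if isinstance(item, Expr):
                out[item.alias] = item.fn(row)  # computed expression
            else:
                out[item] = row[item]           # plain column
        result.append(out)
    return result


def view_query(selection, source="joined"):
    """Render the selection as a SELECT statement (illustrative format only)."""
    parts = [f"{i.sql} AS {i.alias}" if isinstance(i, Expr) else i for i in selection]
    return f"SELECT {', '.join(parts)} FROM {source}"


# Hypothetical output of an earlier join.
joined = [
    {"dept_id": 1, "name": "Ana", "dept": "Sales"},
    {"dept_id": 2, "name": "Raj", "dept": "Ops"},
]
selection = ["name", Expr("dept_upper", "upper(dept)", lambda r: r["dept"].upper())]

print(view_query(selection))
print(project(joined, selection))
```

An "add all columns" action would correspond to a selection built from every column name, for example `list(joined[0])`.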