Distinct Processor

Distinct is a core Apache Spark operation on streaming data. The Distinct processor eliminates duplicate records from a dataset.

Processor Configuration

Enter the fields on which the distinct operation is to be performed.

Click NEXT, then enter any notes in the space provided.

Click SAVE to save the configuration details.

Note: Distinct can’t be used immediately after the Aggregation and Pivot processors.

The following example demonstrates how Distinct works.

If you apply Distinct on two fields, Name and Age, records that match on both fields are treated as duplicates and only one copy of each is kept. The input and the resulting output are shown below:

Input Set
{Name:Mike,Age:7}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
{Name:Mike,Age:6}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
Output Set
{Name:Mike,Age:7}
{Name:Mike,Age:6}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
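
The same behavior can be sketched outside the product in plain Python. This is only an illustration of the deduplication logic described above, not Gathr's or Spark's implementation; the helper name `distinct_on` is hypothetical, and keeping the first occurrence of each duplicate is an assumption made for the sketch.

```python
def distinct_on(records, fields):
    """Keep one record per unique combination of values in `fields`.

    Hypothetical helper for illustration only: it keeps the first
    occurrence of each combination and preserves input order.
    """
    seen = set()
    result = []
    for record in records:
        # Build a hashable key from the selected field values.
        key = tuple(record[field] for field in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


input_set = [
    {"Name": "Mike", "Age": 7},
    {"Name": "Rosy", "Age": 9},
    {"Name": "Jack", "Age": 5},
    {"Name": "Mike", "Age": 6},
    {"Name": "Rosy", "Age": 9},
    {"Name": "Jack", "Age": 5},
]

output_set = distinct_on(input_set, ["Name", "Age"])
for record in output_set:
    print(record)
```

The sketch yields the same four records as the Output Set above. The order in which they appear may differ from the listing, since this helper preserves first-seen order, whereas a distributed distinct operation generally does not guarantee any particular ordering.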

