Distinct Processor

Distinct is a core Apache Spark operation that can be applied to streaming data. The Distinct processor eliminates duplicate records from a dataset.

Processor Configuration

Enter the fields on which the distinct operation is to be performed.

Click on the NEXT button. Enter the notes in the space provided.

Click SAVE to save the configuration details.

Example to demonstrate how distinct works.

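If you apply Distinct on two fields, Name and Age, records that share the same combination of Name and Age are reduced to a single record. Records that match on only one of the fields are kept. The output for the given fields will be as shown below: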

Input Set
{Name:Mike,Age:7}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
{Name:Mike,Age:6}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
Output Set
{Name:Mike,Age:7}
{Name:Mike,Age:6}
{Name:Rosy,Age:9}
{Name:Jack,Age:5}
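
The same behavior can be sketched in plain Python. This is only an illustration of the deduplication rule applied to the example above, not the processor's actual implementation and not a Spark API call; the function name distinct_on is hypothetical. The sketch keeps the first record seen for each combination, so the order of its output may differ from the Output Set listed above.

```python
def distinct_on(records, fields):
    """Keep one record per unique combination of the given fields.

    Hypothetical helper for illustration only; it keeps the first
    record encountered for each combination of field values.
    """
    seen = set()
    result = []
    for record in records:
        # Build a key from the values of the selected fields only.
        key = tuple(record[field] for field in fields)
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


input_set = [
    {"Name": "Mike", "Age": 7},
    {"Name": "Rosy", "Age": 9},
    {"Name": "Jack", "Age": 5},
    {"Name": "Mike", "Age": 6},
    {"Name": "Rosy", "Age": 9},
    {"Name": "Jack", "Age": 5},
]

output_set = distinct_on(input_set, ["Name", "Age"])

for record in output_set:
    print(record)
```

Both Mike records survive because their Age values differ, while the repeated Rosy and Jack records are dropped.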