Repartition Processor

The Repartition operator changes how ingested data is partitioned by dividing large datasets into multiple parts.

The Repartition operator is used when you want to increase or decrease the parallelism in an executor. The degree of parallelism corresponds to the number of tasks running in the executor. The operator creates either more or fewer partitions to balance data across them, which shuffles data over the network.

You can use a single partition if you need to write the data to a single file, and multiple partitions to write the data to multiple files.
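The effect of the partition count can be illustrated with a minimal, engine-independent sketch. It assumes a simple round-robin distribution; the function name `repartition_by_number` and the sample rows are hypothetical and are not part of the processor itself.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def repartition_by_number(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Distribute rows round-robin across num_partitions buckets (illustrative only)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        # Row i goes to bucket i mod N, which keeps bucket sizes balanced.
        partitions[index % num_partitions].append(row)
    return partitions


rows = [{"id": i} for i in range(10)]

# One partition: all rows stay together, so a writer would produce one file.
single = repartition_by_number(rows, 1)
print([len(p) for p in single])  # [10]

# Four partitions: rows are spread out, so a writer would produce four files.
spread = repartition_by_number(rows, 4)
print([len(p) for p in spread])  # [3, 3, 2, 2]
```

In this sketch each partition stands in for one output file and one unit of parallel work, which is why a higher partition count means both more files and more tasks.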

Processor Configuration

Partition By: The user can select the repartition type: Number, Column, or Expression.

Number: Upon selecting Number as the repartition option, the user can enter the number of executors (threads) of an operator or channel.

Column

Partition Columns: Upon selecting Column as the repartition option, the user must select the columns/fields on which the data is to be partitioned.

Partition Number: Enter the number of executors (threads) of an operator/channel.
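Column-based repartitioning can be pictured as hash partitioning on the selected fields. The sketch below is an assumption about the general technique, not the processor's actual implementation; `repartition_by_columns`, the CRC32 hash, and the sample rows are all hypothetical choices made for illustration.

```python
import zlib
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def repartition_by_columns(
    rows: List[Row], columns: Sequence[str], num_partitions: int
) -> List[List[Row]]:
    """Hash the selected column values to pick a partition (illustrative only)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Build a key from the partition columns; CRC32 is stable across runs,
        # unlike Python's built-in hash() for strings.
        key = repr(tuple(row[c] for c in columns))
        bucket = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return partitions


orders = [
    {"country": "IN", "id": 1},
    {"country": "US", "id": 2},
    {"country": "IN", "id": 3},
    {"country": "DE", "id": 4},
    {"country": "US", "id": 5},
]

by_country = repartition_by_columns(orders, ["country"], 3)
for number, partition in enumerate(by_country):
    print(number, [row["id"] for row in partition])
```

Rows that share the same values in the partition columns always land in the same partition, although several distinct values may share one partition when there are more values than partitions.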

Expression

Partition Expression: Upon selecting Expression as the repartition option, the user should enter the expression according to which the data is to be repartitioned.

Partition Number: Enter the number of executors (threads) of an operator/channel.
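Expression-based repartitioning differs from the column option only in that the partition key is computed from each row rather than read from it directly. The sketch below models the expression as a Python callable, which is an assumption made for illustration; the processor's actual expression syntax is not shown here, and `repartition_by_expression` and the sample data are hypothetical.

```python
import zlib
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def repartition_by_expression(
    rows: List[Row], expression: Callable[[Row], Any], num_partitions: int
) -> List[List[Row]]:
    """Hash the value of an expression evaluated per row (illustrative only)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Evaluate the expression, then hash its result to choose a partition.
        key = repr(expression(row))
        bucket = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return partitions


events = [
    {"id": 1, "order_date": "2021-03-14"},
    {"id": 2, "order_date": "2022-07-01"},
    {"id": 3, "order_date": "2021-11-30"},
    {"id": 4, "order_date": "2023-01-05"},
]


def order_year(row: Row) -> str:
    """Hypothetical expression: the year portion of the order date."""
    return row["order_date"][:4]


by_year = repartition_by_expression(events, order_year, 2)
for number, partition in enumerate(by_year):
    print(number, [row["id"] for row in partition])
```

Here all rows whose expression evaluates to the same value, such as the same year, end up in the same partition, even though the year is not stored as a separate column.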
