K-Means Algorithm

K-Means is one of the most commonly used clustering algorithms. It partitions data points into a predefined number of clusters. The K-Means Analytics processor analyzes data using ML’s K-Means model.
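To make the idea of clustering concrete, the sketch below shows the core assignment step: each data point is assigned to its nearest cluster center. It is a plain-Python illustration only, not the processor's implementation; the function name `assign_clusters` and the sample data are hypothetical.

```python
import math

def assign_clusters(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical sample data: two well-separated groups and two candidate centers.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.9, 8.3)]
centers = [(1.0, 1.0), (8.0, 8.0)]
labels = assign_clusters(points, centers)
```

Here `labels` holds one cluster index per point, which is the kind of value a prediction column would carry.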

To use a K-Means Model in a Data Pipeline, drag and drop the model component onto the pipeline and right-click it to configure.

The Configuration Section → of every ML model is identical.

After the Configuration tab comes the Feature Selection → tab. (It is identical for all the models except K-Means.)

Once Feature Selection is done, perform Pre-Processing → on the data before feeding it to the Model. The configuration settings are identical for all the ML models.

Then configure the Model using Model Configuration.

Model Configuration

Max Iterations: Maximum number of iterations the K-Means algorithm runs. This acts as one of the stopping criteria for model training.

Init Step: Parameter for the number of steps for the k-means|| initialization mode. This is an advanced setting. Must be > 0.

Feature Column: Name of the column that is treated as the feature column when training the model.

Prediction Column: Name of the column that holds the model's prediction. The value of Prediction Column must be set to “prediction” in order to deploy the model as a REST service.

Seed: Seed value for the random number generator used during model training.

Tolerance: Sets the convergence tolerance of the iterations. A smaller value leads to higher accuracy at the cost of more iterations.

Number of Clusters: Sets the number of clusters. Must be > 1.

Init Mode: Parameter for the initialization algorithm. This can be either “random” to choose random points as initial cluster centers, or “k-means||” to use a parallel variant of k-means++.
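The way these settings interact can be sketched with a small, self-contained K-Means loop. This is a minimal illustration under stated assumptions, not the processor's actual code: it covers only the “random” Init Mode (the “k-means||” mode and Init Step are not modeled), and the function name `kmeans` and its argument names are hypothetical stand-ins for Number of Clusters, Max Iterations, Tolerance, and Seed.

```python
import math
import random

def kmeans(points, k, max_iter=20, tol=1e-4, seed=None):
    """Lloyd's algorithm with "random" initialization (illustrative sketch).

    Stops when max_iter is reached or when no center moves more than tol.
    Returns (centers, labels, iterations_run).
    """
    if k <= 1:
        raise ValueError("Number of Clusters must be > 1")
    if k > len(points):
        raise ValueError("Number of Clusters cannot exceed the number of points")

    rng = random.Random(seed)        # Seed: drives the random initial choice
    centers = rng.sample(points, k)  # Init Mode "random": pick k points as centers
    labels = [0] * len(points)
    iterations = 0

    for iterations in range(1, max_iter + 1):  # Max Iterations: hard stop
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]

        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center

        # Tolerance: stop early once the largest center movement is small enough.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break

    return centers, labels, iterations

# Hypothetical sample data with two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (7.9, 8.3), (8.2, 7.8)]
centers, labels, iterations = kmeans(points, k=2, max_iter=20, tol=1e-4, seed=42)
```

In this sketch a smaller `tol` can only keep the loop running longer, never shorter, and `max_iter` caps the work regardless of whether the centers have settled.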

After Model Configuration, perform Post-Processing →. You can then add notes and save the configuration.
