PMML Models

PMML stands for “Predictive Model Markup Language”. It is the de facto standard for representing predictive solutions. A PMML file may contain any number of data transformations (pre- and post-processing) as well as one or more predictive models.

Its structure follows a set of pre-defined elements and attributes which reflect the inner structure of a predictive workflow: data manipulations followed by one or more predictive models.
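As a rough illustration of that structure, the sketch below parses a tiny PMML document with Python's standard xml.etree.ElementTree module and lists its data fields and model elements. The PMML text is a hypothetical example: the field names, coefficients and the 4.4 namespace are assumptions for illustration and are not taken from a Gathr model.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PMML document (invented fields and coefficients).
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="illustrative model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.2"/>
      <NumericPredictor name="income" coefficient="0.01"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace prefix used in the XPath-style queries below (assumed PMML 4.4).
NS = {"p": "http://www.dmg.org/PMML-4_4"}

# Top-level elements that are not models.
NON_MODEL_TAGS = {"Header", "DataDictionary", "TransformationDictionary",
                  "MiningBuildTask", "Extension"}


def summarize_pmml(text):
    """Return (data fields with their data types, list of model element names)."""
    root = ET.fromstring(text)
    fields = {
        field.get("name"): field.get("dataType")
        for field in root.findall("p:DataDictionary/p:DataField", NS)
    }
    models = []
    for child in root:
        local_name = child.tag.split("}")[-1]  # strip the "{namespace}" prefix
        if local_name not in NON_MODEL_TAGS:
            models.append(local_name)
    return fields, models


fields, models = summarize_pmml(PMML_TEXT)
print(fields)
print(models)
```

The data dictionary comes first and the model element follows, which mirrors the "data manipulations followed by one or more predictive models" layout described above.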

The following PMML models are available in Gathr.


Logistic Regression

Logistic Regression is a classification algorithm. It is used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables. It is used to predict a binary outcome like 1 or 0, yes or no, true or false given a set of independent variables.

The Logistic Regression analytics processor performs predictions on incoming data using a PMML model.

An existing PMML file can be uploaded to the processor, or a PMML model can be created using the processor.


Configuring Logistic Regression Model

To add a Logistic Regression model to your pipeline, drag the Logistic Regression model onto the canvas and right-click it to configure.

The Configuration Settings of Logistic Regression model are as follows:

Field | Description
Schema | Select the message on which Logistic Regression is to be applied.
Import PMML | Enables you to import a PMML file. Yes: when the model is imported, the Import and Download tabs are enabled, which are to be followed in sequence. The model is also validated, and either a success message or, if the model is invalid, a failure message is displayed. No: if the model is created rather than imported, the Download the Model option is enabled.
Import | Import a PMML file.
Validate Model | Checks the validity of the model. Validating a model is mandatory before viewing or testing the model.
Download | Download the PMML model that was created or imported.
Add Configuration | Configure additional parameters as key-value pairs.

ConfigWindow1

Logistic Regression can be processed in two ways.


Import PMML model

Choose Import Model as Yes and click the Import button to load the PMML file.

Once the model is imported, the following checks are performed on the model:

  • The imported file must be a valid PMML file and must contain a Logistic Regression model.

  • The features and output defined in the model must be present in the selected message, with the data types expected by the model.

If a check fails, the model is not loaded in the processor and the Save feature is disabled. If both checks pass, the model can be saved.
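The second check can be pictured as a comparison between the fields the model expects and the fields the message provides. The sketch below is only an illustration of that idea; the function name, the dictionary layout and the plain-string type names are assumptions, not Gathr's actual implementation.

```python
def check_model_fields(model_fields, message_fields):
    """Compare model fields against message fields.

    Both arguments map a field name to a data type name (hypothetical layout).
    Returns a list of problems; an empty list means the check passes.
    """
    problems = []
    for name, expected_type in model_fields.items():
        if name not in message_fields:
            problems.append(f"{name}: missing from message")
        elif message_fields[name] != expected_type:
            problems.append(
                f"{name}: expected {expected_type}, found {message_fields[name]}"
            )
    return problems


# Hypothetical model and message definitions.
model = {"age": "double", "income": "double", "label": "string"}
message = {"age": "double", "income": "integer"}

for problem in check_model_fields(model, message):
    print(problem)
```

An empty result corresponds to the case where the model can be saved; any reported problem corresponds to the case where saving stays disabled.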

Notes:

  • If the imported PMML model does not match the analytics processor chosen, validation of the model throws an error.

  • All the values on screen are read-only, since an imported model cannot be edited.

  • If model variables are not defined in the message, they appear in red. This highlights the fields that need to be defined in the message in order to consume the model in Gathr, as shown below:

    VariableType

    Since the PMML model is imported, the variables cannot be edited.

If you want to classify the model output (i.e. the probability), specify the Threshold parameter and the Low and High classifiers under Variable Type. The Threshold parameter takes a numeric value. The output value of the model is compared with the threshold: if the output is greater than the threshold, the High classifier appears as the output; otherwise, the Low classifier appears.
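The threshold rule described above can be sketched in a few lines. The function and argument names are hypothetical; only the comparison itself (strictly greater than the threshold gives the High classifier) comes from the description.

```python
def classify(probability, threshold, low_label, high_label):
    """Return the High label when the output exceeds the threshold, else the Low label."""
    return high_label if probability > threshold else low_label


# Example with an assumed threshold of 0.5 and assumed labels.
print(classify(0.72, 0.5, "no", "yes"))
print(classify(0.31, 0.5, "no", "yes"))
```

Note that an output exactly equal to the threshold falls on the Low side, because the rule is "greater than" rather than "greater than or equal to".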

After viewing the Variable Type, click Next to open the Model Coefficients page.

Click on Load Defined Variables to load the Variables and provide values against variables.

ModelCoeff

Now you can Test your Model with Values.

ModelTest

Once you have tested the model, provide your notes in the Add Notes section and save the configuration.

The other option is to create your own PMML model.


Create PMML Model

If you want to create a PMML Logistic Regression model, follow the steps mentioned below:

Select a message from Message and select Import PMML as No.

config

Click Next to go to the Variable Type tab.

VariableType

Select the input variables, that is, the continuous and categorical variables.

Provide a name for the predicted variable (output), class labels for the output values, and a threshold value.

Click Next to view the Model Coefficients. All the model features defined on the Variable Type screen can be used in Model Coefficients.

The screenshot above represents the generic formula for Logistic Regression, where P0 represents the model intercept and PiXi represents the product of a model coefficient (Pi) and the corresponding model feature (Xi).

You can load all defined model features using the Load Defined Variables link, or choose them one at a time from the list. Specify the respective coefficient of each feature under the coefficients column (P0-P8). If you know the number of model variables to be used in the model, specify that number in the formula (above the Sigma symbol) and that many rows are loaded automatically on screen. In each row, you must specify the coefficient next to Pi and the respective feature next to Xi, where i is the row number. Apart from features, a combination of continuous model features, i.e. an interaction term, can also be defined as Xi.
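The formula itself is only visible in the screenshot, so the following sketch assumes the standard logistic form: the intercept P0 plus the sum of the PiXi terms, passed through the logistic (sigmoid) function. All names and numbers are illustrative assumptions.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Compute 1 / (1 + exp(-(P0 + sum(Pi * Xi)))) for one record."""
    if len(coefficients) != len(features):
        raise ValueError("each coefficient Pi needs exactly one feature Xi")
    z = intercept + sum(p * x for p, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical record with two continuous features.
age, income = 40.0, 3.5
# An interaction term is simply another Xi built from existing features.
interaction = age * income

probability = logistic_probability(
    intercept=-2.0,
    coefficients=[0.03, 0.2, 0.001],
    features=[age, income, interaction],
)
print(round(probability, 4))
```

Each row on the Model Coefficients page corresponds to one (Pi, Xi) pair in the two lists passed to the function.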

ModelCoeff

Provide values to Probability and proceed to testing the Model.

Test the Model

Once the model is loaded, you can test it using the Model Test tab. Click Load Model to load and view the model.

loadDefinedvariables

Specify values for all model features and perform a single record test. The system evaluates the model for this input and shows all the output parameters on screen.

ModelTest

Add Notes and save the Configuration.


Regression

The Regression analytics processor is used to analyze data through a Regression model. Regression analysis is used for estimating the relationships among variables. It helps identify how the value of the dependent variable changes when any one of the independent variables is changed while the other independent variables are held fixed. It is used for prediction and forecasting.


Configuring Regression Model

To add a Regression model to your pipeline, drag the Regression model onto the canvas and right-click it to configure.

The configuration settings of Regression model are as follows:

Field | Description
Schema | Select the message on which the Regression algorithm is to be applied.
Import PMML | Enables you to import a PMML file. Yes: when the model is imported, four tabs are enabled (Import, Validate Model, View Model and Download), which are to be followed in sequence. No: if the model is created rather than imported, the Download the Model option is enabled.
Validate Model | Checks the validity of the model. Validating a model is mandatory before viewing or testing the model.
Download | Download the PMML model that was created or imported.
Add Configuration | Configure additional parameters as key-value pairs.

Regression can be processed in two ways:


Import PMML Model

config1

Choose Import Model as Yes and click the Import button to load the PMML file.

Once the model is imported, a message is displayed indicating whether the model is valid.

The following checks are performed:

  • The imported file must be a valid PMML file and must contain a Regression model. If this check fails, the model is not loaded in the processor, and the View Model and Save features are disabled.

  • The features and output defined in the model must be present in the selected message, with the data types expected by the model. If the message does not have certain attributes defined in the model, the View Model feature is enabled but the model still cannot be saved. The error message in this case explains which fields need to be defined.

When you click Next, the Variable Type tab opens and lists the variables, as shown below:

createnewModel2

Since the PMML model is imported, the variables cannot be edited.

After viewing the Variable Type, click Next to open the Model Coefficients page.

Click Load Defined Variables to load the variables and provide values against them.

createnewModel3

Now, test your Model with Values.

regression_Success

After the Model is tested, provide your notes in the Add Notes section and save the configuration.

The other option is to create your own PMML model.


Create PMML Model

If you want to create a PMML Regression model, choose Import PMML as No.

createnewModel

Variable Type

The Variable Type and Model Coefficients tabs are enabled.

For each variable, you must select the message field it corresponds to, along with the possible categories.

You can also use the Upload CSV option to populate categories for categorical variables under Categorical Variables via Add Variables, as shown below:

createnewModel2

Model Coefficients

All the model features defined on Variable Type can be used on Model Coefficients page. The screenshot below represents the generic formula for Regression.
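Because the formula appears only in the screenshot, the sketch below assumes the usual linear form: an intercept plus one coefficient per continuous feature, plus one coefficient for each (categorical variable, category) pair that matches the record. The dictionary layout and all names and values are hypothetical.

```python
def predict_regression(intercept, numeric_coeffs, categorical_coeffs, record):
    """Evaluate an assumed linear regression model for one record.

    numeric_coeffs maps a feature name to its coefficient.
    categorical_coeffs maps a (feature name, category) pair to its coefficient.
    """
    y = intercept
    for name, coeff in numeric_coeffs.items():
        y += coeff * record[name]
    for (name, category), coeff in categorical_coeffs.items():
        if record[name] == category:  # only the matching category contributes
            y += coeff
    return y


# Hypothetical model with one continuous and one categorical variable.
price = predict_regression(
    intercept=1.0,
    numeric_coeffs={"area": 2.0},
    categorical_coeffs={("city", "A"): 10.0, ("city", "B"): 20.0},
    record={"area": 3.0, "city": "B"},
)
print(price)
```

This is why a categorical variable needs its list of possible categories: each category can carry its own coefficient.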

createnewModel3

When you click on Next, you can test your model with values.

createnewModel4

Once you have tested the model, provide your notes in the Add Notes section and save the configuration.


Cluster Model

The Cluster Model analytics processor is used to analyze data through a Cluster model, a task commonly known as data clustering.

Data clustering is the task of dividing a dataset into subsets of similar items. Applying data clustering to a dataset generates groups of similar data items. These groups are called clusters, i.e. collections of similar data items.

Data clustering can help you identify, learn or predict the nature of new data items, especially how new data can be linked for making predictions.

For example, in pattern recognition, analyzing patterns in the data, such as buying patterns in a particular region or age group, can help you develop predictive analysis.
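To make the idea concrete, the sketch below assigns a new data item to the nearest of several cluster centers. It assumes center-based clusters and Euclidean distance; the cluster names, centers and the choice of distance measure are illustrative assumptions, and a real model may use a different measure.

```python
import math


def assign_cluster(point, centers):
    """Return (name, distance) of the cluster center closest to the point."""
    best_name, best_distance = None, math.inf
    for name, center in centers.items():
        distance = math.dist(point, center)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Hypothetical clusters over (age, monthly spend).
centers = {
    "young_low_spend": (25.0, 30.0),
    "senior_high_spend": (60.0, 80.0),
}

name, distance = assign_cluster((30.0, 35.0), centers)
print(name, round(distance, 2))
```

Scoring a new record against an existing clustering model follows this pattern: the record is compared with every cluster and the closest one is reported.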


Configuring Cluster Model

To add a ClusterModel to your pipeline, drag ClusterModel onto the canvas and right-click it to configure.

Field | Description
Message | Select the message on which the ClusterModel algorithm is to be applied.
Import PMML | Enables you to import a PMML file. Yes: when the model is imported, four tabs are enabled (Import, Validate Model, View Model and Download), which are to be followed in sequence. No: if the model is created rather than imported, the Download the Model option is enabled.
Clustering | Allows you to define various model features (i.e. continuous variables), along with output variables, the distance measure parameter, and clustering parameters such as weight and cluster.
Validate Model | Checks the validity of the model. Validating a model is mandatory before viewing or testing the model.
Download | Downloads the PMML file created either using the Gathr UI or the Import option.
Add Configuration | Configure additional parameters as key-value pairs.

ClusterModel can be processed in two ways:


Import an existing model

Choose Import Model as Yes and click the Import button to load the PMML file.

Once the model is imported, click Validate Model.

ClusterModel1

Checks are performed on the model, and if the defined variables do not match the model, you receive an error.


Create your own model through Gathr UI

If you have a ClusterModel definition and wish to create the model in Gathr, choose Import Model as No; the Clustering tab is then enabled.

You can add a new cluster by clicking the Add Cluster link (plus icon).

ClusterModel2

Test the model

Once the model is loaded (through the UI or by import), you can test it from the Model Test screen. Click Load Model to load and view the model.

Specify values for all model features and perform a single record test through Test Single Record. The system evaluates the model for this input and shows all the output parameters on screen.

ClusterModel3


SupportVectorMachine

SupportVectorMachine analytics processor is used to analyze data through SupportVectorMachine model. SupportVectorMachine is a machine-learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems.


Configuring SupportVectorMachine Model

To add SupportVectorMachine model into your pipeline, drag SupportVectorMachine model to the canvas and right click on it to configure.

Field | Description
Schema | Select the message on which the SupportVectorMachine algorithm is to be applied.
Import PMML | Enables you to import a PMML file. Yes: when the model is imported, the following tabs are enabled: Import and Download.
Add Configuration | Configure additional parameters as key-value pairs.
Validate Model | Checks the validity of the model. Validating a model is mandatory before viewing or testing the model.

SVM Processor can be processed in the following way:


Import PMML Model

If you have a PMML file representing SupportVectorMachine then import it. Click on Import button to load the file.

SVM1

Once the model is imported, model validation with variable checks will be performed.

The next tab that is enabled is Test Model.

Test Model

You can test the model by clicking on Model Test Tab. Click Load Model to load and view the Model.

SVM2

Specify value to all model features and perform a single record test through Test Single Record. The system will evaluate the model for this input and will show all the output parameters on screen.


NaiveBayes

The NaiveBayes analytics processor is used to analyze data through a NaiveBayes model. A NaiveBayes model is easy to build and particularly useful for large data sets.

The NaiveBayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or on the existence of other features, each of them is treated as contributing independently to the probability that the fruit is an apple, which is why the model is known as ‘Naive’.

A Naive Bayesian model is simple to build, with no complex iterative parameter estimation, which makes it especially useful for very large datasets.
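The independence assumption means the score for each class is just the class prior multiplied by one likelihood per observed feature. The sketch below shows that calculation for the fruit example; all probabilities are made-up numbers for illustration, and the data layout is an assumption.

```python
def naive_bayes_posteriors(priors, likelihoods, observed):
    """Return normalized class probabilities under the naive independence assumption.

    priors: class -> P(class)
    likelihoods: class -> feature -> value -> P(value | class)
    observed: feature -> observed value
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            # Each feature contributes independently of the others.
            score *= likelihoods[cls][feature].get(value, 0.0)
        scores[cls] = score
    total = sum(scores.values())
    if total == 0.0:
        return scores
    return {cls: score / total for cls, score in scores.items()}


# Invented probabilities for illustration only.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"color": {"red": 0.8}, "shape": {"round": 0.9}},
    "other": {"color": {"red": 0.2}, "shape": {"round": 0.5}},
}

posteriors = naive_bayes_posteriors(
    priors, likelihoods, {"color": "red", "shape": "round"}
)
print(posteriors)
```

No iterative fitting is involved: the model consists only of counts turned into probabilities, which is what makes it cheap to build on large datasets.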

Field | Description
Message | Select, from the drop-down list, the message on which the analytics algorithm is to be applied.
Import | Enables you to import the PMML file.
Validate Model | Checks the validity of the imported PMML file.
Download | Downloads the PMML file created either using the Gathr UI or the Import option.
Add Configuration | Configure additional parameters as key-value pairs.

Input to this processor can only be provided as:

  1. If you have a PMML file representing NaiveBayes, then you can directly import it. To do so click on Import button to load the file.

    NaiveBayes1

    Once the model is imported, it should be validated. Following checks will be performed:

    • The imported file must be a valid PMML file and it must have NaiveBayes model. If this check fails, then the model will not be loaded in the processor and Save feature will be disabled.

    • The features and output defined in the model must be present in the selected message with the data type expected by model.

    If the message does not have certain attributes defined in the model, the model cannot be saved, since without these features in the message you will not be able to execute the processor. The error message in this case explains which fields need to be defined.

    Save the model once the checks are met.

  2. You can Download PMML file that you have imported.

  3. Once the model is loaded, test the model using Model Test tab.

Test Model

Click on the Model Test tab. Click on Load Model button for loading the model fields.

Specify value for all the model features and perform a single record test by clicking on TEST SINGLE RECORD button. The system will evaluate the model for this input and will show all the output parameters on screen.

NaiveBayes2

Add notes and save the configuration.


Ensemble

This analytics processor is used to analyze data through an Ensemble model. Ensemble modeling is the process of running two or more related but different analytical models and then combining the results into a single score, which helps improve the accuracy of predictive analytics and data mining applications.

Ensemble modeling offers one of the most convincing ways to build highly accurate predictive models. An Ensemble model combines multiple models and delivers superior predictive power.
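Two common ways of combining results into a single score are a majority vote (for class labels) and an average (for numeric scores). The sketch below shows both; which combination method a given Ensemble PMML file actually uses depends on the model, so treat the choice here as an assumption.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the label predicted by the largest number of member models."""
    return Counter(labels).most_common(1)[0][0]


def average_score(scores):
    """Return the mean of the member models' numeric outputs."""
    return mean(scores)


# Hypothetical outputs from three member models for the same record.
print(majority_vote(["yes", "no", "yes"]))
print(average_score([1.0, 2.0, 3.0]))
```

Errors made by one member model tend to be outvoted or averaged out by the others, which is where the accuracy gain comes from.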


Configuring Ensemble Model

To add Ensemble model into your pipeline, drag Ensemble model to the canvas and right click on it to configure.

Field | Description
Message | Select, from the drop-down list, the message on which the analytics algorithm is to be applied.
Import | Enables you to import the PMML file.
Validate Model | Checks the validity of the imported PMML file.
Download | Downloads the PMML file created either using the Gathr UI or the Import option.
Add Configuration | Configure additional parameters as key-value pairs.

Input to this processor can only be provided as:

If you have a PMML file representing Ensemble, then you can import it. Click on Import button to load the file.

Ensemble1

Once the model is imported, it is validated. Following checks will be performed:

Ensemble2

The imported file must be a valid PMML file and must contain an Ensemble model. If this check fails, the model is not loaded in the processor and the Save feature is disabled.

The features and output defined in the model must be present in the selected message, with the data types expected by the model. If the message does not have certain attributes defined in the model, the error message explains which fields need to be defined.

If the above checks are met, the model can be saved for execution.

Once the model is loaded, you can test the model by clicking on Model Test tab.

If you click Load Model link, model features will appear on screen.

Add notes and save the configuration.


Neural Network

This analytics processor is used to analyze data through a Neural Network model. A neural network is a powerful computational model that is able to capture and represent complex input-output relationships.

Neural networks are widely used for data classification and for processing past and current data to estimate future values.
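As a minimal sketch of how such a model turns inputs into outputs, the code below runs a forward pass through fully connected layers. It assumes a logistic activation on every neuron and represents each neuron as a (bias, weights) pair; both choices, and all the numbers, are assumptions made for illustration.

```python
import math


def forward(inputs, layers):
    """Propagate inputs through the layers and return the last layer's outputs.

    layers is a list of layers; each layer is a list of (bias, weights) neurons,
    where weights has one entry per output of the previous layer.
    """
    activations = list(inputs)
    for layer in layers:
        next_activations = []
        for bias, weights in layer:
            z = bias + sum(w * a for w, a in zip(weights, activations))
            next_activations.append(1.0 / (1.0 + math.exp(-z)))  # logistic activation
        activations = next_activations
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    [(0.1, [0.5, -0.4]), (-0.2, [0.3, 0.8])],
    [(0.0, [1.2, -0.7])],
]
print(forward([1.0, 2.0], network))
```

Each weight in the lists corresponds to one edge between two neurons, which is the quantity shown when an edge is selected in a graphical view of the network.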


Configuring Neural Network Model

To add Neural Network model into your pipeline, drag Neural Network model to the canvas and right click on it to configure.

Field | Description
Schema | Select, from the drop-down list, the message on which the analytics algorithm is to be applied.
Import | Enables you to import a PMML file. When the model is imported, two tabs are enabled: Import and Download.
Download | Downloads the PMML file created either using the Gathr UI or the Import option.
Add Configuration | Configure additional parameters as key-value pairs.

Neural1

Input to this processor can only be provided as:

  • If you have a PMML file, representing Neural Network then you can directly import it. Click the Import button to load the file.

Once the model is imported, it shall be validated.

Once it is validated, you can click on View Model.

NeuralNetworks

The Neural Network tab shows a graphical representation of the model. If you click on any weight or edge, you can view the corresponding data.

This lets you visualize the Neural Network. The weight specific to each edge is highlighted when you click on it.

Test Model

You can test the model by clicking on Model Test Tab. Click on Load Model button for loading the model fields.

NeuralNetworks3

Specify value for all the model features and perform a single record test by clicking on TEST SINGLE RECORD button. The system will evaluate the model for this input and will show all the output parameters on screen.


Tree Model

This analytics processor is used to analyze data through a Tree model. Tree-based learning algorithms are considered to be among the best learning methods.

These models provide high accuracy, stability and ease of interpretation, and can be used to solve either classification or regression problems.
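Part of the ease of interpretation comes from how a tree is evaluated: a record simply follows one comparison after another until it reaches a leaf. The sketch below walks a small binary tree stored as nested dictionaries; the dictionary layout, field names and thresholds are hypothetical.

```python
def evaluate_tree(node, record):
    """Follow the splits from the root until a leaf and return its prediction."""
    while "prediction" not in node:
        if record[node["field"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["prediction"]


# Hypothetical tree: split on age first, then on income.
tree = {
    "field": "age",
    "threshold": 30,
    "left": {"prediction": "low"},
    "right": {
        "field": "income",
        "threshold": 50000,
        "left": {"prediction": "medium"},
        "right": {"prediction": "high"},
    },
}

print(evaluate_tree(tree, {"age": 25, "income": 20000}))
print(evaluate_tree(tree, {"age": 40, "income": 60000}))
```

A leaf can hold a class label, as here, or a numeric value, which is how the same structure serves both classification and regression.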

Configuring Tree Model

To add Tree Model into your pipeline, drag Tree Model to the canvas and right click on it to configure.

treemodel1

Field | Description
Schema | Select the message from the drop-down list on which Analytics algorithm has to be applied.
Import | When the Model is Imported, two tabs are enabled – Import and Download.
Download | Downloads PMML file created either using Gathr UI or Import option.
Add Configuration | Configure additional parameters in Key – Value pair.

Input to this processor can only be provided as:

  1. If you have a PMML file representing a Tree model, you can import it directly. To do so, choose Import Model as Yes and then click the Import button to load the file.

Once the model is imported, it is validated.

Tree Model tab is enabled and you can view Tree diagram of the model, as shown below.

treemodel2

After the Tree Model view, you can then test the Model.

Test Model

You can test the model by clicking on Model Test tab.

Model Test tab

Specify value for all the model features and perform a single record test by clicking on TEST SINGLE RECORD button. The system will evaluate the model for this input and will show all the output parameters on screen.
