Register Spark ML Model

Steps to register a Decision Tree model built with the Spark Pipeline API for a classification use case.

Step 1: Train the model on a sample dataset and save it at any location on HDFS or the local file system.
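
For example, this step could look like the following minimal PySpark sketch (the dataset path, feature and label column names, and output location are all placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("train-decision-tree").getOrCreate()

# Hypothetical sample dataset with feature columns f1..f3 and a string label.
df = spark.read.csv("sample_dataset.csv", header=True, inferSchema=True)

# Assemble the feature columns into a vector and index the label column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
indexer = StringIndexer(inputCol="label", outputCol="indexedLabel")
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="features")

# Fit the whole pipeline and save the resulting PipelineModel.
model = Pipeline(stages=[assembler, indexer, dt]).fit(df)
model.write().overwrite().save("hdfs:///models/decision_tree_demo")  # or a file:// path
```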

Step 2: Click the Register Model tab and provide a model name.

Step 3: Provide the model API, which is ML in our case.

Step 4: Provide the model type, i.e., Decision Tree.

Step 5: Provide the scope of the model.

Step 6: Mention whether the model to be registered is a Spark Pipeline model or not.

Step 7: Mention the model algorithm, i.e., Classification.

Step 8: Select the model source. If the model is stored locally, select Local; if it is stored on HDFS, select HDFS.

The remaining steps depend on whether you select Local or HDFS; both flows are described below.

Local

Step 9: When you select Local, browse to the zip file of the model (one way to create the zip is sketched after these steps). Once the model uploads successfully, a green tick mark is shown.

Step 10: Click Validate to validate the model.

Step 11: Once validation is successful, a new tab is shown: Save Model On. Choose HDFS or Gathr Database. If you choose HDFS, provide the HDFS Connection Name and the Model Path where the model will be saved.

Step 12: Register the saved model at the Model Path by clicking Register.
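
One way to produce the zip file mentioned in Step 9 is to archive the saved model directory, for example (a sketch; both paths are placeholders):

```python
import shutil

# Create decision_tree_demo.zip from the locally saved model directory.
shutil.make_archive("decision_tree_demo", "zip", "/tmp/models/decision_tree_demo")
```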

HDFS

Step 9: When you select HDFS, provide the HDFS connection name.

Step 10: Provide the Model Path where the model is located.

Step 11: Click Validate to validate the model (an optional offline sanity check is sketched after these steps).

Step 12: Register the model.
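
Before validating in the UI, you can optionally confirm that the saved model loads from the HDFS path, along these lines (a sketch, reusing the placeholder path from Step 1):

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("verify-model").getOrCreate()

# Reload the saved pipeline model to confirm the path and format are intact.
model = PipelineModel.load("hdfs:///models/decision_tree_demo")
print(model.stages)  # should list the assembler, indexer, and decision tree stages
```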

The fields on the Register Model page are described below.

Name: Provide a unique name for the model to be registered.

Model API: The Spark API on which the model is built.

Feature List: Lists the features used to train the model. You can specify the feature names either by entering them manually or by uploading a .csv file, in which case the header row of the file supplies the feature names (for example, a header row of f1,f2,f3 registers those three features).

Model Type: The types of models supported for the chosen Spark API. Choose the one that fits your use case.

Pipeline Model: If you trained the model using the Spark ML Pipeline API, select this option; otherwise, leave it unchecked (the sketch after this field list illustrates the difference).
Scope: Select either Project or Workspace as the scope of the model to be registered.

Note: If you select Workspace, the registered model can be used across the workspace. If you select Project, the model is visible only within that specific project.

Model Algorithm: Available for ML models only. The algorithm for the selected Model Type. This field appears only for Decision Tree, Random Forest, and Gradient Boosted Trees, where it is populated with the Classification and Regression algorithm types.

Model Source: Option to upload the model (Local) or to select HDFS, DBFS, ADLS, or S3, where the model is available.
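
To illustrate the Pipeline Model option above: a model trained without the Pipeline API is saved directly from the estimator and should be registered with the option unchecked. A sketch, assuming a train_df DataFrame that already has features and label columns:

```python
from pyspark.ml.classification import DecisionTreeClassifier

# train_df is assumed to already contain a "features" vector column and a
# numeric "label" column (e.g., produced by VectorAssembler and StringIndexer).
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")
dt_model = dt.fit(train_df)

# This saves a DecisionTreeClassificationModel rather than a PipelineModel,
# so the Pipeline Model option should be left unchecked when registering it.
dt_model.write().overwrite().save("hdfs:///models/decision_tree_plain")
```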

If the Model Source is HDFS, additional parameters are displayed:

Connection Name: Choose the HDFS connection name.

Override Credential: Option to override the credentials for the HDFS connection.

Path: Browse to the path where the model is stored on HDFS.

Validate: Validates the model, whether uploaded or located at the given HDFS path.

If the Model Source is DBFS, an additional parameter is displayed:

Path: Browse to the path where the model is stored.

Note: The DBFS option is available for the Azure environment.

If the Model Source is ADLS, additional parameters are displayed:

Connection Name: Select the ADLS connection name used to create the connection.

Container Name: Provide the ADLS container name.

Path: Browse to the path where the model is stored.

If the Model Source is S3, additional parameters are displayed:

Connection: Select the S3 connection name used to create the connection.

Bucket Name: Select the S3 bucket name.

Path: Browse to the path where the model is stored.
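
If the model was saved to S3 directly from Spark, the path typically uses the s3a scheme, for example (a sketch; the bucket and key are placeholders, and the Hadoop S3A connector must be configured with credentials):

```python
# Save the fitted pipeline model from Step 1 to an S3 location via the
# Hadoop S3A connector (hypothetical bucket and key).
model.write().overwrite().save("s3a://my-bucket/models/decision_tree_demo")
```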

Once the model is validated successfully, the Register button next to Validate is enabled.

Click Register. After the model is registered successfully, you can view it on the Models page.

Once the model is successfully registered with the application, you can use it in your data pipelines.
