Register Spark ML Model

Steps to register a Decision Tree model built with the Spark Pipeline API for a classification use case.

Step 1: Train the model on a sample dataset and save it at any location on HDFS or the local file system.
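
For example, this step could look like the following minimal PySpark sketch (the dataset path, feature and label column names, and output location are all placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("train-decision-tree").getOrCreate()

# Hypothetical sample dataset with feature columns f1..f3 and a string label.
df = spark.read.csv("sample_dataset.csv", header=True, inferSchema=True)

# Assemble the feature columns into a vector and index the label column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
indexer = StringIndexer(inputCol="label", outputCol="indexedLabel")
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="features")

# Fit the whole pipeline and save the resulting PipelineModel.
model = Pipeline(stages=[assembler, indexer, dt]).fit(df)
model.write().overwrite().save("hdfs:///models/decision_tree_demo")  # or a file:// path
```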

Step 2: Click the Register Model tab and provide a model name.

Step 3: Provide the model API, which is ML in our case.

Step 4: Provide the model type, i.e., Decision Tree.

Step 5: Provide the scope of the model.

Step 6: Mention whether the model to be registered is a Spark Pipeline model or not.

Step 7: Mention the model algorithm, i.e., Classification.

Step 8: Select the model source. If the model is stored locally, select Local; if it is stored on HDFS, select HDFS.

The remaining steps depend on whether you select Local or HDFS; both flows are described below.

Local

Step 9: When you select Local, browse to the zip file of the model (one way to create the zip is sketched after these steps). Once the model uploads successfully, a green tick mark is shown.

Step 10: Click Validate to validate the model.

Step 11: Once validation is successful, a new tab is shown: Save Model On. Choose HDFS or Gathr Database. If you choose HDFS, provide the HDFS Connection Name and the Model Path where the model will be saved.

Step 12: Register the saved model at the Model Path by clicking Register.
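
One way to produce the zip file mentioned in Step 9 is to archive the saved model directory, for example (a sketch; both paths are placeholders):

```python
import shutil

# Create decision_tree_demo.zip from the locally saved model directory.
shutil.make_archive("decision_tree_demo", "zip", "/tmp/models/decision_tree_demo")
```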

HDFS

Step 9: When you select HDFS, provide the HDFS connection name.

Step 10: Provide the Model Path where the model is located.

Step 11: Click Validate to validate the model (an optional offline sanity check is sketched after these steps).

Step 12: Register the model.
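
Before validating in the UI, you can optionally confirm that the saved model loads from the HDFS path, along these lines (a sketch, reusing the placeholder path from Step 1):

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("verify-model").getOrCreate()

# Reload the saved pipeline model to confirm the path and format are intact.
model = PipelineModel.load("hdfs:///models/decision_tree_demo")
print(model.stages)  # should list the assembler, indexer, and decision tree stages
```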

The fields on the Register Model page are described below.

Name: Provide a unique name for the model to be registered.

Model API: The Spark API on which the model is built.

Feature List: Lists the features used to train the model. You can specify the feature names either by entering them manually or by uploading a .csv file, in which case the header row of the file supplies the feature names (for example, a header row of f1,f2,f3 registers those three features).

Model Type: The types of models supported for the chosen Spark API. Choose the one that fits your use case.

Pipeline Model: If you trained the model using the Spark ML Pipeline API, select this option; otherwise, leave it unchecked (the sketch after this field list illustrates the difference).
Scope: Select either Project or Workspace as the scope of the model to be registered.

Note: If you select Workspace, the registered model can be used across the workspace. If you select Project, the model is visible only within that specific project.

Model Algorithm: Available for ML models only. The algorithm for the selected Model Type. This field appears only for Decision Tree, Random Forest, and Gradient Boosted Trees, where it is populated with the Classification and Regression algorithm types.

Model Source: Option to upload the model (Local) or to select HDFS, DBFS, ADLS, or S3, where the model is available.
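
To illustrate the Pipeline Model option above: a model trained without the Pipeline API is saved directly from the estimator and should be registered with the option unchecked. A sketch, assuming a train_df DataFrame that already has features and label columns:

```python
from pyspark.ml.classification import DecisionTreeClassifier

# train_df is assumed to already contain a "features" vector column and a
# numeric "label" column (e.g., produced by VectorAssembler and StringIndexer).
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")
dt_model = dt.fit(train_df)

# This saves a DecisionTreeClassificationModel rather than a PipelineModel,
# so the Pipeline Model option should be left unchecked when registering it.
dt_model.write().overwrite().save("hdfs:///models/decision_tree_plain")
```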

If the Model Source is HDFS, additional parameters are displayed:

Connection Name: Choose the HDFS connection name.

Override Credential: Option to override the credentials for the HDFS connection.

Path: Browse to the path where the model is stored on HDFS.

Validate: Validates the model, whether uploaded or located at the given HDFS path.

If the Model Source is DBFS, an additional parameter is displayed:

Path: Browse to the path where the model is stored.

Note: The DBFS option is available for the Azure environment.

If the Model Source is ADLS, additional parameters are displayed:

Connection Name: Select the ADLS connection name used to create the connection.

Container Name: Provide the ADLS container name.

Path: Browse to the path where the model is stored.

If the Model Source is S3, additional parameters are displayed:

Connection: Select the S3 connection name used to create the connection.

Bucket Name: Select the S3 bucket name.

Path: Browse to the path where the model is stored.
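
If the model was saved to S3 directly from Spark, the path typically uses the s3a scheme, for example (a sketch; the bucket and key are placeholders, and the Hadoop S3A connector must be configured with credentials):

```python
# Save the fitted pipeline model from Step 1 to an S3 location via the
# Hadoop S3A connector (hypothetical bucket and key).
model.write().overwrite().save("s3a://my-bucket/models/decision_tree_demo")
```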

Once the model is validated successfully, the Register button next to Validate is enabled.

Click Register. After the model is registered successfully, you can view it on the Models page.

Once the model is successfully registered with the application, you can use it in your data pipelines.
