Register Spark ML Model
Follow these steps to register a Decision Tree model, built with the Spark ML Pipeline API, for a classification use case.
Step 1: Train the model on a sample dataset and save it to any location on HDFS or the local file system.
Step 2: Click the Register Model tab and provide a model name.
Step 3: Specify the model API, which is ML in this case.
Step 4: Specify the model type, that is, Decision Tree.
Step 5: Specify the scope of the model.
Step 6: Specify whether the model to be registered is a Spark Pipeline model.
Step 7: Specify the model algorithm, that is, Classification.
Step 8: Select the model source: Local if the model is stored on the local file system, or HDFS if it is stored on HDFS.
The following table shows the remaining steps for each model source.
Step | Local | HDFS |
---|---|---|
Step 9 | Browse to the zip file of the model. When the model uploads successfully, a green tick mark appears. | Provide the HDFS connection name. |
Step 10 | Click Validate to validate the model. | Provide the Model Path where the model is located. |
Step 11 | When validation succeeds, a new tab, Save Model On, appears. Choose HDFS or Gathr Database. If you choose HDFS, provide the HDFS Connection Name and the Model Path where the model will be saved. | Click Validate to validate the model. |
Step 12 | Click Register to register the model saved on the Model Path. | Click Register to register the model. |
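With the Local source, Gathr expects a zip file, while Spark ML saves a model as a directory (for example, with `model.write().overwrite().save(path)` in PySpark). The following Python sketch zips a saved model directory using only the standard library. It assumes the archive should keep the model folder as its top-level entry (for example, `dt_model/metadata/part-00000`); the exact layout Gathr expects is not stated here, so adjust `arcname` if validation fails. The function name and paths are hypothetical.

```python
import os
import zipfile


def zip_model_dir(model_dir: str, zip_path: str) -> str:
    """Zip a saved Spark ML model directory for upload as a Local model source.

    Assumption: the archive keeps the model folder as its top-level entry
    (for example, dt_model/metadata/part-00000). Change arcname if Gathr
    expects the folder contents at the archive root instead.
    """
    model_dir = os.path.abspath(model_dir)
    zip_abs = os.path.abspath(zip_path)
    if not os.path.isdir(model_dir):
        raise NotADirectoryError(model_dir)
    # Writing the archive inside the directory being zipped would make it zip itself.
    if os.path.commonpath([zip_abs, model_dir]) == model_dir:
        raise ValueError("zip_path must be outside model_dir")

    parent = os.path.dirname(model_dir)
    with zipfile.ZipFile(zip_abs, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Walk every file Spark wrote (metadata/, data/, stages/, ...) and store
        # it under a path relative to the model folder's parent directory.
        for root, _dirs, files in os.walk(model_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                zf.write(full_path, arcname=os.path.relpath(full_path, parent))
    return zip_abs
```

For example, `zip_model_dir("/tmp/dt_model", "/tmp/dt_model.zip")` produces a zip file that you can then browse to in Step 9.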
Field | Description |
---|---|
Name | Provide a unique name for the model to be registered. |
Model API | Spark API on which the model is built. |
Feature List | Lists the features used to train the model. You can enter the feature names manually or upload a .csv file, in which case the header row of the file is used as the feature names. |
Model Type | Type of model, from those supported by the chosen Spark API. Choose the one that fits your use case. |
Pipeline Model | Select this option if you trained the model using the Spark ML Pipeline API; otherwise, clear it before registering. |
Scope | Select either Project or Workspace as the scope of the model to be registered. With Workspace scope, the model can be used across the workspace. With Project scope, the model is visible only in the specific project. |
Model Algorithm | Available for ML models only. The algorithm for the selected model type. This option is available only for Decision Tree, Random Forest, and Gradient Boosted Tree, which offer Classification and Regression as algorithm types. |
Model Source | Upload the model, or select HDFS, DBFS, ADLS, or S3 as the location where the model is stored. |
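If you upload a .csv file for Feature List, only its header row supplies the feature names. The Python sketch below writes such a file from the input column names used during training (for example, the `inputCols` of a `VectorAssembler` stage, if your pipeline has one) and reads the header back to check it before upload. Stripping surrounding whitespace is a local sanity check, not documented Gathr behavior; the function names and column names are hypothetical.

```python
import csv
from typing import List, Sequence


def write_feature_csv(path: str, features: Sequence[str]) -> None:
    """Write a CSV file whose header row holds the feature names."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(features)


def read_feature_names(path: str) -> List[str]:
    """Return feature names from the header row, ignoring any data rows."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    # Drop surrounding spaces and empty cells so stray separators are easy to spot.
    return [name.strip() for name in header if name.strip()]
```

For example, `write_feature_csv("features.csv", ["age", "income", "tenure"])` creates a file that can be uploaded in the Feature List field.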
If you select HDFS as the Model Source, the following additional parameters are displayed:
Field | Description |
---|---|
Connection Name | Choose the HDFS connection name. |
Override Credential | Option to override credentials for HDFS connection. |
Path | Browse to the path where the model is stored on HDFS. |
Validate | Validates the uploaded model or the model located at the given HDFS path. |
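Before clicking Validate, and to decide whether to select Pipeline Model, you can inspect a local copy of the saved model. Spark ML's built-in writers store a one-line JSON record under `<model dir>/metadata/` that includes a `class` field, and a saved `PipelineModel` also has a `stages/` folder with one `<index>_<uid>` directory per stage. This Python sketch relies on that on-disk layout, which is a Spark implementation detail rather than a documented Gathr requirement; the function names are hypothetical.

```python
import glob
import json
import os
from typing import Any, Dict, List

PIPELINE_MODEL_CLASS = "org.apache.spark.ml.PipelineModel"


def read_spark_ml_metadata(model_dir: str) -> Dict[str, Any]:
    """Return the JSON record Spark ML writes under <model_dir>/metadata/part-*."""
    # The part-* pattern does not match hidden checksum files such as .part-00000.crc.
    for part in sorted(glob.glob(os.path.join(model_dir, "metadata", "part-*"))):
        with open(part, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    return json.loads(line)
    raise FileNotFoundError(f"no Spark ML metadata under {model_dir}")


def is_pipeline_model(model_dir: str) -> bool:
    """True if the saved model was written as a Spark ML PipelineModel."""
    return read_spark_ml_metadata(model_dir).get("class") == PIPELINE_MODEL_CLASS


def pipeline_stage_classes(model_dir: str) -> List[str]:
    """List stage classes in pipeline order; stage folders are named <index>_<uid>."""
    stages_dir = os.path.join(model_dir, "stages")
    names = [n for n in os.listdir(stages_dir) if n.split("_", 1)[0].isdigit()]
    names.sort(key=lambda n: int(n.split("_", 1)[0]))
    return [read_spark_ml_metadata(os.path.join(stages_dir, n))["class"] for n in names]
```

If `is_pipeline_model("/tmp/dt_model")` returns `True`, select Pipeline Model, and `pipeline_stage_classes` should list a `DecisionTreeClassificationModel` stage for this use case. For a model on HDFS, copy it locally first, for instance with `hdfs dfs -get`.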
If you select DBFS as the Model Source, the following additional parameter is displayed:
Field | Description |
---|---|
Path | Browse to the path where the model is stored. Note: The DBFS option is available for Azure environments. |
If you select ADLS as the Model Source, the following additional parameters are displayed:
Field | Description |
---|---|
Connection Name | Select the ADLS connection name to use for the connection. |
Container Name | Provide the ADLS container name. |
Path | Browse to the path where the model is stored. |
If you select S3 as the Model Source, the following additional parameters are displayed:
Field | Description |
---|---|
Connection | Select the S3 connection name to use for the connection. |
Bucket Name | Select the S3 bucket name. |
Path | Browse to the path where the model is stored. |
Once the model is validated successfully, the Register button next to Validate is enabled.
Click Register. After the model is registered successfully, you can view it on the Models page.
Once the model is registered with the application, you can use it in your data pipelines.