navio Models
Requirements
- Python 3.7+
- MLflow 1.20.1+
- pynavio 0.1.0+ (an open source library for navio)
- Docker (for local testing)
Creating Your First navio MLflow Model
info
Currently, navio only supports models wrapped with MLflow. More model formats will be added in the future.
tip
If you require more information regarding MLflow models and their usage, be sure to check out their extensive documentation.
navio supports almost any machine learning model written in Python, irrespective of the deep learning libraries or frameworks you use to train your models. navio achieves this by making use of MLflow, more specifically the python_function model flavour, which is the default model interface for MLflow Python models.
navio MLflow models are similar to the standard MLflow models with some key differences:
- Models must be uploaded as zip archives
- Models require a JSON request schema provided within the model archive to help navio generate the API required for interacting with the model
- Predictions must be generated following a specific format
- An MLmodel YAML file is included in the archive for specifying model metadata
The steps below outline the typical process for tabular and time series models.
1. Create a model saving script
The model saving script must always define a predictor as a subclass of mlflow.pyfunc.PythonModel.
For example, the script below creates a simple predictor which wraps a user-defined model (here loaded with joblib; a custom utility module could be used instead):
import mlflow
import pandas as pd
import joblib


class Predictor(mlflow.pyfunc.PythonModel):
    _columns = ['x', 'y']

    def load_context(self, context: mlflow.pyfunc.PythonModelContext) -> None:
        # load the wrapped model once at startup from the 'model' artifact
        self._model = joblib.load(context.artifacts['model'])

    def _predict(self, df: pd.DataFrame) -> dict:
        return {'prediction': self._model.predict(df[self._columns]).tolist()}

    def predict(self, context: mlflow.pyfunc.PythonModelContext,
                model_input: pd.DataFrame) -> dict:
        return self._predict(model_input)
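The 'model' key read from context.artifacts corresponds to the key used in the artifacts dictionary passed to mlflow.pyfunc.save_model in step 2 below. As a minimal sketch of how the wrapped model file could be produced (the scikit-learn model, training data and file name are illustrative assumptions):
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

# illustrative training data containing the columns the Predictor expects
train = pd.DataFrame({'x': [0.0, 1.0, 2.0, 3.0],
                      'y': [1.0, 0.0, 1.0, 0.0],
                      'label': ['a', 'b', 'a', 'b']})

# train the model that the Predictor will wrap
sklearn_model = LogisticRegression().fit(train[['x', 'y']], train['label'])

# persist it; this file path is later passed as artifacts['model'] in step 2,
# which the Predictor reads via context.artifacts['model']
joblib.dump(sklearn_model, 'model.joblib')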
Predictor Class
predict()
The Predictor class is required to implement the predict() method, which is used to evaluate queries. The predict() method must adhere to the mlflow.pyfunc Inference API.
predict() method specifications:
- This method must receive input in the format of a pandas.DataFrame.
- This method must return a dictionary, where a list of predictions is stored under the key 'prediction':
  - One entry for each row of input, in the same order as the input rows.
  - This applies regardless of the name of the target column as defined in the Request Schema.
- Elements in the list of predictions should be of numeric or string type, not lists or numpy arrays.
Example:
For an input with 2 instances, the predict() method output could look like this:
{
"prediction": ["setosa", "versicolor"]
}
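For a regression model, the entries would simply be numeric instead, for example:
{
  "prediction": [3.2, 1.7]
}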
Please see the Requesting Predictions from a Deployed Model section for the API response format of an already deployed model.
load_context()
Implementation of the load_context() method is optional. Use this only if you need to load some artifacts once at model startup (e.g. loading model weights). These can then be used by the predict() method when evaluating inputs.
other
- Third party dependencies must be listed in the Conda Env YAML file.
- Imported user-defined utility modules must be valid Python module directories and must be specified via the code_path argument of the mlflow.pyfunc.save_model method.
- The Predictor class can utilize any static or non-static methods and members, as long as loading the predictor via pickle.load (done by MLflow internally) works as expected.
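For reference, the Conda environment can also be supplied as a Python dictionary (see the conda_env entry in step 2 below). A minimal sketch, where the listed package names and versions are illustrative assumptions to be replaced by whatever your predictor actually imports:
conda_env = {
    'name': 'venv',
    'channels': ['defaults', 'conda-forge'],
    'dependencies': [
        'python=3.8',
        'pip',
        {'pip': ['mlflow', 'scikit-learn', 'joblib']}  # third party dependencies
    ]
}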
2. Save the predictor
Once the predictor has been created, it needs to be saved. Predictors can be saved using the mlflow.pyfunc.save_model method.
For simplicity, add this code block to the script where the predictor is defined; otherwise you'll have to include the predictor definition script in the code_path list:
config = {
    'conda_env': 'path-to-conda-yaml',  # can also be dict defining the YAML
    'code_path': ['path-to-utils'],
    'artifacts': {
        'request_schema': 'path-to-request-schema',
        'data': 'path-to-data',  # optional
        'model': 'path-to-model'
    }
}

mlflow.pyfunc.save_model(path='./model', python_model=Predictor(), **config)
save_model method call
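Before moving on to Docker-based testing, you can run a quick local sanity check with the standard MLflow loading API, e.g. at the end of the same script (the sample values for x and y below are illustrative):
import pandas as pd
import mlflow

# load the saved model back the same way a serving process would
loaded = mlflow.pyfunc.load_model('./model')

# should print a dict like {'prediction': [...]} with one entry per input row
print(loaded.predict(pd.DataFrame([{'x': 1, 'y': 2}])))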
tip
pynavio provides functions which help simplify and automate this step further.
See also
3. Test model serving
After the model has been saved, it's important to test its serving functionality to ensure that it works as expected.
The easiest way to do this is to spin up the model in a temporary Docker container locally using the MLflow build-docker CLI function. Because we use MLflow for this, the request format below is specific to MLflow and not navio.
The build-docker function creates a Docker image from an MLflow-compatible model and serves it via a web server.
The request format differs depending on the MLflow version used. Let us store the request in a JSON file called request.json as follows:
For mlflow<2.0:
# request.json contents
[{"x": 1, "y": 2}]
For mlflow>=2.0:
# request.json contents
{
"dataframe_records": [{"x": 1, "y": 2}]
}
These commands should test every part of the saved model (executed from the directory containing the request.json file):
mlflow models build-docker -m ./model --install-mlflow -n mlflow-model-image
docker run --name mlflow-model-container -d --rm -p 127.0.0.1:5001:8080 mlflow-model-image
# this should produce text in JSON format as defined in the predictor's predict method
curl \
-d @request.json \
-H 'Content-Type: application/json' \
-X POST localhost:5001/invocations
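# if the request fails, the container logs (a standard Docker command,
# not navio-specific) usually show the underlying error
docker logs mlflow-model-container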
# clean up
docker stop mlflow-model-container
docker rmi mlflow-model-image
4. Edit the metadata
MLflow uses an MLmodel file for configuring a model's metadata. This file is automatically generated during the model saving process.
navio requires additional metadata that needs to be appended to the generated MLmodel file:
metadata:
  # optional
  # disabled, default or plotly
  explanations: disabled
  # optional
  # disabled or default
  oodDetection: disabled
  # optional
  dataset:
    name: name-to-assign-to-data-upon-upload
    path: archive-path-to-data-csv
  request_schema:
    path: archive-path-to-request-schema-json
- For tabular and small time series data, you can set explanations: default to enable the display of SHAP values in the model try-out view.
- If you are not deploying an image model, you can set oodDetection: default to enable navio's OOD detection.
- OOD detection and SHAP values will not be available if no data is assigned to the model.
- By default (i.e. if the respective YAML metadata fields are not provided), explanations and oodDetection are set to disabled.
If you're using pynavio, you can automate this by passing these metadata fields to the to_navio function.
5. Archive the model
As mentioned previously, navio requires MLflow models to be uploaded as zip archives. Once the model has been tested and the required metadata has been updated, create a zip file with the contents of the directory created during model saving, excluding the parent directory.
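For orientation, the directory produced by mlflow.pyfunc.save_model in step 2 typically looks roughly like the following (the exact files vary with the MLflow version and configuration; the artifact file names are illustrative). It is these contents, not the model directory itself, that must end up at the root of the zip archive:
model/
├── MLmodel            # model metadata, edited in step 4
├── conda.yaml
├── python_model.pkl   # the pickled Predictor instance
├── code/              # modules passed via code_path
└── artifacts/
    ├── model.joblib
    └── request_schema.json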
Example:
# archive from within the parent directory
cd ./model
# zip all contents, saving one directory level above
zip -r ../model.zip .
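To verify that the archive has the expected structure, you can list its contents; the MLmodel file should appear at the top level rather than under a nested model/ directory:
# list the archive contents to check that MLmodel sits at the root
unzip -l ../model.zip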
This step is automated when using pynavio.mlflow.to_navio, as it outputs a model zip file.
Ta-da! Your model is now ready to be uploaded to navio.
Accessing The Derived Class
If you want to use the saved MLflow model in your own Python code, there is one caveat to keep in mind: models loaded via mlflow.pyfunc.load_model are of type mlflow.pyfunc.PyFuncModel and therefore hide any traits of the derived class.
The traits of the derived class can be accessed via:
mlflow_model = mlflow.pyfunc.load_model('./model')
# all traits of the original saved model instance,
# including user defined members / methods, should
# be accessible through this object
model = mlflow_model._model_impl.python_model
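For instance, with the example Predictor from step 1, its custom class attribute becomes reachable again:
# the class attribute defined on the example Predictor is accessible again
print(model._columns)  # ['x', 'y']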