Model Contents
A valid custom model must be uploaded as a zip file with the following internal structure:
```
.
├── artifacts
│   ├── model.xyz
│   ├── schema.json
│   └── data.csv
├── code
│   └── ...
├── conda.yaml
├── MLmodel
└── python_model.pkl
```
The table below describes each of these components in the zip file in more detail.
File or Directory | Required | Description |
---|---|---|
artifacts/ | True | Generated by MLflow. It must contain all files required by the load_context method of the model class. |
artifacts/model.xyz | False | Example model artifact. This can be anything - a pickle file, h5 weights file, a separate nested MLflow model, etc. The path to this file should normally be available via context.artifacts['<artifact name>'] within the model's load_context method, assuming the model was saved correctly. |
artifacts/schema.json | True | Specifies the request schema. This is used by the backend for handling of the model API requests. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file. |
artifacts/data.csv | False | An optional data set file used by the backend to provide OOD detection and explanation functionality. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file. |
code/ | False | Contains module directories provided as code argument to the model saving call. This directory will be appended to the PYTHONPATH variable during model serving, i.e. all contents should be visible to the import statement within the model script. |
conda.yaml | True | Conda is used for managing dependencies. The Conda env definition can be instantiated via `conda env create -f conda.yaml`. This file is referenced in the MLmodel file. |
MLmodel | True | The YAML file used by MLflow for specifying model metadata. This file is generated during model saving. navio requires additional metadata which must be added to this file before the model is uploaded. |
python_model.pkl | True | The saved instance of mlflow.pyfunc.PythonModel dumped via pickle.dump during model saving. This is used by MLflow for model serving. |
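The required layout can be sanity-checked before upload. A minimal sketch using only the standard library; the function name `missing_entries` is illustrative, not part of any navio API, and `REQUIRED_ENTRIES` assumes the default `artifacts/schema.json` location (the schema path is customizable via the MLmodel file):

```python
import zipfile

# Entries every model zip must contain, assuming the default schema location.
REQUIRED_ENTRIES = [
    "MLmodel",
    "conda.yaml",
    "python_model.pkl",
    "artifacts/schema.json",
]

def missing_entries(zip_path):
    """Return the required entries absent from the model zip."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]
```

Running this check locally catches a mis-packaged archive before the upload fails on the backend.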
Conda Env
Conda is used to manage the model dependencies. A conda.yaml
file is required for specifying the dependencies.
Example:
```yaml
channels:
  - defaults
dependencies:
  - python=3.8.5
  - pip=20.0.2
  - pip:
      - mlflow==1.20.1
      - numpy==1.20.2
      - pandas==1.2.4
# the name is not important
name: venv
```
The YAML file can be generated from a conda environment <env name>
via:
```
conda env export -n <env name> --no-builds
```
In many cases, it is sufficient to paste the pip packages necessary for running the model into the `dependencies.pip` list of the above YAML file.
tip
pynavio has a helper function that infers the external dependencies based on the file path, automating this step further.
caution
A large list of dependencies in the YAML file will slow down the model upload and lead to a large model Docker image and container.
Keep only the packages that are absolutely essential for running the model in your conda YAML file.
MLmodel file
The MLmodel file is a YAML file used to specify model metadata.
Example:
```yaml
flavors:
  python_function:
    artifacts:
      dataset:
        path: artifacts/data.csv
        uri: /tmp/tmpc6homp9h/data.csv
      schema:
        path: artifacts/schema.json
        uri: /tmp/tmpga90iy17/schema.json
    cloudpickle_version: 1.6.0
    code: code
    env: conda.yaml
    loader_module: mlflow.pyfunc.model
    python_model: python_model.pkl
    python_version: 3.8.5
metadata:
  dataset:
    name: minimal-data
    path: artifacts/data.csv
  explanations: disabled
  oodDetection: default
  request_schema:
    path: artifacts/schema.json
utc_time_created: "2021-05-26 08:15:34.444437"
```
The metadata field is the only part that needs to be edited by the user as described on this page.
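Since only the metadata field needs to be added by hand, one option is to append it to the generated MLmodel file after saving. A minimal sketch using only the standard library; the helper name is illustrative, and the values shown mirror the example above and must match your actual artifact paths:

```python
# navio metadata block mirroring the example above; adjust paths and
# feature flags (explanations, oodDetection) to your model.
METADATA_BLOCK = """metadata:
  dataset:
    name: minimal-data
    path: artifacts/data.csv
  explanations: disabled
  oodDetection: default
  request_schema:
    path: artifacts/schema.json
"""

def add_navio_metadata(mlmodel_path):
    """Append the navio metadata block to an MLmodel file.

    Note: no idempotence check - appending twice yields invalid YAML.
    """
    with open(mlmodel_path, "a", encoding="utf-8") as fh:
        fh.write(METADATA_BLOCK)
```

Appending plain text avoids a YAML-library dependency, at the cost of not validating the resulting document.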
Request Schema
The request schema is a JSON file that specifies the name, type, example data, and nullability for each of the following:
Feature columns - i.e. input columns the model requires to make a prediction
Target column - i.e. the label column the model was trained on (necessary only for displaying the model's prediction in the try-out view)
Date time column - (for time series models only) the time column the model expects
Example:
```json
{
  "featureColumns": [
    {
      "name": "acc_0",
      "sampleData": 0.9180555898766518,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_1",
      "sampleData": -0.1124999994242935,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_2",
      "sampleData": 0.5097222514293852,
      "type": "float",
      "nullable": false
    }
  ],
  "targetColumns": [
    {
      "name": "activity",
      "sampleData": "UNKNOWN",
      "type": "string",
      "nullable": false
    }
  ],
  "dateTimeColumn": {
    "name": "Time",
    "sampleData": "2015-07-29 00:00:00.000",
    "type": "timestamp",
    "nullable": false
  }
}
```
Note that the target column name does not affect the model's prediction response format: the model should always produce a result with a `prediction` field, regardless of the target column name.
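A sketch of the expected response shape (illustrative only, not tied to any particular model class):

```python
def format_response(predictions):
    """Wrap raw model outputs in the response shape navio expects:
    the key is always "prediction", never the target column name."""
    return {"prediction": list(predictions)}

format_response(["WALKING", "SITTING"])
# -> {"prediction": ["WALKING", "SITTING"]}
```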
tip
pynavio can generate a request schema for navio model from its data.
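Absent pynavio, a rough schema can also be derived from one example row. A sketch using only the standard library; the mapping from Python types to schema type names is an assumption based on the example above, and timestamp columns are not handled:

```python
import json

# Assumed mapping from Python types to schema type names (see example above).
_TYPE_NAMES = {float: "float", int: "int", str: "string", bool: "bool"}

def infer_feature_columns(sample_row):
    """Build a featureColumns list from one example row (dict of name -> value)."""
    return [
        {
            "name": name,
            "sampleData": value,
            "type": _TYPE_NAMES.get(type(value), "string"),
            "nullable": False,  # adjust per column if missing values are allowed
        }
        for name, value in sample_row.items()
    ]

schema = {"featureColumns": infer_feature_columns({"acc_0": 0.918, "acc_1": -0.112})}
print(json.dumps(schema, indent=2))
```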
Dataset
The dataset is the training data that the associated model was trained on. Including it is optional; however, without it, the prediction explanations and OOD detection features are not available.
The dataset can be included in the MLflow model archive or assigned to the uploaded MLflow model separately afterwards.
Currently, only CSV data sets are supported, with the following requirements (note that white space in the CSV header is NOT ignored):
- Comma (`,`) as the field/column separator (customizable in the data set upload dialog)
- New line (`\n`) as the line separator
- Double quote (`"`) as the escape character
The dataset must be compatible with the model's Request Schema.
All columns defined in the schema must be present and convertible to the type specified in the schema.
Any columns which are not present in the schema will be ignored.
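The compatibility rules above can be checked locally before upload. A sketch using only the standard library; the function name is illustrative, the type casters cover only the type names from the schema example, and timestamp parsing is omitted:

```python
import csv

# Illustrative casters for the schema type names used in the example above.
_CASTERS = {"float": float, "int": int, "string": str, "bool": bool}

def check_dataset(csv_path, schema):
    """Verify every schema column exists in the CSV and its values convert
    to the declared type. Extra CSV columns are ignored, as the backend does."""
    columns = schema["featureColumns"] + schema.get("targetColumns", [])
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [c["name"] for c in columns if c["name"] not in header]
        if missing:
            return False, f"missing columns: {missing}"
        for row in reader:
            for col in columns:
                try:
                    _CASTERS.get(col["type"], str)(row[col["name"]])
                except ValueError:
                    return False, f"unconvertible value in column {col['name']}"
    return True, "ok"
```

Because white space in the CSV header is significant, a column named `" acc_0"` would be reported as missing here, matching the backend's behavior.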
tip
Automate inclusion of the dataset using pynavio.