Model Contents

A valid custom model must be uploaded as a zip file with the following internal structure:

.
├── artifacts
│   ├── model.xyz
│   ├── schema.json
│   └── data.csv
├── code
│   └── ...
├── conda.yaml
├── MLmodel
└── python_model.pkl

The list below describes each of these components of the zip file in more detail.

  • artifacts/ (required): Generated by MLflow. It must contain all files required by the load_context method of the model class.

  • artifacts/model.xyz (optional): Example model artifact. This can be anything: a pickle file, an h5 weights file, a separate nested MLflow model, etc. The path to this file is normally available via context.artifacts['<artifact name>'] within the model's load_context method, assuming the model was saved correctly.

  • artifacts/schema.json (required): Specifies the request schema, which the backend uses to handle model API requests. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file.

  • artifacts/data.csv (optional): The data set file, which the backend uses to provide the OOD detection and explanation functionality. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file.

  • code/ (optional): Contains the module directories provided as the code argument to the model saving call. This directory is appended to the PYTHONPATH variable during model serving, i.e. all of its contents are visible to import statements within the model script.

  • conda.yaml (required): The Conda environment definition used for managing dependencies, usable via conda env create -f conda.yaml. This file is referenced in the MLmodel file.

  • MLmodel (required): The YAML file used by MLflow to specify model metadata. This file is generated during model saving. navio requires additional metadata, which must be added to this file before the model is uploaded.

  • python_model.pkl (required): The saved instance of mlflow.pyfunc.PythonModel, dumped via pickle.dump during model saving. This is used by MLflow for model serving.

Conda Env

Conda is used to manage the model dependencies. A conda.yaml file is required for specifying the dependencies.

Example:

channels:
  - defaults
dependencies:
  - python=3.8.5
  - pip=20.0.2
  - pip:
      - mlflow==1.15.0
      - numpy==1.20.2
      - pandas==1.2.4

# the name is not important
name: venv

The YAML file can be generated from a conda environment <env name> via:

conda env export -n <env name> --no-builds

In many cases, it may be sufficient to simply paste the pip packages necessary for running the model into the dependencies.pip list of the above YAML file.

tip

pynavio has a helper function that infers the external dependencies based on the file path, automating this step further.

caution

A long list of dependencies in the YAML file slows down the model upload and leads to a large model Docker image and container.

Only keep the packages which are absolutely essential for running the model in your conda YAML file. 

MLmodel file

The MLmodel file is a YAML file used to specify model metadata.

Example:

flavors:
  python_function:
    artifacts:
      dataset:
        path: artifacts/data.csv
        uri: /tmp/tmpc6homp9h/data.csv
      schema:
        path: artifacts/schema.json
        uri: /tmp/tmpga90iy17/schema.json
    cloudpickle_version: 1.6.0
    code: code
    env: conda.yaml
    loader_module: mlflow.pyfunc.model
    python_model: python_model.pkl
    python_version: 3.8.5
metadata:
  dataset:
    name: minimal-data
    path: artifacts/data.csv
  explanations: disabled
  oodDetection: default
  request_schema:
    path: artifacts/schema.json
utc_time_created: "2021-05-26 08:15:34.444437"

The metadata field is the only part that needs to be edited by the user as described on this page.
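Since the rest of the MLmodel file is generated by MLflow, the edit can be scripted. The following sketch assumes PyYAML is available; the stand-in MLmodel content and the metadata values mirror the example above.

```python
# Sketch: injecting the navio metadata block into a generated MLmodel file.
# Assumes PyYAML; the stand-in file content and values are illustrative.
import os
import tempfile

import yaml

with tempfile.TemporaryDirectory() as tmp:
    mlmodel_path = os.path.join(tmp, "MLmodel")

    # Stand-in for the MLmodel file that mlflow generates during saving.
    with open(mlmodel_path, "w") as fh:
        fh.write("flavors:\n  python_function:\n    env: conda.yaml\n")

    with open(mlmodel_path) as fh:
        mlmodel = yaml.safe_load(fh)

    # The navio-specific metadata; mlflow itself never sets this field.
    mlmodel["metadata"] = {
        "dataset": {"name": "minimal-data", "path": "artifacts/data.csv"},
        "explanations": "disabled",
        "oodDetection": "default",
        "request_schema": {"path": "artifacts/schema.json"},
    }

    with open(mlmodel_path, "w") as fh:
        yaml.safe_dump(mlmodel, fh)

    with open(mlmodel_path) as fh:
        updated = yaml.safe_load(fh)
```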

Request Schema

The request schema is a JSON file that specifies the names, types, example data, and nullability for the following:

  • Feature columns - i.e. input columns the model requires to make a prediction

  • Target column - i.e. the label column the model was trained on (necessary only for displaying the model's prediction in the try-out view)

  • Date time column - (for time series models only) the time column the model expects

Example:

{
  "featureColumns": [
    {
      "name": "acc_0",
      "sampleData": 0.9180555898766518,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_1",
      "sampleData": -0.1124999994242935,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_2",
      "sampleData": 0.5097222514293852,
      "type": "float",
      "nullable": false
    }
  ],
  "targetColumns": [
    {
      "name": "activity",
      "sampleData": "UNKNOWN",
      "type": "string",
      "nullable": false
    }
  ],
  "dateTimeColumn": {
    "name": "Time",
    "sampleData": "2015-07-29 00:00:00.000",
    "type": "timestamp",
    "nullable": false
  }
}

Note that the target column name does not affect the model's prediction response format. The model should always produce a result with a prediction field, regardless of the target column name.
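Building such a schema from sample data can be sketched as follows. This is a stdlib-only illustration, not the official generator: the column_spec helper is hypothetical, and only the float and string type names from the example above are mapped.

```python
# Sketch: building a request schema from one sample row of training data.
# Stdlib-only; column_spec is a hypothetical helper, and the type names
# mirror the example above (float, string).
import json

sample_features = {"acc_0": 0.918, "acc_1": -0.112, "acc_2": 0.510}
sample_target = {"activity": "UNKNOWN"}

TYPE_NAMES = {float: "float", str: "string"}


def column_spec(name, value):
    # One schema entry: name, example value, type name, nullability.
    return {
        "name": name,
        "sampleData": value,
        "type": TYPE_NAMES[type(value)],
        "nullable": False,
    }


schema = {
    "featureColumns": [column_spec(n, v) for n, v in sample_features.items()],
    "targetColumns": [column_spec(n, v) for n, v in sample_target.items()],
}

schema_json = json.dumps(schema, indent=2)
```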

tip

pynavio can generate a request schema for a navio model from its data.

Dataset

The dataset is the training data that the associated model was trained on. Including this data set is optional; however, without it, the prediction explanation and OOD detection features are not available.

The dataset can be included in the MLflow model archive or assigned to the uploaded MLflow model separately afterwards.

Currently, only CSV data sets are supported with the following requirements (note that white space in the CSV header is NOT ignored):

  • Comma , field/column separator (customizable in the data set upload dialog)

  • New line \n line separator

  • Double quote " escape character
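A minimal conforming file might look as follows (illustrative column names); note the double quotes around the field containing the separator:

```
acc_0,activity
0.918,walking
-0.112,"sitting, resting"
```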

The dataset must be compatible with the model's Request Schema.

  • All columns defined in the schema must be present and convertible to the type specified in the schema.

  • Any columns which are not present in the schema will be ignored.
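The two compatibility rules above can be sketched as a stdlib-only check; the schema_columns mapping and the per-type converters are illustrative assumptions, not part of the backend's actual validation.

```python
# Sketch: checking a CSV dataset against a request schema (stdlib-only;
# schema_columns and CONVERTERS are illustrative assumptions).
import csv
import io

schema_columns = {"acc_0": "float", "acc_1": "float", "activity": "string"}
CONVERTERS = {"float": float, "string": str}

csv_text = "acc_0,acc_1,activity,extra\n0.918,-0.112,walking,ignored\n"
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# All schema columns must be present in the CSV header ...
missing = set(schema_columns) - set(reader.fieldnames)

# ... and every value must convert to its declared type. Columns absent
# from the schema (here: "extra") are simply ignored.
converted = [
    {name: CONVERTERS[t](row[name]) for name, t in schema_columns.items()}
    for row in rows
]
```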

tip

Automate inclusion of the dataset using pynavio.