Model Contents
A valid custom model must be uploaded as a zip file with the following internal structure:
```
.
├── artifacts
│   ├── model.xyz
│   ├── schema.json
│   └── data.csv
├── code
│   └── ...
├── conda.yaml
├── MLmodel
└── python_model.pkl
```
The table below describes each of these components in the zip file in more detail.
File or Directory | Required | Description |
---|---|---|
artifacts/ | True | Generated by MLflow. It must contain all files required by the load_context method of the model class. |
artifacts/model.xyz | False | Example model artifact. This can be anything - a pickle file, h5 weights file, a separate nested MLflow model, etc. The path to this file should normally be available via context.artifacts['<artifact name>'] within the model's load_context method, assuming the model was saved correctly. |
artifacts/schema.json | True | Specifies the request schema. This is used by the backend for handling of the model API requests. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file. |
artifacts/data.csv | False | An optional data set file used by the backend to provide OOD detection and explanation functionality. The name and location of this file can be custom, as long as the MLmodel file correctly specifies the path to the file. |
code/ | False | Contains module directories provided as code argument to the model saving call. This directory will be appended to the PYTHONPATH variable during model serving, i.e. all contents should be visible to the import statement within the model script. |
conda.yaml | True | Conda is used for managing dependencies. The Conda env definition can be instantiated via `conda env create -f conda.yaml`. This file is referenced in the MLmodel file. |
MLmodel | True | The YAML file used by MLflow for specifying model metadata. This file is generated during model saving. navio requires additional metadata which must be added to this file before the model is uploaded. |
python_model.pkl | True | The saved instance of mlflow.pyfunc.PythonModel dumped via pickle.dump during model saving. This is used by MLflow for model serving. |
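The required layout can be sanity-checked before upload. A minimal sketch using only the standard library; the function name `missing_entries` is illustrative, not part of any navio API, and `REQUIRED_ENTRIES` assumes the default `artifacts/schema.json` location (the schema path is customizable via the MLmodel file):

```python
import zipfile

# Entries every model zip must contain, assuming the default schema location.
REQUIRED_ENTRIES = [
    "MLmodel",
    "conda.yaml",
    "python_model.pkl",
    "artifacts/schema.json",
]

def missing_entries(zip_path):
    """Return the required entries absent from the model zip."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]
```

Running this check locally catches a mis-packaged archive before the upload fails on the backend.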
Conda Env
Conda is used to manage the model dependencies. A conda.yaml
file is required for specifying the dependencies.
Example:
```yaml
channels:
  - defaults
dependencies:
  - python=3.8.5
  - pip=20.0.2
  - pip:
      - mlflow==1.20.1
      - numpy==1.20.2
      - pandas==1.2.4
# the name is not important
name: venv
```
The YAML file can be generated from a conda environment <env name>
via:
```
conda env export -n <env name> --no-builds
```
In many cases, it is sufficient to paste the pip packages necessary for running the model into the `dependencies.pip` list of the above YAML file.
tip
pynavio has a helper function that infers the external dependencies based on the file path, automating this step further.
caution
A large list of dependencies in the YAML file will slow down the model upload and lead to a large model Docker image and container.
Keep only the packages that are absolutely essential for running the model in your conda YAML file.
MLmodel file
The MLmodel file is a YAML file used to specify model metadata.
Example:
```yaml
flavors:
  python_function:
    artifacts:
      dataset:
        path: artifacts/data.csv
        uri: /tmp/tmpc6homp9h/data.csv
      schema:
        path: artifacts/schema.json
        uri: /tmp/tmpga90iy17/schema.json
    cloudpickle_version: 1.6.0
    code: code
    env: conda.yaml
    loader_module: mlflow.pyfunc.model
    python_model: python_model.pkl
    python_version: 3.8.5
metadata:
  dataset:
    name: minimal-data
    path: artifacts/data.csv
  explanations: disabled
  oodDetection: default
  request_schema:
    path: artifacts/schema.json
utc_time_created: "2021-05-26 08:15:34.444437"
```
The metadata field is the only part that needs to be edited by the user as described on this page.
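Since only the metadata field needs to be added by hand, one option is to append it to the generated MLmodel file after saving. A minimal sketch using only the standard library; the helper name is illustrative, and the values shown mirror the example above and must match your actual artifact paths:

```python
# navio metadata block mirroring the example above; adjust paths and
# feature flags (explanations, oodDetection) to your model.
METADATA_BLOCK = """metadata:
  dataset:
    name: minimal-data
    path: artifacts/data.csv
  explanations: disabled
  oodDetection: default
  request_schema:
    path: artifacts/schema.json
"""

def add_navio_metadata(mlmodel_path):
    """Append the navio metadata block to an MLmodel file.

    Note: no idempotence check - appending twice yields invalid YAML.
    """
    with open(mlmodel_path, "a", encoding="utf-8") as fh:
        fh.write(METADATA_BLOCK)
```

Appending plain text avoids a YAML-library dependency, at the cost of not validating the resulting document.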
Request Schema
The request schema is a JSON file that specifies the name, type, example data, and nullability for each of the following:
Feature columns - i.e. input columns the model requires to make a prediction
Target column - i.e. the label column the model was trained on (necessary only for displaying the model's prediction in the try-out view)
Date time column - (for time series models only) the time column the model expects
Example:
```json
{
  "featureColumns": [
    {
      "name": "acc_0",
      "sampleData": 0.9180555898766518,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_1",
      "sampleData": -0.1124999994242935,
      "type": "float",
      "nullable": false
    },
    {
      "name": "acc_2",
      "sampleData": 0.5097222514293852,
      "type": "float",
      "nullable": false
    }
  ],
  "targetColumns": [
    {
      "name": "activity",
      "sampleData": "UNKNOWN",
      "type": "string",
      "nullable": false
    }
  ],
  "dateTimeColumn": {
    "name": "Time",
    "sampleData": "2015-07-29 00:00:00.000",
    "type": "timestamp",
    "nullable": false
  }
}
```
Note that the target column name does not affect the model's prediction response format: the model should always produce a result with a `prediction` field, regardless of the target column name.
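A sketch of the expected response shape (illustrative only, not tied to any particular model class):

```python
def format_response(predictions):
    """Wrap raw model outputs in the response shape navio expects:
    the key is always "prediction", never the target column name."""
    return {"prediction": list(predictions)}

format_response(["WALKING", "SITTING"])
# -> {"prediction": ["WALKING", "SITTING"]}
```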
tip
pynavio can generate a request schema for navio model from its data.
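Absent pynavio, a rough schema can also be derived from one example row. A sketch using only the standard library; the mapping from Python types to schema type names is an assumption based on the example above, and timestamp columns are not handled:

```python
import json

# Assumed mapping from Python types to schema type names (see example above).
_TYPE_NAMES = {float: "float", int: "int", str: "string", bool: "bool"}

def infer_feature_columns(sample_row):
    """Build a featureColumns list from one example row (dict of name -> value)."""
    return [
        {
            "name": name,
            "sampleData": value,
            "type": _TYPE_NAMES.get(type(value), "string"),
            "nullable": False,  # adjust per column if missing values are allowed
        }
        for name, value in sample_row.items()
    ]

schema = {"featureColumns": infer_feature_columns({"acc_0": 0.918, "acc_1": -0.112})}
print(json.dumps(schema, indent=2))
```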
Dataset
The dataset is the training data that the associated model was trained on. Including it is optional; however, without it, the prediction explanations and OOD detection features are not available.
The dataset can be included in the MLflow model archive or assigned to the uploaded MLflow model separately afterwards.
Currently, only CSV data sets are supported, with the following requirements (note that white space in the CSV header is NOT ignored):
- Comma (`,`) as the field/column separator (customizable in the data set upload dialog)
- New line (`\n`) as the line separator
- Double quote (`"`) as the escape character
The dataset must be compatible with the model's Request Schema.
All columns defined in the schema must be present and convertible to the type specified in the schema.
Any columns which are not present in the schema will be ignored.
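The compatibility rules above can be checked locally before upload. A sketch using only the standard library; the function name is illustrative, the type casters cover only the type names from the schema example, and timestamp parsing is omitted:

```python
import csv

# Illustrative casters for the schema type names used in the example above.
_CASTERS = {"float": float, "int": int, "string": str, "bool": bool}

def check_dataset(csv_path, schema):
    """Verify every schema column exists in the CSV and its values convert
    to the declared type. Extra CSV columns are ignored, as the backend does."""
    columns = schema["featureColumns"] + schema.get("targetColumns", [])
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [c["name"] for c in columns if c["name"] not in header]
        if missing:
            return False, f"missing columns: {missing}"
        for row in reader:
            for col in columns:
                try:
                    _CASTERS.get(col["type"], str)(row[col["name"]])
                except ValueError:
                    return False, f"unconvertible value in column {col['name']}"
    return True, "ok"
```

Because white space in the CSV header is significant, a column named `" acc_0"` would be reported as missing here, matching the backend's behavior.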
tip
Automate inclusion of the dataset using pynavio.