GPU Models
navio supports the use of a GPU for accelerated inference.
A navio model can gain access to a GPU for processing by specifying the gpus metadata field in the internal MLmodel file:
flavors:
  python_function:
    ...
metadata:
  gpus: 1
...
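If the model is saved with MLflow itself, recent MLflow versions (2.0+) can write this block at save time via the metadata parameter of save_model. The following is a sketch under that assumption, using a trivial placeholder model:

import mlflow.pyfunc

class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input

# Assuming MLflow >= 2.0: the metadata argument is written as a
# top-level "metadata" block in the generated MLmodel file.
mlflow.pyfunc.save_model(
    path="echo_model",
    python_model=EchoModel(),
    metadata={"gpus": 1},  # any value > 0 requests GPU access
)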
gpus must be an integer greater than or equal to zero. If it is set to zero, no GPU device access is granted. GPU access is provided by running the model with the NVIDIA Docker runtime. To enable this, the docker run command receives the following additional arguments:
- --runtime=nvidia: activates the NVIDIA runtime
- -e NVIDIA_VISIBLE_DEVICES=all: gives the model container access to all GPU resources
- -e GUNICORN_CMD_ARGS=--workers=1: limits the number of model instances launched at container start to one, preventing excessive GPU resource usage
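Assembled into a single command, the resulting container launch looks roughly like the following sketch. navio issues this command internally, so the image name and port mapping shown here are illustrative placeholders only:

docker run \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e GUNICORN_CMD_ARGS=--workers=1 \
  -p 8080:8080 \
  navio-model:latest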
Depending on the modelling framework used to create the MLflow model's internal artifacts, additional steps may be necessary to ensure that the available GPU resources can actually be used.
caution
The modelling framework you wish to use must support NVIDIA CUDA.
There are some things to keep in mind when running GPU models:
- The code of the load_context and predict methods of the MLflow model needs to be GPU aware.
  - e.g. when using PyTorch, the model artifact and the input tensors must be assigned to the GPU device (see the sketch at the end of this section).
- The conda.yml must specify CUDA library versions compatible with the version of the modelling framework being used.
  - e.g. cudatoolkit=11.3.1 and cudnn=8.2.1 for tensorflow==2.5.1 (a sketch follows below).
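A minimal conda.yml along these lines might look as follows; the environment name, channel, and Python version are illustrative assumptions:

name: gpu-model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - cudatoolkit=11.3.1
  - cudnn=8.2.1
  - pip
  - pip:
      - tensorflow==2.5.1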
tip
Check out the NVIDIA support matrix to see which versions you need.
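To illustrate the first point above, here is a minimal sketch of a GPU-aware pyfunc wrapper, assuming a PyTorch model artifact; the class name and the "model" artifact key are assumptions, not navio or MLflow requirements:

import mlflow.pyfunc
import torch

class GpuAwareModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Fall back to the CPU so the same model also runs without a GPU.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # "model" is an assumed artifact key; use the key your artifact was saved under.
        self.model = torch.load(context.artifacts["model"], map_location=self.device)
        self.model.eval()

    def predict(self, context, model_input):
        # Assign the input tensor to the same device as the model.
        batch = torch.as_tensor(model_input.values, dtype=torch.float32).to(self.device)
        with torch.no_grad():
            output = self.model(batch)
        # Move results back to the CPU so they can be serialized.
        return output.cpu().numpy()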