GPU Models
navio supports the use of a GPU for accelerated inference.
A navio model can gain access to a GPU for processing by specifying the gpus metadata field in the internal MLmodel file:
flavors:
  python_function:
    ...
metadata:
  gpus: 1
...
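If the model is saved with MLflow itself, recent MLflow versions (2.0+) can write this block at save time via the metadata parameter of save_model. The following is a sketch under that assumption, using a trivial placeholder model:

import mlflow.pyfunc

class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input

# Assuming MLflow >= 2.0: the metadata argument is written as a
# top-level "metadata" block in the generated MLmodel file.
mlflow.pyfunc.save_model(
    path="echo_model",
    python_model=EchoModel(),
    metadata={"gpus": 1},  # any value > 0 requests GPU access
)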
gpus must be an integer greater than or equal to zero. If it is set to zero, no GPU device access is granted. GPU access is provided by running the model with the NVIDIA Docker runtime. To enable this, the docker run command receives the following additional arguments:
- --runtime=nvidia: activates the NVIDIA runtime
- -e NVIDIA_VISIBLE_DEVICES=all: gives the model container access to all GPU resources
- -e GUNICORN_CMD_ARGS=--workers=1: limits the number of model instances launched at container start to one, preventing excessive GPU resource usage
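Assembled into a single command, the resulting container launch looks roughly like the following sketch. navio issues this command internally, so the image name and port mapping shown here are illustrative placeholders only:

docker run \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e GUNICORN_CMD_ARGS=--workers=1 \
  -p 8080:8080 \
  navio-model:latest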
Depending on the modelling framework used to create the MLflow model's internal artifacts, additional steps may be necessary to ensure that the available GPU resources can actually be used.
caution
The modelling framework you wish to use must support NVIDIA CUDA.
There are some things to keep in mind when running GPU models:
- The code of the load_context and predict methods of the MLflow model needs to be GPU aware.
  - e.g. when using PyTorch, the model artifact and the input tensors must be assigned to the GPU device (see the sketch at the end of this section).
- The conda.yml must specify CUDA library versions compatible with the version of the modelling framework being used.
  - e.g. cudatoolkit=11.3.1 and cudnn=8.2.1 for tensorflow==2.5.1 (a sketch follows below).
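A minimal conda.yml along these lines might look as follows; the environment name, channel, and Python version are illustrative assumptions:

name: gpu-model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - cudatoolkit=11.3.1
  - cudnn=8.2.1
  - pip
  - pip:
      - tensorflow==2.5.1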
tip
Check out the NVIDIA support matrix to see which versions you need.
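To illustrate the first point above, here is a minimal sketch of a GPU-aware pyfunc wrapper, assuming a PyTorch model artifact; the class name and the "model" artifact key are assumptions, not navio or MLflow requirements:

import mlflow.pyfunc
import torch

class GpuAwareModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Fall back to the CPU so the same model also runs without a GPU.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # "model" is an assumed artifact key; use the key your artifact was saved under.
        self.model = torch.load(context.artifacts["model"], map_location=self.device)
        self.model.eval()

    def predict(self, context, model_input):
        # Assign the input tensor to the same device as the model.
        batch = torch.as_tensor(model_input.values, dtype=torch.float32).to(self.device)
        with torch.no_grad():
            output = self.model(batch)
        # Move results back to the CPU so they can be serialized.
        return output.cpu().numpy()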