nvcodec: NVRTC dependency restricts compatibility across environments with different display drivers
Background: I maintain Docker containers that depend on `nvh264enc`, `cudaupload`, `cudaconvert`, and other GStreamer Bad `nvcodec` plugins.

https://github.com/selkies-project/docker-nvidia-glx-desktop
https://github.com/selkies-project/docker-nvidia-egl-desktop
In Kubernetes (where my containers get deployed), nodes may have different NVIDIA display driver versions.
The following are classified as NVIDIA driver components:
- NVIDIA kernel driver modules (`nvidia.ko`, `nvidia-drm.ko`, `nvidia_modeset.ko`, `nvidia_uvm.ko`, etc., which expose the `/dev/nvidia*` and `/dev/dri` device nodes)
- NVIDIA userspace driver component libraries (`libnvidia-decode.so`, `libnvidia-encode.so`, `libnvcuvid.so`, `libEGL_nvidia.so`, `libGLX_nvidia.so`, `libnvidia-opencl.so`, `libnvidia-ml.so`, `libnvidia-compiler.so`, etc.)
- CUDA userspace driver component libraries, which are technically a subset of [2] (`libcuda.so`, `libnvidia-ptxjitcompiler.so`, `libnvidia-nvvm.so`, `libcudadebugger.so`)
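For reference, a quick way to confirm which of these components are visible inside a container is to query the kernel interface and the dynamic linker cache; this is only an illustrative check, and the library paths depend on the distribution and on how the components were injected:

```shell
# Host kernel driver version, as exposed to the container by the kernel modules
cat /proc/driver/nvidia/version

# Userspace driver component libraries registered with the dynamic linker
# (paths vary by distribution and by how the container runtime mounted them)
ldconfig -p | grep -E 'libcuda\.so|libnvcuvid\.so|libnvidia-(encode|decode|ptxjitcompiler|nvvm|ml)\.so'

# Kernel modules loaded on the host (the kernel is shared with the container)
grep '^nvidia' /proc/modules
```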
The above components are installed as part of the NVIDIA drivers (`nvidia-driver-5xx`, `NVIDIA-Linux-x86_64-5xx.xxx.xx.run`) and are not components of the CUDA Toolkit. Containers therefore depend on the driver version that the host has installed.
The NVIDIA Container Toolkit (formerly called `nvidia-docker` or the NVIDIA Container Runtime) provisions GPUs and injects (most of) the above NVIDIA driver components into individual containers in Kubernetes (combined with the `k8s-device-plugin`), Docker, or Podman.
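As a plain-Docker illustration of that injection (the image name is just a placeholder), the `--gpus` flag together with the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` variables controls which of the driver components listed above end up inside the container:

```shell
# Ask the NVIDIA Container Toolkit to inject the compute, video (libnvcuvid /
# libnvidia-encode), graphics, and utility driver libraries for all GPUs.
docker run --rm --gpus all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,graphics,utility \
  ubuntu:22.04 nvidia-smi
```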
Of course, one can also opt to install all of the userspace driver components by detecting the host driver version from `/proc/driver/nvidia/version` and automatically downloading the matching driver from NVIDIA's servers. Kernel driver modules are provisioned through `cgroups` and may only be controlled by the host.
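A minimal sketch of that detection-and-download step, assuming the usual `NVIDIA-Linux-x86_64-<version>.run` layout on NVIDIA's public download server (the URL pattern is an assumption and some driver branches are published elsewhere):

```shell
# Read the host driver version exposed by the kernel modules
DRIVER_VERSION="$(sed -n 's/^NVRM version:.*Kernel Module *\([0-9.]*\).*/\1/p' /proc/driver/nvidia/version)"
echo "Host driver: ${DRIVER_VERSION}"

# Assumed download URL pattern for the matching .run installer
curl -fL -o "/tmp/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" \
  "https://download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

# Install only the userspace components; the kernel modules stay on the host
sh "/tmp/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" --silent --no-kernel-module
```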
In the GStreamer Bad `nvcodec` plugins, in addition to the above NVIDIA driver components, there is one more dependency from the CUDA Toolkit (which is completely in userspace): NVRTC. NVRTC is the runtime compilation library for CUDA C++ and, with moderately high confidence, is the only userspace CUDA Toolkit dependency in `nvcodec`.
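One way to double-check that on a given build is to inspect the plugin's dynamic dependencies; the path below is just the common plugin location on a Debian/Ubuntu layout and may differ, and libraries loaded via `dlopen()` at runtime will not show up in `ldd` output at all:

```shell
# List the CUDA-related dynamic dependencies of the nvcodec plugin.
# Driver-provided libraries (libcuda, libnvcuvid, libnvidia-encode) may be
# dlopen()ed at runtime rather than linked, so only NVRTC may appear here.
ldd /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstnvcodec.so \
  | grep -E 'nvrtc|cuda|nvcuvid|nvidia' \
  || echo "no direct CUDA/NVRTC linkage (loaded at runtime instead)"
```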
The issue here is that, since NVRTC is a runtime compilation library, `nvcodec` fails to initialize when the NVRTC version is higher than the version of the host's `libcuda.so`. This affects both Linux and Windows.
```
cudaconverter gstcudaconverter.c:1893:gst_cuda_converter_setup: CUDA call failed: CUDA_ERROR_UNSUPPORTED_PTX_VERSION, the provided PTX was compiled with an unsupported toolchain.
```
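A node can be checked for this mismatch before GStreamer even starts by comparing the maximum CUDA version the driver reports against the NVRTC library the plugin will pick up; a rough sketch, assuming an unversioned `libnvrtc.so` symlink is registered with the dynamic linker (adjust the pattern for versioned names otherwise):

```shell
# Maximum CUDA version the installed driver can run (from the nvidia-smi banner)
DRIVER_CUDA="$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p' | head -n1)"

# Resolve which libnvrtc.so the dynamic linker will hand to the plugin
NVRTC_LIB="$(readlink -f "$(ldconfig -p | awk '/libnvrtc\.so /{print $NF; exit}')")"

echo "driver supports up to CUDA ${DRIVER_CUDA}"
echo "NVRTC library in use: ${NVRTC_LIB}"
# If the version encoded in the libnvrtc file name is newer than DRIVER_CUDA,
# expect CUDA_ERROR_UNSUPPORTED_PTX_VERSION when nvcodec loads its generated PTX.
```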
NVIDIA provides two options to solve this problem:
- Forward Compatibility: This is a capability provided by NVIDIA to replace `libcuda.so`, `libnvidia-ptxjitcompiler.so`, `libnvidia-nvvm.so`, and `libcudadebugger.so` inside the container, making it possible to use higher CUDA versions than the host driver allows (such as using CUDA 12.x with driver versions < 525). However, this is only allowed for Datacenter (Tesla) and Professional (Quadro) GPUs; Consumer (GeForce) GPUs are blocked from this approach (a provisioning sketch follows this list).

  ```
  nvcodec plugin.c:118:plugin_init: Failed to init cuda, cuInit ret: 0x324: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW
  ```
- Minor Version Compatibility: This allows interoperability within the same major CUDA version (10.x, 11.x, or 12.x), except for applications, including `nvcodec`, that compile device code to PTX through NVRTC. The option NVIDIA gives in this regard is to compile using the static NVRTC libraries `libnvrtc_static.a` and `libnvrtc-builtins_static.a` instead of linking `libnvrtc.so` and `libnvrtc-builtins.so`.
- The immediate intervention that can be done now is to extract and bundle the oldest redistributable NVRTC libraries (`libnvrtc.so` and `libnvrtc-builtins.so`) available to the public, because backward compatibility from an old CUDA Toolkit to a newer driver is always ensured. This allows backward compatibility down to driver version `450.xx` on `aarch64`, because CUDA 11 is the first major toolkit release to support `aarch64` (I am emphasizing this because of Ampere Altra as well as NVIDIA Jetson), and even older versions are available for `x86_64` and `ppc64le`.
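For completeness, this is roughly what the forward-compatibility route looks like on a Datacenter or Professional GPU. The `cuda-compat-12-2` package name and the `/usr/local/cuda/compat` path follow NVIDIA's usual CUDA repository packaging, but both are assumptions here and should be matched to the toolkit in the image:

```shell
# Inside the container: install the forward-compatibility userspace libraries,
# which shadow the injected libcuda.so / libnvidia-nvvm.so / libnvidia-ptxjitcompiler.so.
# (Assumes NVIDIA's CUDA apt repository is already configured in the image.)
apt-get update && apt-get install -y cuda-compat-12-2

# Prefer the compat libraries over the host driver libraries at load time.
export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"

# On Consumer (GeForce) GPUs, cuInit() then fails with
# CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE, as shown in the log above.
```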
So, in conclusion, I am looking for more ideas to support `nvcodec` in a portable way across various node environments, or for opinions on whether using the static NVRTC libraries `libnvrtc_static.a` and `libnvrtc-builtins_static.a` would be plausible.
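On the static-NVRTC question, the link step itself looks manageable; below is a rough, hypothetical link line (outside of any build system) assuming the CUDA Toolkit is installed under `/usr/local/cuda`. The exact set of static archives varies by toolkit version, and recent toolkits also ship `libnvptxcompiler_static.a` alongside the two NVRTC archives:

```shell
# Link an object that calls NVRTC against the static archives instead of libnvrtc.so;
# libcuda.so itself still comes from the host driver (the stub is only used for linking).
g++ -o nvrtc_static_test nvrtc_static_test.o \
  -L/usr/local/cuda/lib64 \
  -lnvrtc_static -lnvrtc-builtins_static -lnvptxcompiler_static \
  -lpthread -ldl -lm \
  -L/usr/local/cuda/lib64/stubs -lcuda
```

The trade-off is a larger plugin binary and an NVRTC version frozen at build time, but it removes the separate `libnvrtc.so` runtime dependency entirely.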
This issue was written after a discussion with @seungha.yang.
Edit: this is my monkey-patch solution for now. I opened the issue because there may be better solutions now or in the future.
https://github.com/selkies-project/docker-nvidia-glx-desktop/issues/44#issuecomment-1804361948
```shell
# Extract NVRTC dependency, https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/LICENSE.txt
cd /tmp && \
  curl -fsSL -o nvidia_cuda_nvrtc_linux_x86_64.whl "https://developer.download.nvidia.com/compute/redist/nvidia-cuda-nvrtc/nvidia_cuda_nvrtc-11.0.221-cp36-cp36m-linux_x86_64.whl" && \
  unzip -joq -d ./nvrtc nvidia_cuda_nvrtc_linux_x86_64.whl && \
  cd nvrtc && chmod 755 libnvrtc* && \
  find . -maxdepth 1 -type f -name "*libnvrtc.so.*" -exec sh -c 'ln -snf $(basename {}) libnvrtc.so' \; && \
  mv -f libnvrtc* /opt/gstreamer/lib/x86_64-linux-gnu/ && \
  cd /tmp && rm -rf /tmp/*
```