nvcodec: NVRTC dependency restricts compatibility across environments with different display drivers
Background: I maintain Docker containers that depend on `nvh264enc`, `cudaupload`, `cudaconvert`, and other GStreamer Bad `nvcodec` plugins.

https://github.com/selkies-project/docker-nvidia-glx-desktop
https://github.com/selkies-project/docker-nvidia-egl-desktop
In Kubernetes (where my containers get deployed), nodes may have different NVIDIA display driver versions.
The following are classified as NVIDIA driver components:
- NVIDIA kernel driver modules (`nvidia.ko`, `nvidia-drm.ko`, `nvidia_modeset.ko`, `nvidia_uvm.ko`, etc., which expose the `/dev/nvidia*` and `/dev/dri` device nodes)
- NVIDIA userspace driver component libraries (`libnvidia-decode.so`, `libnvidia-encode.so`, `libnvcuvid.so`, `libEGL_nvidia.so`, `libGLX_nvidia.so`, `libnvidia-opencl.so`, `libnvidia-ml.so`, `libnvidia-compiler.so`, etc.)
- CUDA userspace driver component libraries, which are technically a subset of [2] (`libcuda.so`, `libnvidia-ptxjitcompiler.so`, `libnvidia-nvvm.so`, `libcudadebugger.so`)
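For reference, a quick way to confirm which of these components are visible inside a container is to query the kernel interface and the dynamic linker cache; this is only an illustrative check, and the library paths depend on the distribution and on how the components were injected:

```shell
# Host kernel driver version, as exposed to the container by the kernel modules
cat /proc/driver/nvidia/version

# Userspace driver component libraries registered with the dynamic linker
# (paths vary by distribution and by how the container runtime mounted them)
ldconfig -p | grep -E 'libcuda\.so|libnvcuvid\.so|libnvidia-(encode|decode|ptxjitcompiler|nvvm|ml)\.so'

# Kernel modules loaded on the host (the kernel is shared with the container)
grep '^nvidia' /proc/modules
```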
The above components are installed as part of the NVIDIA drivers (`nvidia-driver-5xx`, `NVIDIA-Linux-x86_64-5xx.xxx.xx.run`) and are not components of the CUDA Toolkit. Containers therefore depend on the driver version that the host has installed.
The NVIDIA Container Toolkit (formerly called `nvidia-docker` or the NVIDIA Container Runtime) provisions GPUs and injects (most of) the above NVIDIA driver components into individual containers in Kubernetes (combined with the `k8s-device-plugin`), Docker, or Podman.
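As a plain-Docker illustration of that injection (the image name is just a placeholder), the `--gpus` flag together with the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` variables controls which of the driver components listed above end up inside the container:

```shell
# Ask the NVIDIA Container Toolkit to inject the compute, video (libnvcuvid /
# libnvidia-encode), graphics, and utility driver libraries for all GPUs.
docker run --rm --gpus all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,graphics,utility \
  ubuntu:22.04 nvidia-smi
```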
Of course, one can also opt to install all of the userspace driver components by detecting the host driver version from `/proc/driver/nvidia/version` and automatically downloading the matching driver from NVIDIA's servers. Kernel driver modules are provisioned through `cgroups` and may only be controlled by the host.
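A minimal sketch of that detection-and-download step, assuming the usual `NVIDIA-Linux-x86_64-<version>.run` layout on NVIDIA's public download server (the URL pattern is an assumption and some driver branches are published elsewhere):

```shell
# Read the host driver version exposed by the kernel modules
DRIVER_VERSION="$(sed -n 's/^NVRM version:.*Kernel Module *\([0-9.]*\).*/\1/p' /proc/driver/nvidia/version)"
echo "Host driver: ${DRIVER_VERSION}"

# Assumed download URL pattern for the matching .run installer
curl -fL -o "/tmp/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" \
  "https://download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

# Install only the userspace components; the kernel modules stay on the host
sh "/tmp/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" --silent --no-kernel-module
```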
In the GStreamer Bad `nvcodec` plugins, in addition to the above NVIDIA driver components, there is one more dependency from the CUDA Toolkit (which is completely in userspace): NVRTC. NVRTC is the runtime compilation library for CUDA C++ and, with moderately high confidence, is the only userspace CUDA Toolkit dependency in `nvcodec`.
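One way to double-check that on a given build is to inspect the plugin's dynamic dependencies; the path below is just the common plugin location on a Debian/Ubuntu layout and may differ, and libraries loaded via `dlopen()` at runtime will not show up in `ldd` output at all:

```shell
# List the CUDA-related dynamic dependencies of the nvcodec plugin.
# Driver-provided libraries (libcuda, libnvcuvid, libnvidia-encode) may be
# dlopen()ed at runtime rather than linked, so only NVRTC may appear here.
ldd /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstnvcodec.so \
  | grep -E 'nvrtc|cuda|nvcuvid|nvidia' \
  || echo "no direct CUDA/NVRTC linkage (loaded at runtime instead)"
```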
The issue here is that, since NVRTC is a runtime compilation library, `nvcodec` fails to initialize when the NVRTC version is higher than the version of the host's `libcuda.so`. This affects both Linux and Windows.
```
cudaconverter gstcudaconverter.c:1893:gst_cuda_converter_setup: CUDA call failed: CUDA_ERROR_UNSUPPORTED_PTX_VERSION, the provided PTX was compiled with an unsupported toolchain.
```
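A node can be checked for this mismatch before GStreamer even starts by comparing the maximum CUDA version the driver reports against the NVRTC library the plugin will pick up; a rough sketch, assuming an unversioned `libnvrtc.so` symlink is registered with the dynamic linker (adjust the pattern for versioned names otherwise):

```shell
# Maximum CUDA version the installed driver can run (from the nvidia-smi banner)
DRIVER_CUDA="$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p' | head -n1)"

# Resolve which libnvrtc.so the dynamic linker will hand to the plugin
NVRTC_LIB="$(readlink -f "$(ldconfig -p | awk '/libnvrtc\.so /{print $NF; exit}')")"

echo "driver supports up to CUDA ${DRIVER_CUDA}"
echo "NVRTC library in use: ${NVRTC_LIB}"
# If the version encoded in the libnvrtc file name is newer than DRIVER_CUDA,
# expect CUDA_ERROR_UNSUPPORTED_PTX_VERSION when nvcodec loads its generated PTX.
```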
NVIDIA provides two options to solve this problem:
- Forward Compatibility: This is a capability provided by NVIDIA to replace `libcuda.so`, `libnvidia-ptxjitcompiler.so`, `libnvidia-nvvm.so`, and `libcudadebugger.so` inside the container, making it possible to use higher CUDA versions than the host driver allows (such as using CUDA 12.x with driver versions < 525). However, this is only allowed for Datacenter (Tesla) and Professional (Quadro) GPUs; Consumer (GeForce) GPUs are blocked from this approach (a provisioning sketch follows this list).

  ```
  nvcodec plugin.c:118:plugin_init: Failed to init cuda, cuInit ret: 0x324: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW
  ```
- Minor Version Compatibility: This allows interoperability within the same major CUDA version (10.x, 11.x, or 12.x), except for applications, including `nvcodec`, that compile device code to PTX through NVRTC. The option NVIDIA gives in this regard is to compile using the static NVRTC libraries `libnvrtc_static.a` and `libnvrtc-builtins_static.a` instead of linking `libnvrtc.so` and `libnvrtc-builtins.so`.
- The immediate intervention that can be done now is to extract and bundle the oldest redistributable NVRTC libraries (`libnvrtc.so` and `libnvrtc-builtins.so`) available to the public, because backward compatibility from an old CUDA Toolkit to a newer driver is always ensured. This allows backward compatibility down to driver version `450.xx` on `aarch64`, because CUDA 11 is the first major toolkit release to support `aarch64` (I am emphasizing this because of Ampere Altra as well as NVIDIA Jetson), and even older versions are available for `x86_64` and `ppc64le`.
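For completeness, this is roughly what the forward-compatibility route looks like on a Datacenter or Professional GPU. The `cuda-compat-12-2` package name and the `/usr/local/cuda/compat` path follow NVIDIA's usual CUDA repository packaging, but both are assumptions here and should be matched to the toolkit in the image:

```shell
# Inside the container: install the forward-compatibility userspace libraries,
# which shadow the injected libcuda.so / libnvidia-nvvm.so / libnvidia-ptxjitcompiler.so.
# (Assumes NVIDIA's CUDA apt repository is already configured in the image.)
apt-get update && apt-get install -y cuda-compat-12-2

# Prefer the compat libraries over the host driver libraries at load time.
export LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH}"

# On Consumer (GeForce) GPUs, cuInit() then fails with
# CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE, as shown in the log above.
```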
So, in conclusion, I am looking for more ideas to support `nvcodec` in a portable way across various node environments, or for opinions on whether using the static NVRTC libraries `libnvrtc_static.a` and `libnvrtc-builtins_static.a` would be plausible.
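On the static-NVRTC question, the link step itself looks manageable; below is a rough, hypothetical link line (outside of any build system) assuming the CUDA Toolkit is installed under `/usr/local/cuda`. The exact set of static archives varies by toolkit version, and recent toolkits also ship `libnvptxcompiler_static.a` alongside the two NVRTC archives:

```shell
# Link an object that calls NVRTC against the static archives instead of libnvrtc.so;
# libcuda.so itself still comes from the host driver (the stub is only used for linking).
g++ -o nvrtc_static_test nvrtc_static_test.o \
  -L/usr/local/cuda/lib64 \
  -lnvrtc_static -lnvrtc-builtins_static -lnvptxcompiler_static \
  -lpthread -ldl -lm \
  -L/usr/local/cuda/lib64/stubs -lcuda
```

The trade-off is a larger plugin binary and an NVRTC version frozen at build time, but it removes the separate `libnvrtc.so` runtime dependency entirely.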
This issue was written after a discussion with @seungha.yang.
Edit: this is my monkey-patch solution for now. I opened the issue because there may be better solutions now or in the future.
https://github.com/selkies-project/docker-nvidia-glx-desktop/issues/44#issuecomment-1804361948
```shell
# Extract NVRTC dependency, https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/LICENSE.txt
cd /tmp && \
  curl -fsSL -o nvidia_cuda_nvrtc_linux_x86_64.whl "https://developer.download.nvidia.com/compute/redist/nvidia-cuda-nvrtc/nvidia_cuda_nvrtc-11.0.221-cp36-cp36m-linux_x86_64.whl" && \
  unzip -joq -d ./nvrtc nvidia_cuda_nvrtc_linux_x86_64.whl && \
  cd nvrtc && chmod 755 libnvrtc* && \
  find . -maxdepth 1 -type f -name "*libnvrtc.so.*" -exec sh -c 'ln -snf $(basename {}) libnvrtc.so' \; && \
  mv -f libnvrtc* /opt/gstreamer/lib/x86_64-linux-gnu/ && \
  cd /tmp && rm -rf /tmp/*
```