Vulkan apps may crash on exit because driParseConfigFiles() calls atexit() but libvulkan_intel.so may have already been unloaded
System information
% inxi -GSC -xx
System:
Host: itx Kernel: FreeBSD 14.0-RELEASE amd64 bits: 64 compiler: clang
v: 16.0.6 Desktop: GNOME 42.4 tk: GTK 3.24.41 wm: gnome-shell dm: GDM, SDDM
OS: FreeBSD 14.0-RELEASE
CPU:
Info: 32-core model: Intel Core i9-14900K bits: 64 type: MCP arch: N/A
rev: 1 cache: N/A
Speed (MHz): 3187 min/max: N/A cores: No OS support for core speeds.
Features: avx avx2 lm nx pae sse sse2 sse3 ssse3 vmx
Graphics:
Device-1: NVIDIA AD102 [GeForce RTX 4090] driver: vgapci bus-ID: 0:1:0.0
chip-ID: 10de:2684
Display: x11 server: X.Org 1.21.1.13 compositor: gnome-shell driver:
loaded: nvidia resolution: 3840x2160 s-dpi: 96
OpenGL: renderer: NVIDIA GeForce RTX 4090/PCIe/SSE2
v: 4.6.0 NVIDIA 550.54.14 direct render: Yes
%
Describe the issue
Mesa produces libvulkan_intel.so and libvulkan_radeon.so, which are dlopen()ed, dlsym()ed, and, most importantly, dlclose()ed by the Vulkan loader (libvulkan.so). However, both libvulkan_intel.so and libvulkan_radeon.so call atexit(). The Vulkan loader can dlclose() libvulkan_intel.so (e.g. inside vkDestroyInstance()). The problem is that, when the atexit() callbacks run at process exit, they attempt to call a function that might already have been unloaded by dlclose(). This results in a crash.
Here's the stack trace of the offending call to atexit():
* frame #0: 0x0000000827ab2047 libc.so.7`atexit(func=(libvulkan_intel.so`free_program_name at u_process.c:201)) at atexit.c:135:13
frame #1: 0x000000085f0c81a3 libvulkan_intel.so`util_get_process_name_callback at u_process.c:213:7
frame #2: 0x000000082670a4d3 libthr.so.3`_thr_once(once_control=0x000000085f6e5e38, init_routine=(libvulkan_intel.so`util_get_process_name_callback at u_process.c:208)) at thr_once.c:96:2
frame #3: 0x000000085f10d8ad libvulkan_intel.so`call_once(flag=0x000000085f6e5e38, func=(libvulkan_intel.so`util_get_process_name_callback at u_process.c:208)) at threads_posix.c:76:5
frame #4: 0x000000085f0c811a libvulkan_intel.so`util_get_process_name [inlined] util_call_once(flag=0x000000085f6e5e30, func=(libvulkan_intel.so`util_get_process_name_callback at u_process.c:208)) at u_call_once.h:41:7
frame #5: 0x000000085f0c80ee libvulkan_intel.so`util_get_process_name at u_process.c:220:4
frame #6: 0x000000085f62862a libvulkan_intel.so`driParseConfigFiles(cache=0x00003c915cfe6b48, info=0x00003c915cfe6b60, screenNum=0, driverName="anv", kernelDriverName=0x0000000000000000, deviceName=0x0000000000000000, applicationName="MyApplication", applicationVersion=1, engineName="MyEngine", engineVersion=1) at xmlconfig.c:1190:18
frame #7: 0x000000085db02f90 libvulkan_intel.so`anv_init_dri_options(instance=0x00003c915cfe6900) at anv_device.c:2461:4
frame #8: 0x000000085db019eb libvulkan_intel.so`anv_CreateInstance(pCreateInfo=0x0000000820dfda00, pAllocator=0x000000085f6e01e8, pInstance=0x00003c915d598018) at anv_device.c:2539:4
frame #9: 0x00000008247c90de libvulkan.so.1`terminator_CreateInstance(pCreateInfo=0x0000000820dfdc10, pAllocator=0x0000000000000000, pInstance=0x0000000820dfdd50) at loader.c:5403:13
frame #10: 0x00000008247ccb25 libvulkan.so.1`loader_create_instance_chain(pCreateInfo=0x0000000820dfdd10, pAllocator=0x0000000000000000, inst=0x00003c915d626800, created_instance=0x0000000820dfdd50) at loader.c:4676:15
frame #11: 0x00000008247d66e0 libvulkan.so.1`vkCreateInstance(pCreateInfo=0x0000000820e016a8, pAllocator=0x0000000000000000, pInstance=0x0000000820e018c8) at trampoline.c:733:11
The man page for atexit() doesn't say anything about running the registered callbacks when the shared library that called atexit() gets unloaded. Likewise, the man page for dlclose() doesn't say anything about running atexit() callbacks either. Therefore, it's wrong for any library that gets dlopen()ed/dlsym()ed/dlclose()ed to call atexit(). libvulkan_intel.so does so. This call to atexit() exists in tip-of-tree sources.
One solution might be to remove the call to atexit() and change this:
static void
free_program_name(void)
{
   free(program_name);
   program_name = NULL;
}
to this:
__attribute__((destructor)) static void
free_program_name(void)
{
   free(program_name);
   program_name = NULL;
}
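To make the proposed change a bit more concrete, here is a minimal, self-contained sketch of the destructor-based pattern. It is not Mesa's actual code: example_get_process_name() and its fallback parameter are hypothetical stand-ins, and the real code guards initialization with util_call_once(), which this sketch omits. With GCC and Clang on ELF platforms, a function marked __attribute__((destructor)) runs when its shared object is unloaded (or at process exit if it is never unloaded), so the callback cannot outlive the code it points to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Mesa's cached process name. */
static char *program_name;

/* Runs when the containing shared object is unloaded, or at normal
 * process exit if it is still loaded by then. No atexit() needed. */
__attribute__((destructor)) static void
free_program_name(void)
{
   free(program_name);
   program_name = NULL;
}

/* Hypothetical accessor: caches a copy of `fallback` on first use.
 * Not thread-safe; returns NULL if the allocation fails. */
const char *
example_get_process_name(const char *fallback)
{
   assert(fallback != NULL);
   if (program_name == NULL) {
      size_t len = strlen(fallback) + 1;
      program_name = malloc(len);
      if (program_name == NULL)
         return NULL;
      memcpy(program_name, fallback, len);
   }
   return program_name;
}
```

Because the cleanup is tied to the lifetime of the shared object rather than to the process-wide atexit() list, it should not matter whether the Vulkan loader calls dlclose() before exit or not.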
An alternative (potentially more invasive) solution might be to add an explicit cleanup step at the right time and place that calls free_program_name(). The right place for that might be driParseConfigFiles(), driDestroyOptionInfo(), driDestroyOptionCache(), or somewhere similar.
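As a rough illustration of what an explicit cleanup step could look like, the sketch below pairs every use of the cached name with a release call and frees the string when the last user goes away. All names here (example_process_name_ref(), example_process_name_unref(), example_process_name_is_cached()) are hypothetical and do not exist in Mesa. The sketch also uses a plain counter instead of a once-flag so that the name can be re-created after a cleanup, and it is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the cached process name and its user count. */
static char *program_name;
static unsigned program_name_users;

/* Take a reference; the first caller allocates the cached copy.
 * Returns NULL (and takes no reference) if the allocation fails. */
const char *
example_process_name_ref(const char *name)
{
   assert(name != NULL);
   if (program_name == NULL) {
      size_t len = strlen(name) + 1;
      program_name = malloc(len);
      if (program_name == NULL)
         return NULL;
      memcpy(program_name, name, len);
   }
   program_name_users++;
   return program_name;
}

/* Drop a reference; the last caller frees the cached copy. */
void
example_process_name_unref(void)
{
   if (program_name_users == 0)
      return;
   if (--program_name_users == 0) {
      free(program_name);
      program_name = NULL;
   }
}

/* Reports whether the cached copy is currently allocated. */
int
example_process_name_is_cached(void)
{
   return program_name != NULL;
}
```

In this scheme the ref call would sit wherever the name is first needed (for example on the driParseConfigFiles() path) and the unref call in the matching teardown path, so nothing is left to run after the library has been unloaded.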
Log files as attachment
- Output of dmesg
- Backtrace above
Workaround
One workaround seems to be (though I haven't verified this precisely) for the application to simply leak the VkInstance; this seems to cause the Vulkan loader to avoid unloading libvulkan_intel.so.
Another workaround seems to be to set VK_LOADER_DISABLE_DYNAMIC_LIBRARY_UNLOADING=1, which causes the Vulkan loader to avoid unloading it too.
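For applications that cannot rely on the user's environment, the second workaround could probably also be applied from inside the program. The sketch below is only an assumption about how one might do that: example_disable_loader_unloading() is a hypothetical helper, and it presumes the Vulkan loader reads the variable after this call, so it should run before any Vulkan function is called.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LOADER_KEEP_LOADED_VAR "VK_LOADER_DISABLE_DYNAMIC_LIBRARY_UNLOADING"

/* Hypothetical helper: sets the workaround variable for this process.
 * Returns 0 on success, -1 if the environment could not be updated. */
int
example_disable_loader_unloading(void)
{
   const char *value;

   /* Third argument 1 = overwrite any existing value. */
   if (setenv(LOADER_KEEP_LOADED_VAR, "1", 1) != 0)
      return -1;

   /* Read the value back to confirm it is visible to this process. */
   value = getenv(LOADER_KEEP_LOADED_VAR);
   assert(value != NULL);
   return strcmp(value, "1") == 0 ? 0 : -1;
}
```

Like the environment-variable workaround itself, this only hides the problem by keeping the driver loaded until exit; the atexit() registration in the driver remains.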