RADV deadlock: spec-violating infinite sync wait in vkQueuePresentKHR
Description
I'm not 100% sure what the triggering circumstances are, but the basic idea is something like this:
1. Create VkFence 0xA, timeline VkSemaphore 0xB = 0 and binary VkSemaphore 0xC
2. Submit compute command that signals VkFence 0xA
3. Submit graphics command that waits for VkSemaphore 0xB == 1 and signals 0xC
4. Submit vkQueuePresentKHR that waits for 0xC
5. .... (on CPU) vkWaitForFences(0xA) -> vkSignalSemaphore(0xB = 1)
Steps 1-4 may be in a loop (with different objects), and step 5 is done at some arbitrary future point in time. It doesn't always deadlock, but usually deadlocks within a few frames of me starting the application.
Log files (for system lockups / game freezes / crashes)
submitting compute command (step 2)
[New Thread 0x7fffac8786c0 (LWP 126713)]
Spent 14.776 ms translating SPIR-V
submitting graphics command (step 3)
signalling semaphore (step 5)
Dropped frame with PTS 2.252
submitting compute command (step 2)
submitting graphics command (step 3)
[New Thread 0x7fff250896c0 (LWP 126714)]
^C
Thread 1 "plplay" received signal SIGINT, Interrupt.
__GI___ioctl (fd=fd@entry=11, request=request@entry=3223872714) at ../sysdeps/unix/sysv/linux/ioctl.c:36
36 if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (r)))
Missing separate debuginfos, use: zypper install Mesa-libva-debuginfo-22.3.5-344.3.x86_64 libLLVM15-debuginfo-15.0.7-1.1.x86_64 libSPIRV-Tools-2023_1-debuginfo-2023.1-1.1.x86_64 libbrotlienc1-debuginfo-1.0.9-1.10.x86_64 libglslang12-debuginfo-12.0.0-1.1.x86_64 libhwy1-debuginfo-1.0.3-1.1.x86_64 libjxl0_8-debuginfo-0.8.1-2.1.x86_64 libopenssl3-debuginfo-3.0.7-3.1.x86_64 librist4-debuginfo-0.2.7-1.6.x86_64 libsrt1_5-debuginfo-1.5.1-1.2.x86_64 libunistring5-debuginfo-1.1-1.1.x86_64 libunwind8-debuginfo-1.6.2-4.3.x86_64 libx264-164-debuginfo-0.164+git20220602.baee400f-1.1.x86_64
(gdb) bt
#0 __GI___ioctl (fd=fd@entry=11, request=request@entry=3223872714) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x00007ffff19917f8 in drmIoctl (fd=11, request=request@entry=3223872714, arg=arg@entry=0x7fffffffc550) at ../xf86drm.c:704
#2 0x00007ffff1995d38 in drmSyncobjTimelineWait (fd=<optimized out>, handles=<optimized out>, points=<optimized out>, num_handles=<optimized out>, timeout_nsec=<optimized out>, flags=<optimized out>, first_signaled=0x0) at ../xf86drm.c:5059
#3 0x00007fffe773e175 in vk_drm_syncobj_wait_many (device=0x976030, wait_count=<optimized out>, waits=<optimized out>, wait_flags=<optimized out>, abs_timeout_ns=9223372036854775807) at ../src/vulkan/runtime/vk_drm_syncobj.c:269
#4 0x00007fffe773a3ae in __vk_sync_wait (device=<optimized out>, sync=<optimized out>, wait_value=<optimized out>, wait_flags=<optimized out>, abs_timeout_ns=<optimized out>) at ../src/vulkan/runtime/vk_sync.c:234
#5 0x00007fffe7741083 in vk_queue_submit (queue=0x62aa30, info=<optimized out>) at ../src/vulkan/runtime/vk_queue.c:948
#6 0x00007fffe77414ee in vk_common_QueueSubmit2KHR (_queue=0x62aa30, submitCount=1, pSubmits=<optimized out>, _fence=0x107aae0) at ../src/vulkan/runtime/vk_queue.c:1156
#7 0x00007fffe773b2ec in vk_common_QueueSubmit (_queue=<optimized out>, submitCount=<optimized out>, pSubmits=<optimized out>, fence=0x107aae0) at ../src/vulkan/runtime/vk_synchronization2.c:416
#8 0x00007fffe771df74 in wsi_common_queue_present (wsi=0x86dff8, device=0x976030, queue=0x62aa30, queue_family_index=0, pPresentInfo=0x7fffffffd2d0) at ../src/vulkan/wsi/wsi_common.c:1220
#9 0x00007ffff7f764d6 in vk_sw_submit_frame (sw=0xaab7e0) at ../src/vulkan/swapchain.c:815
#10 0x000000000040687b in render_loop (p=0x461980 <state>) at ../demos/plplay.c:531
#11 main (argc=<optimized out>, argv=<optimized out>) at ../demos/plplay.c:706
(gdb)
The deadlock here happens inside a vk_sync_wait(..., UINT64_MAX) call inside vk_queue_submit, which seems to be stuck forever waiting for the previous shader execution (step 3) to complete, which itself is stuck forever waiting for the CPU update (step 5), which can't happen because vkQueuePresentKHR is stuck.
Calls to vkQueuePresentKHR may block, but must return in finite time.
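To make the circular wait described above easier to see, here is a toy model in plain C. It is only a sketch and is not Mesa or Vulkan code: the enum values, the two tables and has_wait_cycle are hypothetical names invented for this illustration, and each node is assumed to wait on at most one other node.

```c
/* Toy wait-for model of the hang; not Mesa or Vulkan code.
 * All names here are hypothetical and exist only for illustration. */
enum node {
    PRESENT_CALL,     /* CPU thread inside vkQueuePresentKHR (step 4) */
    GRAPHICS_SUBMIT,  /* GPU work from step 3, waits for 0xB == 1 */
    HOST_SIGNAL,      /* CPU-side vkSignalSemaphore(0xB = 1) (step 5) */
    NODE_COUNT,
    NO_WAIT = -1      /* marker: this node waits on nothing */
};

/* Observed behaviour: the present call blocks on the step 3 work,
 * step 3 waits for the host signal, and the host signal can only be
 * issued once the same thread has returned from the present call. */
static const int observed_waits[NODE_COUNT] = {
    [PRESENT_CALL]    = GRAPHICS_SUBMIT,
    [GRAPHICS_SUBMIT] = HOST_SIGNAL,
    [HOST_SIGNAL]     = PRESENT_CALL,
};

/* Assumed alternative: the present call returns without waiting on
 * earlier GPU work, so nothing blocks the host signal. */
static const int nonblocking_waits[NODE_COUNT] = {
    [PRESENT_CALL]    = NO_WAIT,
    [GRAPHICS_SUBMIT] = HOST_SIGNAL,
    [HOST_SIGNAL]     = PRESENT_CALL,
};

/* Returns 1 if following waits_on[] from `start` leads back to `start`,
 * i.e. `start` is part of a circular wait; returns 0 otherwise. */
static int has_wait_cycle(const int waits_on[NODE_COUNT], int start)
{
    int cur = waits_on[start];
    for (int steps = 0; steps < NODE_COUNT && cur != NO_WAIT; steps++) {
        if (cur == start)
            return 1;
        cur = waits_on[cur];
    }
    return 0;
}
```

Calling has_wait_cycle(observed_waits, PRESENT_CALL) returns 1, matching the hang in the backtrace, while has_wait_cycle(nonblocking_waits, PRESENT_CALL) returns 0 because the present call no longer depends on the step 3 work completing.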
Steps to reproduce
- Check out https://code.videolan.org/videolan/libplacebo/-/commits/peak_detection_v2
- Build with -Ddemos=true
- Run the built plplay binary on an HDR file
- Enable "Peak detection" (in the in-program options GUI)
It should deadlock at most a handful of frames after this.
System information
System:
Host: xor Kernel: 6.1.11-1-preempt arch: x86_64 bits: 64 compiler: gcc
v: 12.2.1 Console: pty pts/10 wm: xmonad DM: SDDM Distro: openSUSE
Tumbleweed 20230214
CPU:
Info: 16-core model: AMD Ryzen Threadripper 1950X bits: 64 type: MT MCP MCM
arch: Zen rev: 1 cache: L1: 1.5 MiB L2: 8 MiB L3: 32 MiB
Speed (MHz): avg: 2612 high: 3400 min/max: 2200/3400 boost: enabled cores:
1: 2200 2: 2200 3: 2200 4: 2200 5: 3400 6: 2200 7: 2200 8: 3400 9: 3400
10: 3400 11: 2200 12: 2200 13: 3400 14: 2200 15: 2200 16: 2200 17: 3400
18: 3400 19: 3400 20: 2200 21: 2200 22: 2200 23: 2200 24: 2200 25: 2200
26: 2200 27: 3400 28: 2200 29: 2200 30: 2200 31: 3400 32: 3400
bogomips: 217198
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
vendor: Tul / PowerColor AXRX driver: amdgpu v: kernel arch: RDNA-1 pcie:
speed: 16 GT/s lanes: 16 ports: active: DP-1 empty: DP-2,DP-3,HDMI-A-1
bus-ID: 0a:00.0 chip-ID: 1002:731f
Device-2: Logitech StreamCam type: USB
driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus-ID: 4-4:3
chip-ID: 046d:0893
Display: x11 server: X.Org v: 21.1.7 with: Xwayland v: 22.1.8 compositor:
driver: X: loaded: amdgpu unloaded: fbdev,modesetting,radeon,vesa
dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 3840x2160 s-dpi: 96
Monitor-1: DP-1 mapped: DisplayPort-0 model: Dell UP2718Q res: 3840x2160
dpi: 163 diag: 685mm (27")
API: OpenGL v: 4.6 Mesa 22.3.5 renderer: AMD Radeon RX 5700 XT (navi10
LLVM 15.0.7 DRM 3.49 6.1.11-1-preempt) direct render: Yes