RADV: regression in 23.2.1 causing GPU hang with RDNA1 in various UE5 games
Description
I can reproduce GPU timeouts in various UE5 games with an RX 5700 XT and a fresh Mesa build. I've mostly tested the Talos 2 demo, where the hang is easiest to reproduce, but I've also had similar hangs in the demos for Robocop, Thaumaturge, IfSunSets and Jusant.
Log files (for system lockups / game freezes / crashes)
robocop_thaumaturge_talos2_demos_umr_dumps.zip
Steps to reproduce
Steam library save location: /steamapps/compatdata/2312690/pfx/drive_c/users/steamuser/Local Settings/Application Data/Talos2Demo/Saved/SaveGames
Start the Talos 2 demo and set global illumination higher than medium, which makes the hang more reliable
Load the save game, or play to the puzzle named "versatile contraption"
Pick up the jammer one or more times until the GPU hangs
System information
System:
Host: ryzen-runar Kernel: 6.5.7-arch1-1 arch: x86_64 bits: 64 compiler: gcc
v: 13.2.1 Desktop: KDE Plasma v: 5.27.8 tk: Qt v: 5.15.11 wm: kwin_x11
dm: SDDM Distro: Arch Linux
CPU:
Info: 12-core model: AMD Ryzen 9 7900 bits: 64 type: MT MCP arch: Zen 4
rev: 2 cache: L1: 768 KiB L2: 12 MiB L3: 64 MiB
Speed (MHz): avg: 858 high: 5129 min/max: 400/5482 cores: 1: 5129 2: 3484
3: 400 4: 400 5: 400 6: 400 7: 400 8: 400 9: 400 10: 400 11: 400 12: 400
13: 3597 14: 400 15: 400 16: 400 17: 400 18: 400 19: 400 20: 400 21: 400
22: 400 23: 400 24: 400 bogomips: 177666
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
driver: amdgpu v: kernel arch: RDNA-1 pcie: speed: 16 GT/s lanes: 16 ports:
active: DP-2,HDMI-A-1 empty: DP-1,DP-3 bus-ID: 03:00.0 chip-ID: 1002:731f
Device-2: Logitech Webcam C310 driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 7-1:2 chip-ID: 046d:081b
Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.2.1
compositor: kwin_x11 driver: X: loaded: modesetting unloaded: vesa
alternate: fbdev dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 4480x1440 s-dpi: 96
Monitor-1: DP-2 pos: primary,right model: Dell U2515H res: 2560x1440
dpi: 118 diag: 634mm (25")
Monitor-2: HDMI-A-1 mapped: HDMI-1 pos: left model: Samsung S24E391
res: 1920x1080 dpi: 94 diag: 598mm (23.5")
API: EGL v: 1.5 platforms: device: 0 drv: radeonsi device: 1 drv: swrast
surfaceless: drv: radeonsi x11: drv: radeonsi inactive: gbm,wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 23.2.1-arch1.2
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 5700 XT (navi10 LLVM
16.0.6 DRM 3.54 6.5.7-arch1-1) device-ID: 1002:731f
API: Vulkan v: 1.3.264 surfaces: xcb,xlib device: 0 type: discrete-gpu
driver: mesa radv device-ID: 1002:731f device: 1 type: cpu
driver: mesa llvmpipe device-ID: 10005:0000
Regression
I had a hang today with a fresh build from fb95f1d5. I didn't get any hangs when building from 23.1-branchpoint.
Git bisect pointed to this commit: e15a4e6e
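For anyone who wants to repeat the bisect, a minimal sketch of the setup follows. It assumes a local Mesa git checkout. `start_bisect` is a hypothetical helper name, not something from Mesa. The build and install steps between iterations are left out because they depend on the local setup.

```shell
# Hypothetical helper: start a bisect between a known-bad and a known-good revision.
# Usage inside a Mesa checkout, with the revisions from this report:
#   start_bisect fb95f1d5 23.1-branchpoint
start_bisect() {
    bad="$1"
    good="$2"
    git bisect start "$bad" "$good"
}

# After each step: build and install that checkout, run the reproduction steps,
# then mark the result with `git bisect bad` (hang) or `git bisect good` (no hang).
# `git bisect reset` returns to the original checkout when done.
```

Because the hang is intermittent, a revision should probably only be marked good after several attempts at picking up the jammer.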
Further information
The hang doesn't happen with RADV_DEBUG=nodcc
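As a sketch of how the workaround can be applied to a single process, the variable can be set only for the launched command. `run_nodcc` is a hypothetical wrapper name used for illustration.

```shell
# Hypothetical wrapper: run a command with RADV_DEBUG=nodcc set for that process only,
# leaving the rest of the session's environment untouched.
run_nodcc() {
    RADV_DEBUG=nodcc "$@"
}

# Example: confirm the variable reaches the child process.
run_nodcc sh -c 'echo "RADV_DEBUG=$RADV_DEBUG"'
```

For a game launched through Steam, the equivalent is to put `RADV_DEBUG=nodcc %command%` in the game's launch options.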