Mesa 23.3rc5 sometimes gets stuck after GPU reset
- OS: Debian GNU/Linux 12 (bookworm)
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] [1002:73df] (rev c5)
- Kernel version: 6.5.11
- Mesa version: 23.3.0-rc5 (OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.3.0-rc5)
- Xserver version (if applicable): X.Org X Server 1.21.1.7
- Desktop manager and compositor: awesome v4.3
Describe the issue
When the GPU resets due to a [gfxhub] page fault (see drm/amd#2943 for the issue and more hardware/software details), Xorg running on recent Mesa versions (after 23.3-rc1) sometimes gets stuck while handling the GPU reset. That is, instead of dying as it did before, Xorg just hangs, and a manual kill/systemctl restart is necessary.
The relevant backtrace in Xorg.0.log.old:
[ 3773.146] (EE) Backtrace:
[ 3773.155] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) [0x5619469e5d99]
[ 3773.155] (EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x40) [0x7f904045afd0]
[ 3773.156] (EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (pthread_key_delete+0x14c) [0x7f90404a9d3c]
[ 3773.156] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x12) [0x7f904045af32]
[ 3773.156] (EE) 4: /lib/x86_64-linux-gnu/libc.so.6 (abort+0xd3) [0x7f9040445472]
[ 3773.159] (EE) 5: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_ctx_set_sw_reset_status+0xaf) [0x7f903ddc79af]
[ 3773.160] (EE) 6: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_cs_submit_ib+0x416) [0x7f903ddcad56]
[ 3773.161] (EE) 7: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (util_queue_thread_func+0x153) [0x7f903d4ef0d3]
[ 3773.161] (EE) 8: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (impl_thrd_routine+0x17) [0x7f903d50e327]
[ 3773.162] (EE) 9: /lib/x86_64-linux-gnu/libc.so.6 (pthread_condattr_setpshared+0x4d4) [0x7f90404a8044]
[ 3773.162] (EE) 10: /lib/x86_64-linux-gnu/libc.so.6 (__xmknodat+0x23c) [0x7f904052861c]
[ 3773.162] (EE)
[ 3773.162] (EE)
Fatal server error:
[ 3773.162] (EE) Caught signal 6 (Aborted). Server aborting
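To make it easier to compare backtraces across repeated reproductions, the frames that belong to the driver can be pulled out of the log with a small script. The following is only a minimal Python sketch, not part of Mesa or Xorg: `parse_frames` and `frames_in_module` are hypothetical helper names, and the regular expression assumes the frame layout seen in the backtrace above (`index: module (symbol+0xoffset) [0xaddress]`). The sample text is copied from frames 4 to 6 of that backtrace.

```python
import re

# Matches one Xorg backtrace frame of the assumed form:
# "[ 3773.159] (EE) 5: /path/lib.so (symbol+0xaf) [0x7f903ddc79af]"
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<index>\d+):\s+(?P<module>\S+)\s+"
    r"\((?P<symbol>[^+)]+)\+0x(?P<offset>[0-9a-fA-F]+)\)\s+"
    r"\[0x(?P<address>[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a frame line (e.g. "Fatal server error:")
        frames.append({
            "index": int(match.group("index")),
            "module": match.group("module"),
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "address": int(match.group("address"), 16),
        })
    return frames


def frames_in_module(frames, needle):
    """Keep only frames whose module path contains needle."""
    return [frame for frame in frames if needle in frame["module"]]


# Frames 4-6 from the backtrace in this report.
SAMPLE = """\
[ 3773.156] (EE) 4: /lib/x86_64-linux-gnu/libc.so.6 (abort+0xd3) [0x7f9040445472]
[ 3773.159] (EE) 5: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_ctx_set_sw_reset_status+0xaf) [0x7f903ddc79af]
[ 3773.160] (EE) 6: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_cs_submit_ib+0x416) [0x7f903ddcad56]
"""

if __name__ == "__main__":
    for frame in frames_in_module(parse_frames(SAMPLE), "radeonsi_dri.so"):
        print(f'{frame["index"]}: {frame["symbol"]}+{frame["offset"]:#x}')
```

Run against the full Xorg.0.log.old, the same filter would show whether the abort always originates from `amdgpu_ctx_set_sw_reset_status` called via `amdgpu_cs_submit_ib`, as it does in the backtrace above.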
Regression
I've never encountered this issue with 23.3-rc1, but with 23.3-rc3 the issue appeared (I've skipped 23.3-rc2).
Log files as attachment
- Dmesg (the first instance of gfxhub page faults is the relevant one): dmesg
- Xorg.0.log.old: Xorg.0.log.old. Because I can trigger this situation fairly quickly, I can get more detailed information from GDB if necessary.
- GPU hang details: see drm/amd#2943