Regression between 23.2.1 and 23.3.3: X.org crashes after abort in amdgpu_winsys_create
System information
- OS: Debian GNU/Linux trixie/sid
- GPU: 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
- Kernel version: Linux Bruvac 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64 GNU/Linux
- Mesa version: OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.3.3-3
- Xserver version:
X.Org X Server 1.21.1.10
X Protocol Version 11, Revision 0
Current Operating System: Linux Bruvac 6.6.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.9-1 (2024-01-01) x86_64
Kernel command line: lockdown=confidentiality amdgpu.ppfeaturemask=0xffffffff mem_sleep_default=s2idle
xorg-server 2:21.1.10-1
Current version of pixman: 0.42.2
- Desktop manager and compositor: KDE Plasma 5.27.10
KWin version: 5.27.10
Qt Version: 5.15.10
Qt compile version: 5.15.10
XCB compile version: 1.15
Operation Mode: X11 only
Describe the issue
The system sometimes becomes unresponsive after 10-30 minutes of light usage and can only be shut down by holding the power button. Sometimes the mouse cursor can still be moved for a few seconds before it also stops responding.
Regression
The problem seems to have started when Mesa 23.3.3 migrated to Debian testing on 2024-01-23. The previous version was 23.2.1.
Log files
Backtrace from Xorg.0.log
[ 43.779] (EE) Backtrace:
[ 43.789] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x14d) [0x55a240510fbd]
[ 43.789] (EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x40) [0x7fd745c5a510]
[ 43.789] (EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (pthread_key_delete+0x14c) [0x7fd745ca80fc]
[ 43.790] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x12) [0x7fd745c5a472]
[ 43.790] (EE) 4: /lib/x86_64-linux-gnu/libc.so.6 (abort+0xd3) [0x7fd745c444b2]
[ 43.792] (EE) 5: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_winsys_create+0x4bc4) [0x7fd743b15bc4]
[ 43.792] (EE) 6: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (amdgpu_winsys_create+0x7d41) [0x7fd743b18d41]
[ 43.792] (EE) 7: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (__driDriverGetExtensions_d3d12+0x47de9) [0x7fd7432f9769]
[ 43.793] (EE) 8: /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so (__driDriverGetExtensions_d3d12+0x67e7b) [0x7fd7433197fb]
[ 43.793] (EE) 9: /lib/x86_64-linux-gnu/libc.so.6 (pthread_condattr_setpshared+0x4cc) [0x7fd745ca63ec]
[ 43.793] (EE) 10: /lib/x86_64-linux-gnu/libc.so.6 (__clone+0x11c) [0x7fd745d26a5c]
First bit of dmesg errors
2024-01-25T20:00:16.561839-05:00 Bruvac kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3632, emitted seq=3634
2024-01-25T20:00:16.561846-05:00 Bruvac kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1864 thread Xorg:cs0 pid 1884
2024-01-25T20:00:16.561847-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
2024-01-25T20:00:16.918779-05:00 Bruvac systemd[1]: systemd-fsckd.service: Deactivated successfully.
2024-01-25T20:00:17.574499-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: IP block:gfx_v11_0 is hung!
2024-01-25T20:00:17.574511-05:00 Bruvac kernel: gmc_v11_0_process_interrupt: 21 callbacks suppressed
2024-01-25T20:00:17.574512-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:0 pasid:0, for process pid 0 thread pid 0)
2024-01-25T20:00:17.574513-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
2024-01-25T20:00:17.574513-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B5B
2024-01-25T20:00:17.574514-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
2024-01-25T20:00:17.574514-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
2024-01-25T20:00:17.574514-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5
2024-01-25T20:00:17.574515-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
2024-01-25T20:00:17.574515-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
2024-01-25T20:00:17.574515-05:00 Bruvac kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x1
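As a cross-check on the fault status above, here is a minimal Python sketch that splits the raw GCVM_L2_PROTECTION_FAULT_STATUS value 0x00040B5B into the fields the kernel printed. The bit offsets and widths in `FIELDS` are an assumption on my part, chosen because they reproduce the logged values; they are not taken from the amdgpu register headers, and `decode_fault_status` is a hypothetical helper name.

```python
# Assumed (offset, width) layout of GCVM_L2_PROTECTION_FAULT_STATUS.
# This table is a guess that happens to reproduce the values in the
# dmesg excerpt; it is not copied from the amdgpu register definitions.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # client ID, logged as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Extract each assumed bit field from the raw status word."""
    return {
        name: (status >> offset) & ((1 << width) - 1)
        for name, (offset, width) in FIELDS.items()
    }


decoded = decode_fault_status(0x00040B5B)
for name, value in decoded.items():
    print(f"{name}: {hex(value)}")
```

Under that assumed layout the decoded client ID is 0x5, which matches the "Faulty UTCL2 client ID: CPC (0x5)" line, and the remaining fields match the values logged after it.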
Any extra information would be greatly appreciated
Hangs seemed to occur most frequently after closing the LyX LaTeX IDE, but have also occurred when checking for updates in KDE Discover and while trying to report this bug.