*ERROR* ring gfx_0.0.0 timeout + GPU reset during general use (7840HS w/ Radeon 780M)
This has been happening consistently over the past few months, ever since I acquired this machine. During general use (no GPU-intensive tasks, no games, no video playback, no explicit heavy decoding or encoding), the screen goes blank and amdgpu issues a GPU reset. The reset has never been successful, and a hard reset of the machine is required. I've been unable to reproduce it reliably, but it seems to happen on windowing events (i.e. an action that creates a new window primitive or destroys one). It happens about once every 4-8 days. Unlike in other similar reports, there have never been any visual artifacts when the issue occurs. The screen simply goes blank until the GPU reset happens. Then the Wayland session crashes and the desktop is forced back to a TTY, from which it never recovers.
As mentioned, this has happened consistently across kernel versions (6.7.*) and on both Wayland and X11 desktop environments. I'm currently using the GNOME desktop on Wayland, but it also happened on XFCE when I used it.
- CPU: AMD Ryzen 7 7840HS
- GPU: Radeon 780M (Phoenix1 [1002:15BF])
- System Memory: 32GB
- Type of Display Connection: HDMI
- Distro name and Version: Debian 12
- Kernel version: 6.9.7+bpo-amd64
Log at the point where the GPU reset occurs:
Jul 28 14:04:01 zinc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=14234682, emitted seq=14234684
Jul 28 14:04:01 zinc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1878 thread gnome-shel:cs0 pid 1896
Jul 28 14:04:01 zinc kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Jul 28 14:04:01 zinc kernel: amdgpu 0000:c5:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
Jul 28 14:04:02 zinc kernel: amdgpu 0000:c5:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
Jul 28 14:04:02 zinc kernel: amdgpu 0000:c5:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:839
Jul 28 14:04:02 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:02 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:02 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:02 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:02 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:02 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:02 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:02 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:03 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:03 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:03 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:03 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
Jul 28 14:04:03 zinc kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
Jul 28 14:04:03 zinc kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jul 28 14:04:03 zinc kernel: amdgpu 0000:c5:00.0: amdgpu: MODE2 reset
Jul 28 14:04:03 zinc kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Full log for boot: journalctl.txt
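In case it helps anyone comparing occurrences across boots, here is a rough Python sketch for pulling the ring timeout events out of a saved journal dump. The two regular expressions are based only on the line format seen in the excerpt above, and the function name `find_ring_timeouts` is my own (hypothetical), not part of any tool. Other kernel versions may word these messages differently, so treat it as a starting point rather than a reliable parser.

```python
import re

# Matches the amdgpu ring-timeout line as it appears in the excerpt above.
# Assumption: the message wording is the same in the rest of the journal.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

# Matches the "Process information" line that follows the timeout line.
PROCESS_RE = re.compile(
    r"\*ERROR\* Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def find_ring_timeouts(lines):
    """Return one dict per ring timeout, plus the process line that follows it, if any."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            signaled = int(m["signaled"])
            emitted = int(m["emitted"])
            events.append({
                "ring": m["ring"],
                "signaled": signaled,
                "emitted": emitted,
                # Number of emitted fences that had not signaled at timeout.
                "pending": emitted - signaled,
                "process": None,
                "pid": None,
            })
            continue
        p = PROCESS_RE.search(line)
        # Attach process info to the most recent timeout that lacks it.
        if p and events and events[-1]["process"] is None:
            events[-1]["process"] = p["process"]
            events[-1]["pid"] = int(p["pid"])
    return events
```

To run it over a saved dump, pass an open file object (for example `open("journalctl.txt", errors="replace")`) as `lines` and print the returned list.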