[navi21] Sleep and resume breaks switching to TTY and causes a GPU hang in plasmashell under KDE
I encountered a session-breaking amdgpu hang when switching to a TTY and back after resume:
- Put computer to sleep (running KDE Wayland session).
- Resume (session still works OK so far).
- Switch to tty2: it doesn't work properly (same for tty3, etc.).
- Switch back to tty1, which is supposed to be the Wayland session: it doesn't work either (none of the virtual consoles do).
Logging in remotely over SSH, dmesg shows this:
```
[ 110.419058] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 110.684801] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=6145, emitted seq=6147
[ 110.684987] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1599 thread plasmashel:cs0 pid 1723
[ 110.685145] amdgpu 0000:0f:00.0: amdgpu: GPU reset begin!
[ 111.120500] amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 111.120575] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 111.400258] amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 111.400320] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[ 111.680112] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[ 111.710758] [drm] free PSP TMR buffer
[ 111.758498] CPU: 10 PID: 2783 Comm: kworker/u64:80 Not tainted 5.18.15 #1
[ 111.758499] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P4.80 02/16/2022
[ 111.758500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 111.758504] Call Trace:
[ 111.758506] <TASK>
[ 111.758508] dump_stack_lvl+0x45/0x5e
[ 111.758511] amdgpu_do_asic_reset+0x28/0x44e [amdgpu]
[ 111.758605] amdgpu_device_gpu_recover_imp.cold+0x613/0x8e7 [amdgpu]
[ 111.758687] amdgpu_job_timedout+0x153/0x190 [amdgpu]
[ 111.758766] drm_sched_job_timedout+0x76/0x110 [gpu_sched]
[ 111.758769] process_one_work+0x1e5/0x3b0
[ 111.758771] worker_thread+0x50/0x3a0
[ 111.758772] ? rescuer_thread+0x390/0x390
[ 111.758773] kthread+0xe8/0x110
[ 111.758774] ? kthread_complete_and_exit+0x20/0x20
[ 111.758776] ret_from_fork+0x22/0x30
[ 111.758778] </TASK>
[ 111.758779] amdgpu 0000:0f:00.0: amdgpu: MODE1 reset
[ 111.758781] amdgpu 0000:0f:00.0: amdgpu: GPU mode1 reset
[ 111.758856] amdgpu 0000:0f:00.0: amdgpu: GPU smu mode1 reset
[ 112.268204] amdgpu 0000:0f:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 112.268539] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[ 112.268567] [drm] VRAM is lost due to GPU reset!
...
```
Configuration:
- GPU: Sapphire Pulse RX 6800 XT
- Motherboard: ASRock X570 Taichi (UEFI 4.80)
- Linux/amdgpu: 5.18.15
- Mesa: 22.2.0-rc1
- amdgpu firmware: latest (release 22.20)
- KDE Plasma / kwin: 5.25.4
Edited by Shmerl