[navi21] Random GPU hang while the system is idle
Brief summary of the problem:
I had only just logged in to my session. I opened Firefox to check my mail and then closed it. Telegram Desktop was running in the background, polling for messages. Other than that, the system was essentially idle when the GPU locked up. The problem is this random lockup.
Hardware description:
- CPU: AMD Ryzen 7 3800XT
- GPU:
  *-display
       description: VGA compatible controller
       product: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73BF]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
       physical id: 0
       bus info: pci@0000:10:00.0
       version: c0
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: iomemory:780-77f iomemory:7c0-7bf irq:140 memory:7800000000-7bffffffff memory:7c00000000-7c0fffffff ioport:f000(size=256) memory:fcc00000-fccfffff memory:fcd00000-fcd1ffff
- System Memory: 32GB
- Display(s): GIGABYTE G34WQC - 3440x1440, 144Hz
- Type of Display Connection: DP
System information:
- Distro name and Version: Artix Linux
- Kernel version: Linux naomi-pc 5.15.64-1-lts #1 SMP Wed, 31 Aug 2022 21:14:32 +0000 x86_64 GNU/Linux
How to reproduce the issue:
- Boot the linux-lts kernel (5.15), log in to a Plasma Wayland session and leave the system idle. Eventually the GPU locks up.
Attached files:
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log)
Sep 16 10:42:44 naomi-pc kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Sep 16 10:42:44 naomi-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=135354, emitted seq=135356
Sep 16 10:42:44 naomi-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_wayland pid 3238 thread kwin_wayla:cs0 pid 3280
Sep 16 10:42:44 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset begin!
Sep 16 10:42:44 naomi-pc kernel: amdgpu 0000:10:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 16 10:42:44 naomi-pc kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Sep 16 10:42:45 naomi-pc kernel: amdgpu 0000:10:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 16 10:42:45 naomi-pc kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 16 10:42:45 naomi-pc kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Sep 16 10:42:45 naomi-pc kernel: [drm] free PSP TMR buffer
Sep 16 10:42:45 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: MODE1 reset
Sep 16 10:42:45 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: GPU mode1 reset
Sep 16 10:42:45 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: GPU smu mode1 reset
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 16 10:42:46 naomi-pc kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080012FC000).
Sep 16 10:42:46 naomi-pc kernel: [drm] VRAM is lost due to GPU reset!
Sep 16 10:42:46 naomi-pc kernel: [drm] PSP is resuming...
Sep 16 10:42:46 naomi-pc kernel: [drm] reserve 0xa00000 from 0x83fe000000 for PSP TMR
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resuming...
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw version = 0x003a5400 (58.84.0)
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: SMU driver if version not matched
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resumed successfully!
Sep 16 10:42:46 naomi-pc kernel: [drm] DMUB hardware initialized: version=0x02020013
Sep 16 10:42:46 naomi-pc kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 16 10:42:46 naomi-pc kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Sep 16 10:42:46 naomi-pc kernel: [drm] JPEG decode initialized successfully.
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow start
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: recover vram bo from shadow done
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: amdgpu 0000:10:00.0: amdgpu: GPU reset(2) succeeded!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm] Skip scheduling IBs!
Sep 16 10:42:46 naomi-pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Sep 16 10:42:46 naomi-pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Sep 16 10:42:46 naomi-pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
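To make it easier to compare this hang with later ones, here is a minimal sketch (not part of the original report) that pulls the ring-timeout line out of a saved kernel log. It assumes the log uses the exact `ring <name> timeout, signaled seq=N, emitted seq=M` wording shown above. It also assumes that "emitted minus signaled" is the number of submitted jobs that the GPU never completed. The script and its function name `parse_ring_timeouts` are hypothetical helpers, not part of amdgpu or any tool.

```python
import re
import sys

# Assumption: the timeout line has the same wording as in the log above, e.g.
# "*ERROR* ring gfx_0.0.0 timeout, signaled seq=135354, emitted seq=135356"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(lines):
    """Return (ring, signaled, emitted, pending) for each ring-timeout line."""
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout line (e.g. "Skip scheduling IBs!")
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        # Fences submitted to the ring that the GPU never signaled as done.
        results.append((match.group("ring"), signaled, emitted, emitted - signaled))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 ring_timeouts.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ring, signaled, emitted, pending in parse_ring_timeouts(log):
            print(f"{ring}: signaled={signaled} emitted={emitted} pending={pending}")
```

For the log in this report, it reports ring `gfx_0.0.0` with two jobs outstanding from `kwin_wayland`. A kernel log from the previous (hung) boot can be saved with `journalctl -k -b -1 > kernel.log` and then passed to the script.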