amdgpu: *ERROR* ring kiq_2.1.0 test failed (-110)
I hit this issue from time to time when playing GPU-intensive games. It never happens in some games (like CSGO), happens quite quickly in others (Teardown, which is GPU-intensive), and only very rarely in some, like Cyberpunk.
Versions
I am running the release version of the 5.13 kernel (Linux beep 5.13.0-11-generic #11-Ubuntu SMP Tue Jun 29 06:57:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux)
and the latest NAVI10 firmware as of today (5 hours old).
I am using Mesa nightly as of 29 June 2021.
It's a 5700 XT GPU combined with an X570-based mainboard, so we are running PCIe 4.0. I am running the latest AGESA ComboAM4v2 1.2.0.2 available for X570.
Possible problem
Since I see "kiq_2.1.0 test failed", and I saw that 5.13 introduces some changes regarding KIQ (a spinlock was introduced when accessing the ring), I suspect this might be a bug related to that change.
Maybe I am looking in completely the wrong direction here and this has something to do with a hardware hiccup in the GPU, or with transfers being lost over PCIe 4.0? The card has a stable, good power supply (Seasonic 850W Titanium), and these errors happen even early on in the game, at low GPU temperatures. So I don't really believe that this is a hardware glitch.
After the issue happens, I typically see one more, noticeably delayed frame of the game; then it glitches out with colorful artifacts and I have to switch to another terminal to restart X / gdm. After that, however, the system fully recovers without a reboot.
Here is the relevant dmesg log:
[ 26.768134] retire_capture_urb: 3 callbacks suppressed
[ 460.405529] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
[ 465.525341] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
[ 465.535342] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=40007, emitted seq=40009
[ 465.535556] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process teardown.exe pid 18023 thread teardown.e:cs0 pid 18027
[ 465.535757] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
[ 465.872128] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 465.872220] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 466.098207] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 466.098297] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[ 466.324301] [drm:gfx_v10_0_cp_gfx_enable.isra.0 [amdgpu]] *ERROR* failed to halt cp gfx
[ 466.335212] [drm] free PSP TMR buffer
[ 466.374663] amdgpu 0000:0b:00.0: amdgpu: BACO reset
[ 468.496957] amdgpu 0000:0b:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 468.497069] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[ 468.497091] [drm] VRAM is lost due to GPU reset!
[ 468.497435] [drm] PSP is resuming...
[ 468.665571] [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
[ 468.705166] amdgpu 0000:0b:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 468.709328] amdgpu 0000:0b:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 468.709330] amdgpu 0000:0b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 468.709331] amdgpu 0000:0b:00.0: amdgpu: SMU is resuming...
[ 468.711662] amdgpu 0000:0b:00.0: amdgpu: SMU is resumed successfully!
[ 468.938601] [drm] kiq ring mec 2 pipe 1 q 0
[ 468.939979] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 468.940103] [drm] JPEG decode initialized successfully.
[ 468.940111] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 468.940112] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 468.940113] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 468.940114] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 468.940115] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 468.940115] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 468.940116] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 468.940117] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 468.940117] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 468.940118] amdgpu 0000:0b:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 468.940119] amdgpu 0000:0b:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 468.940120] amdgpu 0000:0b:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 468.940120] amdgpu 0000:0b:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[ 468.940121] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[ 468.940122] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[ 468.940123] amdgpu 0000:0b:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 468.940295] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.942269] amdgpu 0000:0b:00.0: amdgpu: recover vram bo from shadow start
[ 468.950194] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.951546] amdgpu 0000:0b:00.0: amdgpu: recover vram bo from shadow done
[ 468.951550] [drm] Skip scheduling IBs!
[ 468.951550] [drm] Skip scheduling IBs!
[ 468.951584] amdgpu 0000:0b:00.0: amdgpu: GPU reset(2) succeeded!
[ 468.951599] [drm] Skip scheduling IBs!
[ 468.951603] [drm] Skip scheduling IBs!
[ 468.951605] [drm] Skip scheduling IBs!
[ 468.951610] [drm] Skip scheduling IBs!
[ 468.951612] [drm] Skip scheduling IBs!
[ 468.951614] [drm] Skip scheduling IBs!
[ 468.951616] [drm] Skip scheduling IBs!
[ 468.951618] [drm] Skip scheduling IBs!
[ 468.951620] [drm] Skip scheduling IBs!
[ 468.951621] [drm] Skip scheduling IBs!
[ 468.951622] [drm] Skip scheduling IBs!
[ 468.951623] [drm] Skip scheduling IBs!
[ 468.951623] [drm] Skip scheduling IBs!
[ 468.951625] [drm] Skip scheduling IBs!
[ 468.951626] [drm] Skip scheduling IBs!
[ 468.951628] [drm] Skip scheduling IBs!
[ 468.951630] [drm] Skip scheduling IBs!
[ 468.951632] [drm] Skip scheduling IBs!
[ 468.963897] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.964307] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.964752] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.965145] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.965557] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.965933] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.966807] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 468.966951] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 469.244506] rfkill: input handler enabled
[ 474.209236] amdgpu_cs_ioctl: 17 callbacks suppressed
[ 474.209240] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 474.209537] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 475.211046] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 475.211348] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 475.221179] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 476.212479] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 476.222981] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 477.213927] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 478.215364] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 478.215651] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 479.216629] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 479.216947] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 479.227051] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 480.218359] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 480.228805] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 481.219073] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 482.220283] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 482.220597] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 483.221041] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 483.221310] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 484.222331] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 484.222335] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 484.233657] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 485.223273] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 486.224208] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 486.224480] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 486.233694] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 487.224472] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 487.234842] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 488.224650] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 488.236718] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 489.225674] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 489.853237] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 489.853448] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 491.053868] rfkill: input handler disabled
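
For anyone reading the log: the numeric codes are negated Linux errno values, so -110 is ETIMEDOUT (the KIQ ring test timed out) and -125 is ECANCELED (submissions rejected after the reset). Below is a minimal sketch for summarizing a saved dmesg excerpt like the one above. The function names `count_error_codes` and `reset_duration` are hypothetical helpers written for this post, not part of any amdgpu tooling, and the errno table only covers the two codes that appear here.

```python
import re
from collections import Counter

# Linux errno values seen in this log (only these two are listed).
LINUX_ERRNO = {
    110: "ETIMEDOUT (connection timed out)",
    125: "ECANCELED (operation canceled)",
}

# A negative return code at the end of a line, e.g. "(-110)" or "-125!".
CODE_RE = re.compile(r"-(\d+)\)?!?\s*$")
# The dmesg timestamp at the start of a line, e.g. "[ 465.535757]".
TIME_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
# The final success message, e.g. "GPU reset(2) succeeded!".
RESET_DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")


def count_error_codes(lines):
    """Count trailing error codes on lines flagged with *ERROR*."""
    counts = Counter()
    for line in lines:
        match = CODE_RE.search(line)
        if "*ERROR*" in line and match:
            counts[int(match.group(1))] += 1
    return counts


def reset_duration(lines):
    """Seconds from 'GPU reset begin!' to 'GPU reset(N) succeeded!', or None."""
    start = None
    for line in lines:
        stamp = TIME_RE.match(line)
        if not stamp:
            continue
        if "GPU reset begin!" in line:
            start = float(stamp.group(1))
        elif start is not None and RESET_DONE_RE.search(line):
            return float(stamp.group(1)) - start
    return None


# A few lines copied from the log above.
sample = [
    "[ 465.535757] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!",
    "[ 465.872128] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)",
    "[ 466.098207] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)",
    "[ 468.940295] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
    "[ 468.951584] amdgpu 0000:0b:00.0: amdgpu: GPU reset(2) succeeded!",
]

for code, count in sorted(count_error_codes(sample).items()):
    print(f"-{code} x{count}: {LINUX_ERRNO.get(code, 'unknown errno')}")
print(f"reset took about {reset_duration(sample):.3f} s")
```

To run it over a full capture, read the file into a list of lines (for example with `open(path).read().splitlines()`) and pass that list instead of `sample`.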