[radeonsi/llvm12/Navi21] Firefox is causing random GPU hangs with ring vcn_dec_1 timeout
Since upgrading to Mesa 21.2.3 (with llvm12 in Debian testing), I have been getting random GPU hangs when using Firefox. They are not very frequent, but they are still a stark contrast with the previous total absence of hangs.
- GPU: Sapphire Pulse RX 6800 XT
- Kernel: 5.14.11
- Mesa/radeonsi: 21.2.3
- llvm: 12.0.1
- Firefox: 94.0b5
Errors in dmesg:
[Wed Oct 13 22:08:19 2021] [drm] failed to load ucode id (33)
[Wed Oct 13 22:08:19 2021] [drm] psp command (0x6) failed and response status is (0x0)
[Wed Oct 13 22:08:29 2021] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_1 timeout, signaled seq=163823, emitted seq=163827
[Wed Oct 13 22:08:29 2021] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Isolated Web Co pid 109683 thread firefox-bi:cs0 pid 110051
[Wed Oct 13 22:08:29 2021] amdgpu 0000:0f:00.0: amdgpu: GPU reset begin!
[Wed Oct 13 22:08:30 2021] [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[Wed Oct 13 22:08:30 2021] [drm] Register(1) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000000e0 != 0x00000000
[Wed Oct 13 22:08:30 2021] [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[Wed Oct 13 22:08:31 2021] [drm] psp command (0x3) failed and response status is (0x0)
[Wed Oct 13 22:08:34 2021] [drm] psp command (0x2) failed and response status is (0x0)
[Wed Oct 13 22:08:34 2021] [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate ras ta
[Wed Oct 13 22:08:34 2021] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: MODE1 reset
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: GPU mode1 reset
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: GPU smu mode1 reset
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: GPU reset succeeded, trying to resume
[Wed Oct 13 22:08:34 2021] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[Wed Oct 13 22:08:34 2021] [drm] VRAM is lost due to GPU reset!
[Wed Oct 13 22:08:34 2021] [drm] PSP is resuming...
[Wed Oct 13 22:08:34 2021] [drm] reserve 0xa00000 from 0x83fe000000 for PSP TMR
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: SMU is resuming...
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw version = 0x003a4700 (58.71.0)
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: SMU driver if version not matched
[Wed Oct 13 22:08:34 2021] amdgpu 0000:0f:00.0: amdgpu: SMU is resumed successfully!
[Wed Oct 13 22:08:34 2021] [drm] DMUB hardware initialized: version=0x02020003
[Wed Oct 13 22:08:35 2021] [drm] kiq ring mec 2 pipe 1 q 0
[Wed Oct 13 22:08:35 2021] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[Wed Oct 13 22:08:35 2021] [drm] JPEG decode initialized successfully.
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: recover vram bo from shadow start
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: recover vram bo from shadow done
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] [drm] Skip scheduling IBs!
[Wed Oct 13 22:08:35 2021] amdgpu 0000:0f:00.0: amdgpu: GPU reset(1) succeeded!
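For triage, the `amdgpu_job_timedout` line can be parsed to see which ring hung and roughly how many jobs were still in flight on it (emitted seq minus signaled seq). The following is a minimal sketch, assuming the message format stays exactly as it appears in the log above; `parse_ring_timeout` and `RingTimeout` are hypothetical helpers, not part of any kernel or Mesa tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed format of the amdgpu_job_timedout message, taken from this report's dmesg output.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Fences emitted to the ring but not yet signaled when the timeout fired.
        return self.emitted - self.signaled


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return the parsed timeout info, or None if the line is not a ring timeout."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))


if __name__ == "__main__":
    sample = (
        "[Wed Oct 13 22:08:29 2021] [drm:amdgpu_job_timedout [amdgpu]] "
        "*ERROR* ring vcn_dec_1 timeout, signaled seq=163823, emitted seq=163827"
    )
    t = parse_ring_timeout(sample)
    print(t.ring, t.pending)  # prints "vcn_dec_1 4"
```

Applied to the timeout in this report, it points at the VCN decode ring with four jobs outstanding, which is consistent with a hang during hardware video decoding rather than in the graphics ring.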