[navi21] GPU hang during video decoding in Firefox (ring vcn_dec_1 timeout)

I started getting periodic GPU hangs that happen during VAAPI-accelerated video playback in Firefox.

GPU: Sapphire Pulse RX 6800 XT
Kernel: 5.19.8
Mesa VAAPI: 22.2.0~rc3-1
Firmware: latest from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amdgpu i.e. including this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=3c1662d9a546ab01e42a34aec30ca44965fa4d7d
Firefox 105.0b9

Here is an example of a recent hang log from dmesg:

[10422.428815] [drm] failed to load ucode VCN0_RAM(0x35) 
[10422.428845] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
[10430.435115] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_1 timeout, signaled seq=22205, emitted seq=22208
[10430.435272] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 2591 thread firefox-bi:cs0 pid 4478
[10430.435404] amdgpu 0000:0f:00.0: amdgpu: GPU reset begin!
[10434.435382] amdgpu 0000:0f:00.0: amdgpu: failed to suspend display audio
[10434.777329] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[10435.024724] [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000070 != 0x00000000
[10435.272396] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[10435.525357] [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[10435.778083] [drm] Register(1) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000000a0 != 0x00000000
[10436.042909] [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[10436.131193] [drm] free PSP TMR buffer
[10436.178967] CPU: 0 PID: 39695 Comm: kworker/u64:1 Not tainted 5.19.8 #1
[10436.178969] Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P4.80 02/16/2022
[10436.178971] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[10436.178976] Call Trace:
[10436.178977]  <TASK>
[10436.178978]  dump_stack_lvl+0x44/0x5c
[10436.178982]  amdgpu_do_asic_reset+0x26/0x459 [amdgpu]
[10436.179088]  amdgpu_device_gpu_recover_imp.cold+0x59d/0x8e0 [amdgpu]
[10436.179176]  amdgpu_job_timedout+0x156/0x190 [amdgpu]
[10436.179269]  ? __switch_to+0x106/0x430
[10436.179272]  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
[10436.179274]  process_one_work+0x1c7/0x380
[10436.179276]  worker_thread+0x4d/0x380
[10436.179277]  ? process_one_work+0x380/0x380
[10436.179278]  kthread+0xe9/0x110
[10436.179279]  ? kthread_complete_and_exit+0x20/0x20
[10436.179280]  ret_from_fork+0x22/0x30
[10436.179282]  </TASK>
[10436.179286] amdgpu 0000:0f:00.0: amdgpu: MODE1 reset
[10436.179289] amdgpu 0000:0f:00.0: amdgpu: GPU mode1 reset
[10436.179394] amdgpu 0000:0f:00.0: amdgpu: GPU smu mode1 reset
[10436.703700] amdgpu 0000:0f:00.0: amdgpu: GPU reset succeeded, trying to resume
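
For anyone trying to compare hangs across boots, here is a minimal sketch of pulling the ring timeout lines out of saved dmesg output. The regular expression is based only on the line format shown in the log above and may need adjusting for other kernel versions. The function name `find_ring_timeouts` and the `pending` field (emitted minus signaled sequence number) are my own illustrative choices, not anything provided by the driver.

```python
import re

# Matches lines like:
# [10430.435115] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_1 timeout, signaled seq=22205, emitted seq=22208
# Assumption: the format is the one seen in the 5.19.8 log above.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(dmesg_text):
    """Return one dict per ring timeout line found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        events.append({
            "timestamp": float(match.group("ts")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Hypothetical helper field: how far emitted is ahead of signaled.
            "pending": emitted - signaled,
        })
    return events
```

Run against the log above, this should report a single event on ring vcn_dec_1 with the emitted sequence number 3 ahead of the signaled one.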

Edited Sep 13, 2022 by Shmerl
