[Firefox VAAPI][Navi21] GPU hang and occasional corrupted buffer while watching hardware-accelerated video
System information
System:
Host: mershl-desktop Kernel: 6.1.8-200.fc37.x86_64 arch: x86_64 bits: 64
compiler: gcc v: 2.38-25.fc37 Desktop: GNOME v: 43.2 tk: GTK v: 3.24.36
wm: gnome-shell dm: GDM Distro: Fedora release 37 (Thirty Seven)
CPU:
Info: 8-core model: AMD Ryzen 7 3700X bits: 64 type: MT MCP arch: Zen 2
rev: 0 cache: L1: 512 KiB L2: 4 MiB L3: 32 MiB
Speed (MHz): avg: 2278 high: 3600 min/max: 2200/4426 boost: enabled cores:
1: 2200 2: 2200 3: 2200 4: 2200 5: 2200 6: 2200 7: 2200 8: 2200 9: 2200
10: 3600 11: 2058 12: 2200 13: 2200 14: 2200 15: 2200 16: 2200
bogomips: 115204
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] driver: amdgpu
v: kernel arch: RDNA-2 pcie: speed: 16 GT/s lanes: 16 ports: active: DP-1
empty: DP-2,DP-3,HDMI-A-1 bus-ID: 0a:00.0 chip-ID: 1002:73bf
Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.7
compositor: gnome-shell driver: X: loaded: amdgpu dri: radeonsi gpu: amdgpu
display-ID: 0
Monitor-1: DP-1 model: Samsung C34H89x res: 3440x1440 dpi: 110
diag: 864mm (34")
API: OpenGL v: 4.6 Mesa 22.3.4 renderer: AMD Radeon RX 6800 (navi21 LLVM
15.0.7 DRM 3.49 6.1.8-200.fc37.x86_64) direct render: Yes
Describe the issue
About once per hour of watching hardware-accelerated video (e.g. YouTube), the system hangs completely and requires a reboot.
While scrubbing through HTML5 videos, the buffer often shows corruption and a low framerate for a few seconds before catching up. I cannot reproduce either the hang or the corruption with the same settings on an AMD Raven Ridge APU, so this issue might be specific to Navi 21 or the Navi 2x family.
Disabling media.hardware-video-decoding.enabled in Firefox fixes both issues.
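For anyone who wants to apply the same workaround persistently, here is a minimal sketch that writes the preference into a profile's user.js, so it is reapplied on every Firefox start. The helper name `disable_hw_decoding` and the profile directory you pass in are assumptions for illustration; the profile path differs per installation and is not taken from this report.

```python
from pathlib import Path

# Preference named in this report.
PREF = "media.hardware-video-decoding.enabled"


def disable_hw_decoding(profile_dir: str) -> Path:
    """Set PREF to false in <profile_dir>/user.js (hypothetical helper).

    profile_dir is an assumption: pass the Firefox profile directory
    used on your own system.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    # Keep unrelated lines, drop any earlier setting of the same pref.
    lines = [line for line in existing.splitlines() if PREF not in line]
    lines.append(f'user_pref("{PREF}", false);')
    user_js.write_text("\n".join(lines) + "\n")
    return user_js
```

Calling `disable_hw_decoding("/path/to/profile")` leaves other user.js entries untouched and keeps exactly one line for this preference. Flipping the value in about:config has the same effect for the running profile.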
Regression
I've seen these issues since the introduction of VAAPI in Firefox. After each update I re-enable media.hardware-video-decoding.enabled, then disable it again after the first hang.
Log files as attachment
(repeating lines removed)
libva info: VA-API version 1.16.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_16
libva info: va_openDriver() returns 0
gmc_v10_0_process_interrupt: 132 callbacks suppressed
amdgpu 0000:0a:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:4 pasid:32774, for process RDD Process pid 3184 thread firefox:cs0 pid 3368)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be01000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00441051
amdgpu 0000:0a:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:0a:00.0: amdgpu: MORE_FAULTS: 0x1
amdgpu 0000:0a:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:0a:00.0: amdgpu: PERMISSION_FAULTS: 0x5
amdgpu 0000:0a:00.0: amdgpu: MAPPING_ERROR: 0x0
amdgpu 0000:0a:00.0: amdgpu: RW: 0x1
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be05000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be06000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be0b000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be00000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010be04000 from client 0x1b (UTCL2)
gmc_v10_0_process_interrupt: 2369 callbacks suppressed
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010bef2000 from client 0x1b (UTCL2)
amdgpu 0000:0a:00.0: amdgpu: in page starting at address 0x000080010bef1000 from client 0x1b (UTCL2)
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2282050, emitted seq=2282052
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 3184 thread firefox:cs0 pid 3368
amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
amdgpu 0000:0a:00.0: amdgpu: free PSP TMR buffer
amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset
amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x0000008001300000).
[drm] VRAM is lost due to GPU reset!
[drm] PSP is resuming...
[drm] reserve 0xa00000 from 0x83fd000000 for PSP TMR
amdgpu 0000:0a:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0a:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0a:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
amdgpu 0000:0a:00.0: amdgpu: SMU driver if version not matched
amdgpu 0000:0a:00.0: amdgpu: use vbios provided pptable
amdgpu 0000:0a:00.0: amdgpu: SMU is resumed successfully!
[drm] DMUB hardware initialized: version=0x02020017
[drm] kiq ring mec 2 pipe 1 q 0
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:0a:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
amdgpu 0000:0a:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
amdgpu 0000:0a:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow start
amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow done
[drm] Skip scheduling IBs!
amdgpu 0000:0a:00.0: amdgpu: GPU reset(2) succeeded!
amdgpu: amdgpu_cs_query_fence_status failed.
amdgpu: The CS has been rejected (-125), but the context isn't robust.
amdgpu: The process will be terminated.
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset in PostUpdate (t=6864.17) [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
amdgpu: The CS has been rejected (-125). Recreate the context.
Error reading events from display: Connection reset by peer
Error reading events from display: Broken pipe
Unregistered Authentication Agent for unix-session:2 (system bus name :1.78, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
org.gnome.SettingsDaemon.Wacom.service: Main process exited, code=exited, status=1/FAILURE
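To compare fault status values across several hangs, the GCVM_L2_PROTECTION_FAULT_STATUS word from the log above can be split into its fields with a short script. This is only a sketch: the bit positions in `FIELDS` are an assumption, chosen because they reproduce the values the kernel printed next to 0x00441051 (MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x0, RW 0x1, client ID 0x8). They are not taken from register documentation.

```python
# Assumed layout of GCVM_L2_PROTECTION_FAULT_STATUS as (name, shift, width).
# Inferred from the decoded values in the log, not from a register spec.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),  # UTCL2 client ID, reported as TCP (0x8) in the log
    ("RW", 18, 1),
]


def decode_fault_status(status: int) -> dict:
    """Extract each assumed field from the raw status word."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00441051).items():
        print(f"{name}: {hex(value)}")
```

Run on 0x00441051, the script gives the same field values as the kernel lines in the log, which is the only check available here for the assumed layout.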