[Regression 6.0.19->6.1.12][Bisected] VA-API transcoding hangs the system (amdgpu: IH ring buffer overflow)
Brief summary of the problem:
VA-API transcoding with RX Vega 64 (VEGA 10) using FFmpeg hangs the system in Linux 6.1.12.
Linux 6.2.0-rc8 is also affected. Downgrading to Linux 6.0.19 and older fixes the issue.
Hardware description:
- CPU: Ryzen 9 5950X
- GPU: [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
- System Memory: DDR4 3200 16G *2
- Display(s): DELL U2518D
- Type of Display Connection: DP
System information:
- Distro name and Version: Manjaro Linux (GNOME 43.3, Wayland)
- Kernel version: 6.1.12 and 6.2.0-rc8
- Mesa driver version 22.3.6 and 23.1.0-devel
How to reproduce the issue:
Login to the desktop GUI environment. Run the following FFmpeg command:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i ANY_H264_OR_HEVC_1080P_VIDEO.mp4 -vf scale_vaapi=format=nv12 -c:v h264_vaapi -an -sn -f null -
System will hang immediately and the amdgpu driver resets. The following error message can be found via dmesg.
[ 40.878955] amdgpu 0000:0c:00.0: amdgpu: IH ring buffer overflow (0x00080100, 0x00000000, 0x00000120)
[ 40.878961] amdgpu 0000:0c:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:174 vmid:5 pasid:32769, for process ffmpeg pid 2665 thread ffmpeg:cs0 pid 2666)
[ 40.878963] amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x000080010e01b000 from IH client 0x1b (UTCL2)
[ 40.878968] amdgpu 0000:0c:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x0054115D
[ 40.878969] amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 40.878970] amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x1
[ 40.878971] amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x6
[ 40.878971] amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 40.878972] amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 40.878972] amdgpu 0000:0c:00.0: amdgpu: RW: 0x1
Attached files:
Log files (for system lockups / game freezes / crashes)
- Dmesg log: dmesg.log