Playing yuv420p videos with VA-API causes a GPU reset
Before submitting your bug report:
Brief summary of the problem:
When playing a yuv420p video with VA-API hardware acceleration, a page fault happens, and 10 seconds later the GPU times out and needs to be reset. This is normally inconsistent, but happens 100% of the time for me when booting with amdgpu.vm_fault_stop=1 amdgpu.vm_debug=1
.
If X11 is running when this happens, this causes the screen to flicker between the last 2 images shown. The mouse can be moved, and will even change shape (e.g. it will become an I-bar when hovering over selectable text) but the rest of the screen does not change. If X11 is not running, this will cause the display to go blank for a couple of seconds while the GPU is resetting, but the screen will continue to work.
This does not happen when using VDPAU.
Hardware description:
- CPU: AMD Ryzen 7 5800X 8-Core Processor
- GPU: 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c7)
- System Memory: 32 GiB
- Display(s): ASUS PB278
- Type of Display Connection: DisplayPort
System information:
- Distro name and Version: Arch Linux (latest)
- Kernel version: 5.18.16-arch1-1
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Download the test, or generate it with
ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -vf format=yuv420p test.mp4
- Decode the video using VA-API:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i test.mp4 -f null -
Using -f null -
causes ffmpeg to discard the decoded video instead of displaying it. It is not necessary for reproduction, but allows you to test without needing to restart X11 every time the page fault is triggered.
Attached files:
Log files (for system lockups / game freezes / crashes)
The page fault happens in the dmesg log on line 1325:
[ 45.638941] amdgpu 0000:09:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32769, for process ffmpeg pid 755 thread ffmpeg:cs0 pid 756)
[ 45.639433] amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000800105a00000 from client 0x12 (VMC)
[ 45.639947] amdgpu 0000:09:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105611
[ 45.640453] amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: VCN0 (0x2b)
[ 45.640952] amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1
[ 45.641436] amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0
[ 45.641913] amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x1
[ 45.642386] amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 45.642846] amdgpu 0000:09:00.0: amdgpu: RW: 0x0
The dmesg log also contains a task dump during the hang, which I produced with SysRq+t
.