Integrated Vega: mpv with vulkan output occasionally causes unrecoverable GPU reset
Brief summary of the problem:
Occasionally, when using mpv with the Vulkan API, the screen turns black and amdgpu never manages to recover. The system becomes completely unusable, though audio keeps playing in the background.
Hardware description:
- CPU: AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx
- GPU: Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15D8]
- System Memory: 16 Gigabytes
- Display(s): Internal eDP 1920x1080@60
- Type of Display Connection: eDP
System information:
- Distro name and Version: Arch Linux
- Kernel version:
Linux archbook 6.3.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 30 May 2023 13:44:01 +0000 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Use mpv git master, build it with libplacebo enabled, and use the following mpv.conf (in ~/.config/mpv/mpv.conf):
vo=gpu-next
profile=gpu-hq
gpu-api=vulkan
deband=no
scale=ewa_lanczos
scale-blur=0.981251
- Watch YouTube videos for a few days or whatever.
The problem cannot be reliably reproduced. It happens rarely, but often enough to be annoying.
Attached files:
Screenshots/video files
N/A
Log files (for system lockups / game freezes / crashes)
Journal output of the event, tragically cut short by my manual intervention
Jun 13 12:21:48 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=24816385, emitted seq=24816387
Jun 13 12:21:48 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process mpv pid 170726 thread mpv/vo pid 170741
Jun 13 12:21:48 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 13 12:21:49 archbook kernel: [drm] PCIE GART of 1024M enabled.
Jun 13 12:21:49 archbook kernel: [drm] PTB located at 0x000000F400A00000
Jun 13 12:21:49 archbook kernel: [drm] PSP is resuming...
Jun 13 12:21:49 archbook kernel: [drm] reserve 0x400000 from 0xf401c00000 for PSP TMR
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 13 12:21:49 archbook kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun 13 12:21:49 archbook kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 13 on hub 0
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Jun 13 12:21:50 archbook kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_low (-110).
Jun 13 12:21:51 archbook kernel: [drm] ring 0 timeout to preempt ib
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_high (-110).
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: ib ring test failed (-110).
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 13 12:21:52 archbook kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
Jun 13 12:21:52 archbook kernel: [drm] Skip scheduling IBs!
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(2) failed
Jun 13 12:21:52 archbook kernel: [drm] Skip scheduling IBs!
Jun 13 12:21:52 archbook kernel: [drm] Skip scheduling IBs!
Jun 13 12:21:52 archbook kernel: [drm] Skip scheduling IBs!
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -6
Jun 13 12:21:52 archbook kernel: [drm] Skip scheduling IBs!
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -6
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_high>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook krunner[170726]: [28.9K blob data]
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next/libplacebo] Failed presenting to queue 0x7f1454370e10: VK_ERROR_DEVICE_LOST
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next] Failed presenting frame!
Jun 13 12:21:52 archbook plasmashell[155298]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook plasmashell[155298]: [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook krunner[170726]: [84B blob data]
Jun 13 12:21:52 archbook krunner[170726]: Audio/Video desynchronisation detected! Possible reasons include too slow
Jun 13 12:21:52 archbook krunner[170726]: hardware, temporary CPU spikes, broken drivers, and broken files. Audio
Jun 13 12:21:52 archbook krunner[170726]: position will not match to the video (see A-V status field).
Jun 13 12:21:52 archbook krunner[170726]: [84B blob data]
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next/libplacebo] vkQueueSubmit: VK_ERROR_DEVICE_LOST (../src/vulkan/command.c:358)
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next/libplacebo] Retrieving query pool results: VK_ERROR_DEVICE_LOST (../src/vulkan/gpu.c:103)
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next/libplacebo] vkQueueSubmit: VK_ERROR_DEVICE_LOST (../src/vulkan/command.c:358)
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next/libplacebo] Failed holding swapchain image for presentation
Jun 13 12:21:52 archbook krunner[170726]: [vo/gpu-next] Failed presenting frame!
Jun 13 12:21:52 archbook acpid[616]: client connected from 699[0:0]
Jun 13 12:21:52 archbook acpid[616]: 1 client rule loaded
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kernel: amdgpu 0000:06:00.0: amdgpu: couldn't schedule ib on ring <gfx_low>
Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:21:52 archbook kwin_wayland_wrapper[897]: amdgpu: amdgpu_cs_query_fence_status failed.
Jun 13 12:22:02 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=18145884, emitted seq=18145886
Jun 13 12:22:02 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Jun 13 12:22:02 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jun 13 12:22:12 archbook systemd-logind[647]: Power key pressed short.
Jun 13 12:22:12 archbook root[171251]: PowerButton pressed
Jun 13 12:22:12 archbook root[171253]: ACPI action undefined: LNXPWRBN:00
Jun 13 12:22:36 archbook systemd-logind[647]: Power key pressed short.
Jun 13 12:22:36 archbook root[171256]: PowerButton pressed
Jun 13 12:22:36 archbook root[171258]: ACPI action undefined: LNXPWRBN:00
Jun 13 12:22:47 archbook root[171260]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:47 archbook root[171262]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:47 archbook root[171264]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:47 archbook root[171266]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:47 archbook root[171268]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:47 archbook root[171270]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171272]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171274]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171276]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171278]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171280]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:48 archbook root[171282]: ACPI group/action undefined: button/right / RIGHT
Jun 13 12:22:55 archbook systemd-logind[647]: Power key pressed short.
Jun 13 12:22:55 archbook root[171284]: PowerButton pressed
Jun 13 12:22:55 archbook root[171286]: ACPI action undefined: LNXPWRBN:00
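Since the interesting kernel messages are buried in a lot of repeated "couldn't schedule ib" noise, here is a minimal Python sketch for pulling out the key events (ring timeouts, reset attempts, recovery failures). It is not part of any amdgpu tooling; the function name `summarize` and the `SAMPLE` list are made up for illustration, and the patterns are based only on the message formats visible in the log above, so they may not match other kernel versions.

```python
import re

# Patterns derived from the kernel messages in the journal excerpt above.
# Assumption: other kernel versions may word these messages differently.
RING_TIMEOUT = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)
RECOVERY_FAILED = re.compile(r"GPU Recovery Failed: (-?\d+)")


def summarize(lines):
    """Collect ring timeouts, reset counts and recovery failures."""
    summary = {
        "timeouts": [],         # (ring name, signaled seq, emitted seq)
        "reset_begin": 0,       # number of "GPU reset begin!" messages
        "reset_succeeded": 0,   # number of "GPU reset succeeded" messages
        "recovery_failed": [],  # error codes from "GPU Recovery Failed"
    }
    for line in lines:
        timeout = RING_TIMEOUT.search(line)
        failed = RECOVERY_FAILED.search(line)
        if timeout:
            ring, signaled, emitted = timeout.groups()
            summary["timeouts"].append((ring, int(signaled), int(emitted)))
        elif failed:
            summary["recovery_failed"].append(int(failed.group(1)))
        elif "GPU reset begin!" in line:
            summary["reset_begin"] += 1
        elif "GPU reset succeeded" in line:
            summary["reset_succeeded"] += 1
    return summary


# A few lines copied from the log above, used as hypothetical sample input.
SAMPLE = [
    "Jun 13 12:21:48 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=24816385, emitted seq=24816387",
    "Jun 13 12:21:48 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!",
    "Jun 13 12:21:49 archbook kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume",
    "Jun 13 12:21:51 archbook kernel: [drm] ring 0 timeout to preempt ib",
    "Jun 13 12:21:52 archbook kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -6",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on a real log, one option is to pass the lines of a saved kernel journal (for example the output of `journalctl -k` for the affected boot) to `summarize` instead of `SAMPLE`.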
No, I won't bisect.
Edited by Nicolas F.