[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out! with power_dpm_force_performance_level=high
Brief summary of the problem:
I've been experimenting with setting power_dpm_force_performance_level to 'high' in sysfs. This is to work around an issue in mutter where it doesn't get the GPU clock to scale up, which causes low FPS in GNOME Shell animations. Both times I've tried this (under Wayland), the screen has frozen (once after an hour, once after a few minutes) with the following in dmesg:
[ 95.607668] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 100.733029] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=11370, emitted seq=11372
[ 100.733083] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 2388 thread Xwayland:cs0 pid 2707
[ 100.733086] amdgpu 0000:27:00.0: amdgpu: GPU reset begin!
[ 100.886807] [drm] free PSP TMR buffer
[ 100.916880] amdgpu 0000:27:00.0: amdgpu: MODE2 reset
[ 100.916991] amdgpu 0000:27:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 100.917330] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 100.917361] [drm] VRAM is lost due to GPU reset!
[ 100.917672] [drm] PSP is resuming...
[ 100.937817] [drm] reserve 0x400000 from 0xf403c00000 for PSP TMR
[ 101.012713] amdgpu 0000:27:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 101.018712] amdgpu 0000:27:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 101.215208] [drm] kiq ring mec 2 pipe 1 q 0
[ 101.409385] amdgpu 0000:27:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 101.409430] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
[ 101.409441] amdgpu 0000:27:00.0: amdgpu: GPU reset(2) failed
[ 101.409526] amdgpu 0000:27:00.0: amdgpu: GPU reset end with ret = -110
[ 111.466533] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 121.698112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
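To check whether a later freeze matches this same signature, the kernel log can be scanned for the messages quoted above. Below is a minimal Python sketch, assuming the log is available as plain text (for example, captured from dmesg). The function name scan_kernel_log and its summary format are hypothetical, and the patterns only cover the exact messages seen in this report.

```python
import re

# Message fragments copied from the dmesg excerpt in this report (assumption:
# other kernel versions may word these differently).
FENCE_TIMEOUT = "Waiting for fences timed out!"
RING_TIMEOUT = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")
RESET_FAILED = re.compile(r"GPU reset\(\d+\) failed")


def scan_kernel_log(text):
    """Summarize the amdgpu hang signatures from this report found in kernel log text."""
    ring_timeouts = [
        # (ring name, last signaled fence seq, last emitted fence seq).
        # Lines ending in "but soft recovered" carry no sequence numbers,
        # so they are not matched here.
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in RING_TIMEOUT.finditer(text)
    ]
    return {
        "fence_timeouts": text.count(FENCE_TIMEOUT),
        "ring_timeouts": ring_timeouts,
        # True if a hard GPU reset was attempted and reported as failed.
        "reset_failed": RESET_FAILED.search(text) is not None,
    }
```

In the excerpt above, signaled seq=11370 against emitted seq=11372 suggests that two jobs submitted to the gfx ring had not completed when the timeout fired. Reading dmesg output may require root on some systems.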
Other things of note:
- I've only observed this under Wayland. In fact, in both cases the timed-out process was Xwayland.
- This can happen at idle or under low load.
- There's no indication of overheating on the APU, or of any other hardware issue. At the time of the freeze it was around 38-40C. (Under full load it tops out at 71C, which is well within spec.)
- I can run 3D games like Xonotic without issue at 4K, 60 FPS with full effects on. The same goes for running GNOME under Xorg with vsync turned off.
- I have never seen this issue with the 'auto' performance level.
- CPU: Ryzen 5 3400G
- Motherboard: MSI B450M Mortar Max
- Firmwares: AGESA 188.8.131.52, ATOM BIOS: 113-PICASSO-115
- System Memory: 16GiB, 2933MHz
- Display(s): Samsung LU32H850UMUXEN
- Type of Display Connection: DP
- Distro name and Version: Fedora 33
- Kernel version: 5.10.14-200.fc33.x86_64, default kernel command-line options.
- Mesa: mesa-dri-drivers-20.3.4-1.fc33.x86_64
- Mutter: mutter-3.38.3-1.fc33.x86_64
How to reproduce the issue:
Set power_dpm_force_performance_level to 'high' via sysfs under Wayland, then wait for the screen to freeze (anywhere from a few minutes to about an hour in my testing).
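The reproduction step amounts to a single write to the amdgpu sysfs file. A minimal Python sketch follows, assuming the APU is exposed as card0 (check /sys/class/drm/ on your machine) and that the script runs as root. The helper name set_performance_level is hypothetical, and it deliberately accepts only the two levels discussed in this report, although the driver accepts others.

```python
from pathlib import Path

# Only the levels discussed in this report; the driver accepts more.
VALID_LEVELS = {"auto", "high"}

# Assumption: the APU is card0. Adjust the card index for your system.
DEFAULT_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def set_performance_level(level, path=DEFAULT_PATH):
    """Write a DPM performance level (root required for real sysfs) and return the value read back."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    path = Path(path)
    path.write_text(level)
    # Read the file back to confirm which level is now active.
    return path.read_text().strip()
```

Calling set_performance_level("high") applies the setting that triggers the freeze; set_performance_level("auto") restores the default, under which the freeze has not been observed.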