[Dell G5 5505] Fence fallback timer expired on ring sdma0/gfx
Brief summary of the problem:
System becomes unusable shortly after entering the desktop environment (Sway 1.6)
Hardware description:
- CPU: Ryzen 7 4800H
- GPU: Renoir iGPU, Navi10 dGPU
- System memory: 32GB
- System firmware: 1.4.4
- Display(s): Integrated panel
- Type of Display Connection: eDP
System information:
- Distro name and Version: Gentoo x86-64
- Kernel version: 5.12.5
- Custom kernel: Patches from 5.13.y applied
- AMD package version: No package
- Firmware version: both Renoir and Navi10 are at 20.10 (currently the latest available in linux-firmware)
Applied patches (from torvalds/master):
- 1689fca0d62aa7a685363999f9fc380c0666d955
- 055162645a40567080d8c2d1b135f934977ac3cf
I already had to apply these two commits on 5.11.y; without them the system locks up constantly and wreaks heavy havoc on the filesystem in the process.
How to reproduce the issue:
Sway is configured (via WLR_DRM_DEVICES) to only use the Renoir iGPU. It is unclear to me what exactly triggers a wakeup of the Navi10, but things go sideways after the dGPU has been brought online.
Just logging in and starting a Sway session is enough to trigger this behaviour.
Note that these messages also constantly show up on 5.11.y whenever the Navi10 is brought online:
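To narrow down when the Navi10 actually wakes up, its runtime PM state can be polled from sysfs. The following is only a minimal sketch: it assumes the dGPU sits at PCI address 0000:03:00.0 (the address in the amdgpu messages in this report) and that the kernel exposes the standard `power/runtime_status` attribute for it. The helper name `runtime_status` and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def runtime_status(pci_addr: str, sysfs_root: str = "/sys") -> str:
    """Return the runtime PM state of a PCI device, or 'unknown' if unreadable.

    pci_addr is assumed to be the dGPU's address, e.g. "0000:03:00.0".
    The kernel typically reports values such as "active" or "suspended".
    """
    path = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
        / "power" / "runtime_status"
    )
    try:
        return path.read_text().strip()
    except OSError:
        # Device missing or attribute not readable on this system.
        return "unknown"
```

Calling `runtime_status("0000:03:00.0")` in a loop before and after starting Sway should show whether the state flips from `suspended` to `active` around the time the problem starts.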
amdgpu 0000:03:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
amdgpu 0000:03:00.0: amdgpu: [PrepareMp1] Failed!
[drm] Failed to set MP1 state prepare for reload
However, on 5.11.y the system continues to function after this.
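For comparing kernels, it may help to pull just these SMU messages out of a saved kernel log. The sketch below is an assumption about how one might do that, not part of the original setup: it matches only the three message fragments quoted above, and the names `SMU_PATTERNS` and `smu_failures` are hypothetical.

```python
import re

# Fragments taken from the three messages quoted in this report.
SMU_PATTERNS = (
    r"Msg issuing pre-check failed",
    r"\[PrepareMp1\] Failed!",
    r"Failed to set MP1 state prepare for reload",
)
_SMU_RE = re.compile("|".join(SMU_PATTERNS))


def smu_failures(log_text: str) -> list[str]:
    """Return the lines of log_text that contain one of the SMU failure messages."""
    return [line for line in log_text.splitlines() if _SMU_RE.search(line)]
```

Feeding it the text of a saved `dmesg` dump from each kernel gives a quick count of how often the messages appear on 5.11.y versus 5.12.5.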
Using runpm=0 is not an option for me, because it massively increases power draw and temperature.