[amdgpu]: system freezes when turning a monitor back on
Brief summary of the problem:
I am opening this bug as requested in the kernel bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217892
My setup is a dual-monitor 4K/144Hz configuration running sway. Both monitors are connected via DP to an AMD Radeon RX 7900 XTX. Usually, if I don't change the monitor settings, everything works as expected. The monitors also wake up flawlessly after the system has been idle.
However, sometimes I turn off the second monitor (for example, when playing games). For that I made a shortcut in sway which looks like this:
bindsym $mod+Shift+F12 output DP-2 toggle
Turning the monitor off works as expected. However, when I turn it back on I encounter the following problem: the main workspace (desktop) freezes while the second monitor tries to turn on (the monitor LED lights up). After roughly 10-15 seconds the main desktop works again and the second screen goes off again. I can repeat this multiple times but always hit the same problem. At that point I usually have to reboot the system to get the second monitor back.
Hardware description:
- CPU: AMD Ryzen 9 7950X3D 16-Core Processor
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX]
- System Memory: 64GB
- Display(s): Acer ACR:X32 FP:1246006353G00 & Acer ACR:XB283K KV:120918FA54200
- Type of Display Connection: 2x DP
System information:
- Distro name and Version: Gentoo Linux ~amd64
- Kernel version: Linux x2 6.5.2-gentoo #1 SMP PREEMPT_DYNAMIC Sat Sep 9 00:29:42 CEST 2023 x86_64 AMD Ryzen 9 7950X3D 16-Core Processor AuthenticAMD GNU/Linux
- Custom kernel: gentoo-sources
How to reproduce the issue:
This can be reproduced quite easily. However, sometimes it works almost without problems (in that case the monitor comes back, but the desktop on the main monitor looks distorted/corrupted; maximizing an application fixes that).
- boot into sway with 2 monitors connected (DP).
- turn off one of the monitors (in my case it's always the secondary) via
swaymsg output DP-2 toggle
- try to turn on the monitor again via
swaymsg output DP-2 toggle
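For anyone who wants to repeat the steps above several times, they can be sketched as a small shell helper. This is only a sketch: the function name `toggle_cycle`, the `SWAYMSG` override for dry runs, and the 5-second default pause are my own assumptions, and it assumes the output is enabled before the first toggle.

```shell
#!/bin/sh
# Sketch: toggle a sway output off and back on, pausing in between.
# SWAYMSG can be overridden (e.g. SWAYMSG=echo) for a dry run;
# this knob is an assumption of the sketch, not part of sway.
SWAYMSG="${SWAYMSG:-swaymsg}"

toggle_cycle() {
    output="$1"       # output name, e.g. DP-2
    pause="${2:-5}"   # seconds between the two toggles (arbitrary default)

    $SWAYMSG output "$output" toggle   # first toggle: turn the output off
    sleep "$pause"
    $SWAYMSG output "$output" toggle   # second toggle: try to turn it back on
}
```

Calling `toggle_cycle DP-2 5` from a terminal inside the sway session runs one off/on cycle; running `dmesg -w` in a second terminal shows any kernel messages as they appear.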
Attached files:
dmesg.log. I've also attached a picture of the distorted/corrupted desktop (in a normal screenshot it looks fine, so I took a photo).
Note:
In the kernel bugzilla report I wrote that I saw the following in dmesg:
[ 8623.325357] [drm] enabling link 1 failed: 15
[ 8623.382238] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:348
[ 8623.437493] [drm] REG_WAIT timeout 10us * 5000 tries - enc32_stream_encoder_dp_unblank line:357
[ 8638.435963] [drm:amdgpu_dm_atomic_check] *ERROR* [CRTC:81:crtc-3] hw_done or flip_done timed out
However, these lines are not in the attached dmesg output; there I only see:
[ 8061.752367] [drm] dc_validate_with_context:resource validation failed, dc_status:13
[ 8061.807992] [drm] dc_validate_with_context:resource validation failed, dc_status:13
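To pull just these kinds of lines out of a long kernel log, a filter along the following lines may help. It is a minimal sketch: the function name `filter_drm_errors` is hypothetical, and the patterns are taken only from the messages quoted in this report, so other relevant amdgpu messages could be missed.

```shell
#!/bin/sh
# Sketch: keep only the DRM/amdgpu messages quoted in this report.
# Reads a kernel log on stdin and prints the matching lines.
filter_drm_errors() {
    grep -E 'enabling link [0-9]+ failed|REG_WAIT timeout|hw_done or flip_done timed out|resource validation failed'
}
```

It can be used as `dmesg | filter_drm_errors` on the live system or as `filter_drm_errors < dmesg.log` on a saved log.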
Regards Michael