Monitor detection (multiple variable refresh rate monitors) causing AMDGPU crashes
I believe this is a Wayland-specific bug, as it occurs in both GNOME and KDE when running a Wayland session. If you believe this is instead a compositor or Arch issue, please let me know and help me link the issue and/or redirect me. Apologies if this is a bit long-winded (it's the only way I know). I believe similar problems have come up many times in the past and were solved, but this one appears to be caused by something recent.
System info
Most of this is not overly relevant, as I replicated the issue on live installations that do not have my settings, themes, etc.
---------------------------------
OS: Arch Linux
Kernel: x86_64 Linux 6.6.2-arch1-1
Uptime: 53m
Packages: 1287
Shell: zsh 5.9
Resolution: No X Server
DE: GNOME 45.1
WM: Mutter
WM Theme: Orchis-Green-Dark
GTK Theme: Orchis-Orange-Light [GTK2/3]
Icon Theme: BeautyLine
Font: Cantarell 11
Disk: 1.2T / 27T (5%)
CPU: AMD Ryzen 7 5800X3D 8-Core @ 16x 3.4GHz
GPU: AMD Radeon RX 7900 XT (gfx1100, LLVM 16.0.6, DRM 3.54, 6.6.2-arch1-1)
RAM: 4759MiB / 32019MiB
Monitor 1:
ASUS TUF Gaming VG27AQL1A
Size: 27"
Refresh rate: 170Hz
Monitor 2:
ASUS TUF Gaming VG35VQ
Size: 35"
Refresh rate: 100Hz
---------------------------------
The detection of multiple variable-refresh-rate monitors is causing amdgpu, and subsequently the displays, to crash. It also appears to struggle at the end of a session (logout, restart, shutdown). It only occurs with multiple monitors and in a Wayland session.
Triggers that cause it
- Turning the monitors off, then back on again.
- Changing refresh rate of the monitors.
- Resuming the PC from sleep/hibernation.
- Logging out.
- Restarting.
- Shutting down (minor effect: the display just hangs for 5+ seconds after the poweroff command is sent, before the machine shuts down).
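To correlate the triggers above with what the kernel actually detects, something like the following could be run before and after each trigger. This is only a rough sketch: the function name `list_connected_outputs` and its optional root argument are made up for illustration, and it assumes the standard DRM sysfs layout where each connector directory under /sys/class/drm has a `status` file.

```shell
#!/bin/sh
# Print the DRM connectors the kernel currently reports as "connected".
# The optional first argument (hypothetical, for testing) overrides the
# sysfs root; by default the real /sys/class/drm is used.
list_connected_outputs() {
    root="${1:-/sys/class/drm}"
    for status_file in "$root"/card*-*/status; do
        # Skip the unexpanded glob pattern or unreadable entries.
        [ -r "$status_file" ] || continue
        if [ "$(cat "$status_file")" = "connected" ]; then
            # The connector name is the directory name, e.g. card0-DP-1.
            basename "$(dirname "$status_file")"
        fi
    done
}

# Example usage (not executed here):
#   list_connected_outputs
```

If a monitor is physically on but missing from this list after one of the triggers, that would suggest the failure is at the detection stage rather than later in the compositor.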
Bug replications
To make sure this bug was not introduced by anything I had installed or customized incorrectly, I replicated it using live installation media. I only used Manjaro live media, as I could not start a Wayland session from the Ubuntu live media. Of note, this does not occur in the Manjaro release from earlier this year (or before that); it has only appeared in the last couple of months.
- Manjaro-gnome release 22.0.4-230222 (Feb 22) : Does not occur
- Manjaro-gnome release 23.0.4-231015 (Oct 15) : Does occur
- Manjaro-kde release 23.0.4-231015 (Oct 15) : Does occur
- Arch Linux latest firmware/kernel as of Nov 27 : Does occur
Other effects/notes of the bug
I haven't been able to reliably replicate the items below, or they have occurred only on my own system, where I haven't been able to isolate them from everything else.
- During a long session, the screen can start to lag and pause momentarily, especially using Brave (or other browsers?)
- After a long session, the shutdown script is spammed with [drm:dc_dmub_srv_cmd_run_list [amdgpu]] ERROR Error queueing DMUB command: status=2
- dmesg also shows the above amdgpu error consistently (I have attached the dmesg output).
- journalctl also shows how heavily this amdgpu error is being spammed (also attached; output of the following command):
journalctl -r | grep amdgpu
- After resuming from sleep (when it does succeed), and possibly after logging out and back in, moving the mouse across the border of the GNOME screenshot frame box (into the box or out of it) causes the display to lag by about 1 second, with 100% consistency.
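To put a number on how often the DMUB error appears, the kernel log can be piped through a small counter. This is a minimal sketch: the helper name `count_dmub_errors` is my own, and it simply matches the message text quoted above.

```shell
#!/bin/sh
# Count log lines on stdin that contain the DMUB queueing error.
# grep -c exits non-zero when there are no matches, so "|| true"
# keeps the function's exit status clean while still printing 0.
count_dmub_errors() {
    grep -c 'Error queueing DMUB command' || true
}

# Example usage (not executed here):
#   journalctl -k -b | count_dmub_errors      # kernel messages, current boot
#   journalctl -k -b -1 | count_dmub_errors   # kernel messages, previous boot
```

Comparing the count for a short session against a long one should show whether the error rate grows over time, as the lag during long sessions suggests.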