dGPU keeps shutting down causing short freezes and does not wake up from sleep
Brief summary of the problem:
For the last 3 months I have been using the dGPU as primary for GNOME Wayland via ENV{DEVNAME}=="/dev/dri/card0", TAG+="mutter-device-preferred-primary".
I used sudo radeontop -b 3 as a workaround to prevent a permanent freeze on dGPU shutdown and to keep wake-up from sleep (s2idle) working.
On 5.17.1 the workaround no longer seems to help with sleep, so I am sharing all the information needed to fix these issues, preferably without any more workarounds.
Hardware description:
- Machine Type: Laptop System: ASUSTeK product: ROG Strix G513QY_G513QY v: 1.0 (AMD Advantage Edition)
- CPU: 8-core model: AMD Ryzen 9 5900HX with Radeon Graphics
- iGPU: 08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [1002:1638] (rev c4)
- dGPU (RX6800M): 03:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT / 6800M] [1002:73df] (rev c3)
- iGPU is connected to internal laptop screen via eDP, dGPU is set as primary for GNOME Wayland session
- System Memory: 64 GB
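Since the udev rule refers to /dev/dri/card0 while the hardware list identifies the GPUs by PCI address, it may help to confirm which card node belongs to which device. The following is a minimal sketch that reads the standard /sys/class/drm symlinks; the function names card_pci_addr and list_cards are made up for illustration.

```shell
#!/bin/sh
# Print the PCI address behind a DRM card node.
# $1: card name (e.g. card0); $2: DRM sysfs root (defaults to /sys/class/drm).
card_pci_addr() {
    drm_root="${2:-/sys/class/drm}"
    # "device" is a symlink into the PCI device tree; its last path
    # component is the PCI address.
    basename "$(readlink -f "$drm_root/$1/device")"
}

# List every card node together with its PCI address.
list_cards() {
    drm_root="${1:-/sys/class/drm}"
    for c in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card0-eDP-1 and unmatched globs.
        case "$(basename "$c")" in *-*) continue ;; esac
        [ -e "$c/device" ] || continue
        printf '%s -> %s\n' "$(basename "$c")" \
            "$(card_pci_addr "$(basename "$c")" "$drm_root")"
    done
}
```

Running list_cards with no arguments prints one line per card. If card0 does not resolve to the 03:00.0 device, the rule is tagging a different GPU than intended.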
System information:
- Distro name and Version: Manjaro rolling
- Kernel version: Linux 5.17.1-3-MANJARO SMP PREEMPT Thu Mar 31 12:27:24 UTC 2022 x86_64 GNU/Linux
How to reproduce the issue:
Short freezes
- Set dGPU as primary for GNOME Wayland via
ENV{DEVNAME}=="/dev/dri/card0", TAG+="mutter-device-preferred-primary"
- Login into GNOME Wayland
- Tail the system journal via
sudo journalctl -f
- Stop UI interactions for 10 seconds then resume
- Notice a short freeze of 1-2 seconds and log messages in the journal about the dGPU resuming
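To line the freezes up with the dGPU's power state, the runtime PM status can be polled from sysfs alongside the journal. This is only a sketch: the PCI address comes from the hardware description above, the 0000: domain prefix is an assumption, and the function names are hypothetical.

```shell
#!/bin/sh
# sysfs directory of the dGPU (03:00.0 per the hardware description;
# the 0000: domain prefix is an assumption).
DGPU_SYSFS="${DGPU_SYSFS:-/sys/bus/pci/devices/0000:03:00.0}"

# Print the runtime PM state of a device ("active", "suspended", ...),
# or "unknown" if the sysfs attribute cannot be read.
# $1: sysfs directory of the device.
dgpu_runtime_status() {
    cat "$1/power/runtime_status" 2>/dev/null || echo unknown
}

# Print a timestamped line each time the state changes (polls once a second).
watch_dgpu() {
    prev=""
    while :; do
        cur=$(dgpu_runtime_status "$DGPU_SYSFS")
        if [ "$cur" != "$prev" ]; then
            printf '%s dGPU runtime_status: %s\n' "$(date +%T)" "$cur"
            prev=$cur
        fi
        sleep 1
    done
}
```

Calling watch_dgpu in a second terminal while repeating the steps above should show whether each freeze coincides with a transition out of the suspended state.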
Does not wake up from sleep (s2idle)
- Set dGPU as primary for GNOME Wayland via
ENV{DEVNAME}=="/dev/dri/card0", TAG+="mutter-device-preferred-primary"
- Login into GNOME Wayland
- Press the power button to go into Sleep
- After a few seconds press the power button again to resume
- Notice the keyboard and indicators light up but the screen stays blank
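Because a failed resume ends in a forced shutdown, the relevant messages are in the previous boot's journal. A small sketch for pulling them out, assuming the journal is persistent (otherwise the previous boot is not kept); filter_pm_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only lines that mention amdgpu or come from the kernel PM core,
# whose messages carry a "PM: " prefix.
filter_pm_lines() {
    grep -E 'amdgpu|PM: '
}

# Usage after rebooting from the blank screen:
#   journalctl -k -b -1 --no-pager | filter_pm_lines
```

The -k option restricts the output to kernel messages and -b -1 selects the boot before the current one.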
Results of more extensive testing
I tested multiple kernel versions from the 5.15, 5.16 and 5.17 series and no longer think this is a regression, as they all behave the same. I therefore did extensive testing on kernel 5.17.1-3-MANJARO. Here is what I did and found:
- Added pm_debug_messages=1 amd_pmc.dyndbg=+p to kernel parameters
- Logged into GNOME Wayland session with dGPU as primary (iGPU has eDP output connected to laptop screen)
- Ran sudo journalctl -f, pressed the Win button (GNOME workspace overview) and watched (5.17.1_journal_minimal-load.zip): every 7 seconds the dGPU seems to resume and go back to sleep again
- Entered sleep, waited 10 seconds, tried to resume -> blank screen -> force shutdown
- Logged in again the same way and additionally ran sudo radeontop -b 3. Entered overview and watched (5.17.1_journal_radeontop-dgpu.zip): with radeontop running the dGPU no longer goes back and forth between sleep and resume
- Entered sleep, waited 10 seconds, tried to resume -> blank screen -> force shutdown
- Logged in again the same way, ran glxgears instead (XWayland, used dGPU for OpenGL) and watched (5.17.1_journal_glxgears.zip): with glxgears running the dGPU also does not go back and forth between sleep and resume
- Entered sleep, waited 10 seconds, resumed successfully. Entered sleep, waited 8 hours, resumed successfully. I still ended up with a blank screen when resuming again after a few hours.
- My BIOS does not have any options for enabling NVME passwords
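The radeontop and glxgears runs both stop the suspend/resume cycling by keeping the dGPU busy. For diagnosis, a more direct way to test the same condition might be to pin the device's runtime PM through the standard power/control sysfs attribute. This is an untested sketch for this machine; the function name set_runtime_pm is hypothetical and writing to the real attribute requires root.

```shell
#!/bin/sh
# Set the runtime PM policy of a device.
# $1: sysfs directory of the device; $2: "on" (never runtime-suspend)
# or "auto" (let the kernel suspend the device when idle).
set_runtime_pm() {
    dev="$1"
    mode="$2"
    case "$mode" in
        on|auto) ;;
        *) echo "mode must be 'on' or 'auto'" >&2; return 2 ;;
    esac
    printf '%s\n' "$mode" > "$dev/power/control"
}

# Usage as root, assuming the dGPU sits at 0000:03:00.0:
#   set_runtime_pm /sys/bus/pci/devices/0000:03:00.0 on
```

If the short freezes disappear with the policy set to on and return with auto, that would point at runtime suspend of the dGPU rather than at the load generated by radeontop or glxgears.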
Attached files:
Log files
- inxi + journal: notice the numerous dGPU wakeups at the end of the log and the last message about entering the sleep state
- dgpu_suspend.zip
- igpu_suspend.zip
- acpidump.zip