Asus Zephyrus G14 GA402 - Suspend not working reliably since Kernel 6.6.8
Since kernel 6.6.8, I've been having suspend issues. Sometimes, a suspend request would result in the screen blanking, but the power LED remains lit. Other times, suspend would occur, but randomly, the system wakes itself (power LED is solid white), and eventually the fans turn on to full speed and the system gets very warm. A long press of the power button shuts it down, and it reboots normally.
After a bit of experimenting, I found that the bad suspend only occurs on lid close. Suspend works normally if it is requested by pressing the power button. This behaviour has been confirmed by other Asus G14 GA402 users, as well as one Asus TUF Gaming A16 Advantage Edition FA617NS user.
There was no issue with suspend on kernel 6.6.7 and lower. The issue has persisted through 6.6.8/9/11/13 and 6.7.2.
FWIW, Thinkpad T14 Gen3 and T14s Gen3 users with Ryzen 7 6850u processors seem to be experiencing a similar issue, but in their case, downgrading to kernel 6.6.10 fixes the problem.
Hardware description:
Asus Zephyrus G14 GA402RJ AMD Ryzen 9 6900HS BIOS 319
I've attached two runs of the s2idle report, named to show the kernel the laptop was booted with at the time of the test. The s2idle report lists "Kernel is tainted" for 6.6.13 as well as when run under 6.7.2, but it runs without that warning on 6.6.7.
System information:
Distro name and Version: Fedora 39
Kernel version: Linux anzigo-swift-x-fedora 6.6.13-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 20 18:03:28 UTC 2024 x86_64 GNU/Linux
How to reproduce the issue:
Close display/lid
Power LED stays solid on. Storage LED blinks for a couple of seconds. Power LED does not start blinking. After 20 seconds or so (it varies), the fans go to full speed.
Opening lid does nothing, and a long press of power button is needed to turn off the device.
Since kernel 6.6.8, I've been having suspend issues. Sometimes
I looked; but nothing jumps out to me between those kernel versions.
A few ideas:
Can you keep all userspace and firmware the same and be certain 6.6.7 works every time? You might need to rebuild your initramfs to confirm this. I.e., to rule out something like a linux-firmware upgrade having caused it.
If you're sure it's the kernel, can you please bisect 6.6.7 to 6.6.8?
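For anyone who has not bisected before, here is a minimal sketch of the search that a bisect performs between a known-good and a known-bad kernel. It is only an illustration of the idea, not the git tooling itself: the commit names and the `is_bad` check are hypothetical placeholders. In practice each `is_bad` call means building and booting that kernel and, since the failure is intermittent, running many suspend cycles before calling a build good.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, that the commit just
    before commits[0] is known good, and that commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical example: eight placeholder commits, regression introduced at "c5".
commits = [f"c{i}" for i in range(1, 9)]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    # Stand-in for "build, boot, suspend repeatedly, see whether it hangs".
    tested.append(commit)
    return int(commit[1:]) >= 5


result = first_bad(commits, is_bad)
print(result, "found after", len(tested), "test builds")
```

The point of the sketch is that the number of test builds grows with the logarithm of the number of commits, so even a few hundred commits between two stable releases need fewer than ten builds.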
I've attached two runs of the s2idle report, named to show the kernel the laptop was booted with at the time of the test. The s2idle report lists "Kernel is tainted" for 6.6.13 as well as when run under 6.7.2, but it runs without that warning on 6.6.7.
The kernel taint detection might be a little too aggressive. In your case it's tainted from a WARNING. This is the same issue that happened to Polaris: #3122
Unsupported pwrseq engine id: 2
The patch that caused it was backported to 6.6.13. Can you check if the solution in that bug helps your system as well?
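To see why a kernel is reported as tainted, the bitmask in /proc/sys/kernel/tainted can be decoded. The sketch below covers only a handful of the documented bits (the table is deliberately partial, and `decode_taint` is a name made up for this example). A taint caused only by a WARNING shows up as bit 9, i.e. a value of 512.

```python
from typing import List

# Partial table of taint bits; see the kernel's tainted-kernels
# documentation for the full list.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
}


def decode_taint(value: int) -> List[str]:
    """Return one description per bit set in the taint bitmask."""
    reasons = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            letter, text = TAINT_FLAGS.get(bit, ("?", "not in this partial table"))
            reasons.append(f"bit {bit} ({letter}): {text}")
        bit += 1
    return reasons


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint value on a running Linux system."""
    with open(path) as handle:
        return int(handle.read().strip())


# Example with a fixed value rather than the live system: WARNING only.
for reason in decode_taint(512):
    print(reason)
```

On an affected machine, `decode_taint(read_taint())` would show whether the warning is the only source of the taint or whether something else (for example an out-of-tree module) is also involved.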
I had this same issue when running Fedora Rawhide about a month ago, which was running a not yet released kernel 6.7 rc. I downgraded to Fedora 39 and things were back to normal for a while, but just today I got the same issue when coming back from class. The laptop wouldn't wake up from sleep when I took it out of the backpack; it was burning hot, with the fans at 100%. When I booted the laptop, the battery had gone from roughly 55% to 20% in the span of about a 15 minute walk.
If there's anything I can do to help, please let me know. I don't know how to take logs, but I do have an old S0ix self test performed on Rawhide a month ago. I'll do another one today and reply to this with a new log:
out.txt
Edit: This is on a GA402RJ bios 319, but the issue occurred for me on the older bios 318 as well.
Someone on the asus-linux discord server says "Try adding "resume" into kernel parameters, working for me". Could just be a workaround to the actual issue but I figured I'd put it here.
He also said: "I had the same problem, my laptop could not suspend, but after some research, I found that it should have "resume" into kernel parameters, in grub it's grub cmdline line, into ur boot config, idk I use systemd boot, and worked my laptop can stay hours into suspend and when I open it, wakes instantly"
To my knowledge, I only hit this issue today, whereas a month ago on Rawhide it would happen very often. Since OP says that it's triggered by closing the lid, it's possible I never encountered the issue because I typically suspend my device by using the power button. I just tried closing the lid though and didn't have the issue. The fact that it's inconsistent is not helpful to the case :/
I'd start out by following what @anzigo said that 6.6.7 doesn't have the issue. Build 6.6.7 (or install an RPM) and confirm that it doesn't happen. Build 6.6.8 (or install an RPM) and confirm it does. Since @anzigo found it's only triggered by the lid that might help repro it.
@ryanabx To reproduce in my case, I need to close the lid and resume a couple times before failure. You can pretty much predict that it will happen the NEXT lid close, because the last good lid close suspend takes about 10 seconds, instead of 1 or 2. After waking, and suspending one more time, the failure is reproduced.
I'd love to help test. I'm not sure how to build from source or revert commits though since I've not touched this project before. I typically just run Fedora 39 stock without changing around packages. If you have some steps on how to do so, please let me know :)
@superm1 I'm a bit of a novice when it comes to this kind of stuff, but I was able to compile kernel 6.6.7 with the 3aae4ef4d799 patch included (only that singular patch). With that, I was able to reproduce the issue twice so far.
@ryanabx I had to suspend/resume using a lid close probably around ten times to get the issue to reproduce itself. No wonder you're having a hard time reproducing it. I was about to give up, and then it happened again.
I also have a problem with a G14 2021 (GA401QC) on Silverblue. It suspends when I close the lid or press the power button, but it can't wake properly, by which I mean it can't wake the NVMe disk. Everything then goes down: whatever is already in RAM keeps running, but nothing else works and the fans go to full speed. Downgrading to kernel 6.6.7 did not fix the issue. Among the fixes I tried, only amd_iommu=off works for me.
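When comparing results between machines, it helps to confirm which parameters the running kernel actually booted with, since a workaround like amd_iommu=off changes the test conditions. A minimal sketch that splits a kernel command line into parameters follows. `parse_cmdline` is a name invented for this example, the sample string is hypothetical, and quoted values containing spaces are not handled.

```python
from typing import Dict, List, Optional


def parse_cmdline(cmdline: str) -> Dict[str, List[Optional[str]]]:
    """Map each parameter name to the list of values it was given.

    Flags without "=" are recorded with a value of None. A parameter can
    legitimately appear more than once, so values are kept in a list.
    """
    params: Dict[str, List[Optional[str]]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


# Hypothetical command line; on a real system read /proc/cmdline instead.
sample = "root=/dev/sda2 ro quiet amd_iommu=off"
params = parse_cmdline(sample)
print("amd_iommu:", params.get("amd_iommu"))
print("quiet present:", "quiet" in params)
```

On a live system the input would come from `open("/proc/cmdline").read()`, and posting the parsed result alongside a report makes it clear whether a workaround was active during a test.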
Do the fans kick in real loud and the system heats up? Also does the LED that determines whether the system is on stay solid white or does it fade in and out like it does in sleep mode? I wanna make sure it's a similar or the same issue.
Similar issue on my Lenovo Yoga 14ARP8 on Arch: Lid close enters standby normally on linux{,-headers} up to version 6.6.10-arch1-1 and on the latest linux-lts (6.6.15-1-lts), i.e. the regression starts at 6.7 for me.
They link to an archlinux forum thread, and the guy there has these specs:
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz
GPU: AMD ATI Radeon 680M
I linked them both to this thread. Hopefully we can gather more data this way
It's really unfortunate that it's the fix for #2220 (closed) as that's also a pretty terrible problem. I did explicitly test suspend resume when testing that fix, so it's surprising to see this escape.
For everyone who has reproduced it so far, can you please check /var/lib/systemd/pstore. Do you by chance have a kernel panic that got saved there from around when it happened?
If not, does anyone who reproduces it have the ability to try to get netconsole or a serial console to catch a kernel panic as it's happening?
I'll try to get this reproduced as well to better understand it. Is there any rhyme or reason to the observed patterns? Specific running apps, specific DEs, specific display combinations, etc.?
I know there is a mention that it's triggered with a lid suspend, so I would wonder if that's causing a sequence of events that involve DPMS being turned off at a specific timing? That would potentially explain why I didn't reproduce it when testing that fix - I did it without an eDP panel connected and only used DP. The suspend testing was done just using amd_s2idle.py for a handful of times.
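For the pstore check asked about above, a small sketch that lists whatever records are saved under /var/lib/systemd/pstore may be easier than browsing by hand. `list_pstore_records` is a name made up for this example, and reading the directory may require root on some systems.

```python
from pathlib import Path
from typing import List


def list_pstore_records(directory: str = "/var/lib/systemd/pstore") -> List[Path]:
    """Return every regular file found under the pstore archive directory.

    Returns an empty list if the directory does not exist or cannot be read.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    try:
        return sorted(p for p in root.rglob("*") if p.is_file())
    except PermissionError:
        return []


records = list_pstore_records()
if records:
    for record in records:
        print(record)
else:
    print("no saved pstore records found")
```

An empty result does not prove there was no panic; it only means nothing was archived there.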
I believe I've experienced the issue on both KDE and GNOME. Most recently I've been experiencing it on GNOME though.
Today I had the issue twice, I'm now running Fedora Silverblue Rawhide with the latest unstable kernel (I like to distro-hop). Both times I had my lid closed, so either I closed the lid to make it sleep or I eventually closed the lid after making it sleep some other way.
I keep my laptop at 120hz@2560x1600p always, I doubt that has any effect though.
These last two occurrences I definitely had at least Firefox open, but probably more apps too. I'll try to trigger the bug again.
I have no files in /var/lib/systemd/pstore unfortunately, and I remember being told to check system logs at some point and the logs didn't even mention the laptop trying to wake up from sleep at all whenever the issue would occur, but otherwise the logs would show information about the laptop waking up.
===== My Reproduction Attempts =====
My very first attempt at repro worked. I had Firefox open, element in a separate workspace, and discord, closed lid.
Shut down computer and rebooted *
Second attempt (success): Firefox open, same tabs as the first repro. Closed lid.
Shut down computer and rebooted *
Third attempt (success): Opened Firefox, wrote about second attempt, then closed Firefox. Closed lid.
I'm going to stop writing about my attempts until I find out what doesn't cause the problem in kernel 6.8. I'm wondering if the bug is easier to repro on 6.7+ because of how easily I can trigger it now.
Edit: Went back to F39 with kernel 6.6.14, now the issue doesn't reliably trigger like above.
I don't have a kernel panic log from the freeze in /var/lib/systemd/pstore. I also don't see the CapsLock LED flashing when the system locks up. I.e., I don't expect this to be a kernel panic but I can't be sure.
I've enabled a USB serial console with console=tty0 console=ttyUSB0,115200. I got the following log when reproducing the issue:
wlp2s0: deauthenticating from b6:40:38:eb:89:1d by local choice (Reason: 3=DEAUTH_LEAVING)
ath11k_pci 0000:02:00.0: failed to enqueue rx buf: -28
ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
PM: suspend entry (s2idle)
Filesystems sync: 0.017 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.001 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
printk: Suspending console(s) (use no_console_suspend to debug)
I then appended no_console_suspend to the command line. However, with this I can no longer reproduce the issue in multiple attempts. Log of successful suspend cycle but that's probably not useful:
wlp2s0: deauthenticating from b6:40:38:eb:89:1d by local choice (Reason: 3=DEAUTH_LEAVING)
PM: suspend entry (s2idle)
Filesystems sync: 0.017 seconds
Bluetooth: hci0: unexpected event for opcode 0x0c24
Freezing user space processes
Freezing user space processes completed (elapsed 0.001 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[drm] VCN decode and encode initialized successfully(under DPG Mode).
[drm] JPEG decode initialized successfully.
amdgpu 0000:04:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
amdgpu 0000:04:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
amdgpu 0000:04:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
OOM killer enabled.
Restarting tasks ... done.
random: crng reseeded on system resumption
PM: suspend exit
Generic FE-GE Realtek PHY r8169-0-100:00: attached PHY driver (mii_bus:phy_addr=r8169-0-100:00, irq=MAC)
r8169 0000:01:00.0 enp1s0f0: Link is Down
wlp2s0: authenticate with 14:91:82:c3:5f:34 (local address=04:7b:cb:b3:44:7e)
wlp2s0: send auth to 14:91:82:c3:5f:34 (try 1/3)
wlp2s0: authenticated
wlp2s0: associating to AP 14:91:82:c3:5f:34 with corrupt probe response
wlp2s0: associate with 14:91:82:c3:5f:34 (try 1/3)
wlp2s0: RX AssocResp from 14:91:82:c3:5f:34 (capab=0x511 status=0 aid=6)
wlp2s0: associated
I could also try netconsole, if you think that might behave differently.
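When comparing captures like the two serial logs above, a quick way to sort them is to check whether a "PM: suspend entry" line is ever followed by "PM: suspend exit". The sketch below does only that, and `suspend_cycle_status` is a name invented for this example. Note that a missing exit line means the log stopped before resume was reported, which is consistent with a hang but does not by itself say where the hang is.

```python
from typing import Iterable

ENTERED = "entered, no exit logged"
COMPLETED = "completed"
NONE_SEEN = "no suspend attempt logged"


def suspend_cycle_status(log_lines: Iterable[str]) -> str:
    """Classify the most recent suspend attempt found in a kernel log."""
    state = NONE_SEEN
    for line in log_lines:
        if "PM: suspend entry" in line:
            state = ENTERED
        elif "PM: suspend exit" in line and state == ENTERED:
            state = COMPLETED
    return state


# Shortened excerpts based on the two captures in this thread.
hung_capture = [
    "PM: suspend entry (s2idle)",
    "Filesystems sync: 0.017 seconds",
    "printk: Suspending console(s) (use no_console_suspend to debug)",
]
good_capture = [
    "PM: suspend entry (s2idle)",
    "OOM killer enabled.",
    "Restarting tasks ... done.",
    "PM: suspend exit",
]

print("first capture: ", suspend_cycle_status(hung_capture))
print("second capture:", suspend_cycle_status(good_capture))
```

The same function can be fed the lines of a saved serial or netconsole log file to triage many attempts at once.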
Specifically running apps, specific DE's, specific display combinations etc?
I've only tried GNOME 45 on Wayland and the internal display so far. I could reliably trigger it there for my bisect without any applications open.
I know there is a mention that it's triggered with a lid suspend, so I would wonder if that's causing a sequence of events that involve DPMS being turned off at a specific timing?
I could trigger with a press of the power button the same as with a lid close. I.e., at least on my laptop the lid close doesn't seem to be significant.
@qwelias If I interpret your logs correctly, your laptop seems to actually suspend. Do you see the power LED flashing slowly, indicating successful suspend, or does the LED stay on without flashing? If the former, I'm not sure whether this is the same issue as my power LED stays on (and I see no core dump and not even the serial console ever recovers after waiting for a few minutes). That said, it might still be the same root cause with different symptoms (maybe because of Wayland vs. X11).
Eventually it does start flashing (reaches suspend), sometimes it's instant, sometimes it takes a few minutes, and then it always randomly wakes up and drains battery, or idk.
And the same freeze still happens on Wayland, but less frequently (I was able to cycle it a few times before getting a freeze, whereas on X it usually happens the first time).
If that would be of help I could do more experiments, I'm not experienced with kernel/s2idle/drivers so cannot debug it fully myself, but could follow instructions if given.
By detaching the framebuffer console with echo 0 > /sys/class/vtconsole/vtcon1/bind, I can trigger the issue also with a USB serial console and no_console_suspend. I enabled dynamic debug prints for amdgpu and now get the following messages via serial console on suspend:
PM: suspend entry (s2idle)
Filesystems sync: 0.012 seconds
rfkill: input handler enabled
amdgpu 0000:04:00.0: amdgpu: GFXOFF is disabled, re-init SPM golden settings
Bluetooth: hci0: unexpected event for opcode 0x0c24
Freezing user space processes
Freezing user space processes completed (elapsed 0.001 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
amdgpu 0000:04:00.0: amdgpu: GFXOFF is disabled, re-init SPM golden settings
There are more debug log lines from amdgpu right after pressing the power button to suspend, mostly HDCP-related. Here is the full log: amdgpu-suspend.log
@superm1 If additional debug logging would be useful, let me know what to enable.
@juergbi
One more assertion I'd like to understand. You mentioned in #3153 (closed) that 6.6.10 is fine, but 6.7 isn't:
Curiously, it seems that the above commit was backported to 6.6.10 where it doesn't cause any issues for me. So it might be the combination of that commit with another change in 6.7 that triggers the issue.
Would you be able to repeat the bisect manually applying that commit to 6.7 during the bisect as necessary? This may help to root out any other dependency. I know there have been some other regressions in ath11k, I wonder if https://bugzilla.kernel.org/show_bug.cgi?id=218364 is caught up in the issue? Maybe it's best to repeat the bisect applying both the amdgpu begin/end use patch and that ath11k patch in each applicable step.
I disabled CONFIG_MAC80211_DEBUGFS during the bisect, so https://bugzilla.kernel.org/show_bug.cgi?id=218364 should not be a concern. I could attempt another bisect with the patch on top but I don't know when I'll have time for that. bisect takes quite some time in my current setup.
I think that disabling that kconfig sufficiently avoids the ath11k bug. But yes if you can repeat the bisect when you have time with this patch applied in applicable kernels that would be very helpful as I haven't reproduced this issue yet.
5095d5418193eb2748c7d8553c7150b8f1c44696 is the first bad commit
commit 5095d5418193eb2748c7d8553c7150b8f1c44696
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Fri Oct 6 13:50:20 2023 -0500

    drm/amd: Evict resources during PM ops prepare() callback
Suspend works for me so far with this patch on top of otherwise stock 6.7.3. I do see the following error in the log on each resume, though. I think that's a new error but I'm not completely sure.
And now my screen is corrupted/garbled on every other frame after resume (except for the mouse cursor which is always fine on top). Wasn't an issue for the first few suspend cycles. Still no freeze or crash, though.
Edit: Another suspend cycle later, the screen is fine again.
With all 3 commits reverted, I can't reproduce the display corruption anymore, thanks. I.e., with your patch and the three reverts, everything looks fine here so far (still seeing the DMUB error in the log, though).
OK, please report your findings for the display corruption issue to the other bug so Arun can help advise next steps there. Once I know the situation with the other 6.6.y issue here I'll post some patches.
Those affected by this and on GNOME: can you try going into Settings->Privacy and turning off "lock screen on suspend" to see if you can still trip this issue? I have not reproduced it but have a suspicion it's related to the timing of events that happen with that setting relative to the suspend cycle actually starting.
ThinkPad T14s AMD Gen 3, Ryzen 7 6850U RX680M, Arch Linux 6.7.3-zen1-2-zen, GNOME 45.3
Can confirm, suspension works if "lock screen on suspend" is turned off.
Lock screen on suspend off
Suspends correctly ('breathing' red LED) on lid close
Suspends correctly via suspend button on menu
Suspends correctly via power button press
Lock screen on suspend on
Does not suspend (red LED on continuously) on lid close
Does not suspend via power button press
Suspends correctly via lid close and power button after lock screen is invoked directly (Super + L)
I'm guessing we're deadlocking due to a race condition that's occurring because of the order of events for Rembrandt now if the lock screen triggers after suspend starts.
Host: 21CF003DMX ThinkPad T14 Gen 3 Kernel: 6.7.3-arch1-2 DE: GNOME 45.3 CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz GPU: AMD ATI Radeon 680M
Disabling "lock on suspend" seemed to change something.
I've successfully done about a dozen suspend/wake cycles via the menu and by closing the lid. I still managed to get a screen freeze by trying to wake within about 2 seconds of the LED starting to blink (this also matches the timing from the logs in the gnome-shell crash), similar to what I was getting on Wayland.
With enabled "lock on suspend" it freezes almost every wake regardless of time spent under suspend.
So seems like disabling "lock on suspend" could be a workaround for me, but there's still a risk of randomly draining battery during suspend (I don't know for sure what happens there).
Will report later if battery still drains
Is the screen freeze the same issue we're discussing here? Solid LED and all?
If so, my theory is that on 6.7+, there's some additional issue (could be with lock on suspend) that causes the freeze to happen more often, and pre 6.7 the issue is there, but happens less often. I could be wrong though
Oh, I see how I may have confused things. With my laptop it looks like I'm reaching suspend, but then it freezes on wake, and there's a good chance I find it turned off because the battery gets drained (so is it really a suspend? idk, but the LED does flash)
but looks like for others it may freeze on suspend instead of wake
and there's also mixed symptoms in original issue description
I didn’t get it to show up sadly :/, that’s a screenshot from online.
Maybe the other setting still helps though. On a pre 6.7 kernel I still get the same amount of crashes as usual even with the other setting off, that is to say not 100% of the time but it still happens every so often
Ok, so I think there are two issues going on. Going into sleep with automatic screen lock seems to work OK. However, waking up from sleep has been hit and miss. Generally, if it hasn't been a long time since going into suspend (2~3 hours?), it will wake normally. Otherwise, it refuses to wake up. The LED goes from glowing to continuous light, but the screen stays blank and does not respond to any input.
I'm trying to reproduce it and find the exact conditions that trigger it. Also, what should I look for in journalctl?
While it goes to sleep and wakes up without any issues (given you don't try to wake it up immediately), the sleep breaks if you try to plug in or unplug power adapter and it freezes
As an idea to debug this, can someone catch a core dump in a crash kernel using kdump while hung? Maybe with a distro kernel with easily accessible debug symbols like the Fedora kernels?
Maybe we can get the state of where things are stuck then.
Not even sysrq commands seem to work when the laptop is frozen. Sysrq commands work fine via USB serial console before attempting to suspend but when the suspend attempt freezes the system, sysrq commands have no effect (not even reboot via sysrq b is possible). And this is with no_console_suspend.
Given that, I don't see how kdump could be triggered, or do you have a suggestion?
@juergbi I was able to get sysrq commands to work after waiting at least 10 minutes since getting the freeze.
That said it was not reliable, I had to input REISUB at least two times, so it's hard to say at what point it starts responding.