Trying to use sleep on the Asus Ally fails most of the time. As you cycle sleep a few times, the built-in USB gamepad and Asus N-Key devices alternate between working and not working. (The Asus N-Key device provides the extra keys on the device, such as the Armory Crate button.)
Attempt to put the device to sleep multiple times and observe the following:
The first time you attempt to put the device to sleep, the screen shuts off but the fan keeps running. When you press the power button again, the screen turns on, but the gamepad and Asus-specific buttons no longer work.
When you do another sleep cycle, the device usually goes to sleep with the fan and light off. When you wake the device, the gamepad and Asus keys work again.
Other factors may also alter sleep behavior, and recovering often requires a cold reboot (holding the power button for 10 seconds).
The IRQ put warnings you see are fixed in newer kernels. They're mostly harmless but I wanted to call them out. Please upgrade past 6.3.9 if you can. The latest 6.4.y perhaps?
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,878818-05:00 amd_pmc AMDI0009:00: Last suspend in deepest state for 8313080us
So the good news is that the APU got into the deepest state, which suggests we don't have any driver problems.
2023-07-21 12:58:36,492 ERROR: ACPI BIOS errors found
The bad news is this ACPI BIOS error may be the cause for your problem. This is the part quoted in the logs:
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,879929-05:00 ACPI: \_SB_.PEP_: _DSM function 8 evaluation successful
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,880247-05:00 ACPI: \_SB_.PEP_: _DSM function 6 evaluation successful
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,880875-05:00 ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SBRG.EC0.LID], AE_NOT_FOUND (20221020/psargs-330)
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,880880-05:00 ACPI Error: Aborting method \_SB.PEP._DSM due to previous error (AE_NOT_FOUND) (20221020/psparse-529)
2023-07-21 12:58:36,491 DEBUG: 2023-07-21T12:58:42,880884-05:00 ACPI: \_SB_.PEP_: _DSM function 4 evaluation failed
This is possibly the reason that the fans don't turn off. The EC doesn't get notified properly due to the ACPI BIOS errors.
Can I please see a full acpidump? I'll see if I can make more sense of why that happened. But at least from the errors I see I suspect that this needs to be fixed by the BIOS.
I could upgrade to a newer kernel if it helps here, but it will take a bit of work on my end. To get working Wi-Fi, Bluetooth, Asus keys, etc. on this device, I need to patch and build the kernel.
I'm pretty surprised by this. I think the wifi/bluetooth should work in the latest 6.4.y stable. I suspect you need this patch to make it work, which is in the stable trees.
I submitted the patch to get Bluetooth working, which was targeted for 6.5. The Wi-Fi had an issue where it would stop working and require a hard reset to fix. If memory serves me correctly, sleep/resume was a common cause. The other trigger was entering the BIOS before booting into the OS (the BIOS has cloud recovery, so it initializes the chip, which is probably related). The hid-asus patch has not been upstreamed yet.
I'm not at my computer right now to verify, but last I checked, 6.4 with linux-firmware-git had working Wi-Fi some of the time.
I believe it was this patch that fixed the issue. I could be wrong.
Yup, that's the exact same patch. If the UEFI network stack has run for any reason (as you outlined), it can leave the card in a bad state. This has happened on systems from multiple manufacturers.
That patch is backported to 6.4.4 and 6.1.39.
I believe the M000 call is a port 80 debug code, it won't matter if it's out of order.
The M460 call is a BIOS serial debug string, it also won't matter.
\_SB.PCI0.SBRG.EC0.CSEE (0xB7) is notifying the EC as part of function 3.
The call Notify (\_SB.I2CD.SPKR, 0xA1) // Device-Specific notifies an amplifier (CSC3551), presumably to prevent pops or similar.
It looks like, even though the order is wrong, functions 3, 5, and 7 work properly and probably aren't your issue.
On the way back up the order is 8->6->4, and again it's supposed to be 6->8->4.
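To make the ordering check concrete, here is a small sketch. It pulls the _DSM function numbers out of dmesg text (lines like the ones quoted earlier in this issue) and tests whether the resume-side calls arrive as 6 -> 8 -> 4, the order stated above. The numeric values match the Microsoft LPS0 function indices used in the kernel's drivers/acpi/x86/s2idle.c. The enum and helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Microsoft LPS0 _DSM function indices (same values as in s2idle.c). */
enum {
    LPS0_SCREEN_OFF = 3,
    LPS0_SCREEN_ON  = 4,
    LPS0_ENTRY      = 5,
    LPS0_EXIT       = 6,
    LPS0_MS_ENTRY   = 7,
    LPS0_MS_EXIT    = 8,
};

/* Collect the _DSM function numbers in the order they appear in `text`. */
static size_t extract_dsm_functions(const char *text, int *out, size_t max)
{
    static const char marker[] = "_DSM function ";
    const char *p = text;
    size_t n = 0;

    while (n < max && (p = strstr(p, marker)) != NULL) {
        char *end;
        long fn;

        p += sizeof(marker) - 1;          /* skip past the marker text */
        fn = strtol(p, &end, 10);
        if (end != p)
            out[n++] = (int)fn;
        p = end;
    }
    return n;
}

/* True if the resume-side calls (6, 8, 4) each appear once, in that order. */
static bool resume_order_ok(const int *fns, size_t n)
{
    static const int expected[] = { LPS0_EXIT, LPS0_MS_EXIT, LPS0_SCREEN_ON };
    size_t next = 0;

    for (size_t i = 0; i < n; i++) {
        int fn = fns[i];

        if (fn != LPS0_EXIT && fn != LPS0_MS_EXIT && fn != LPS0_SCREEN_ON)
            continue;                     /* ignore suspend-side calls */
        if (next == 3 || fn != expected[next])
            return false;                 /* out of order or repeated */
        next++;
    }
    return next == 3;
}
```

Feeding it the excerpt from the report yields 8, 6, 4, which the check rejects.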
The failing call is Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) // Status Change. The LID object is declared as an external ACPI device in one of the SSDTs, External (_SB_.PCI0.SBRG.EC0_.LID_, DeviceObj), but it is never actually defined anywhere else.
It's a pure BIOS bug to reference an external LID in a handheld. There is no code that runs after this, so unfortunately I don't think it's likely the reason for your behavior.
By chance did everything work properly for the fan and such when you ran the s2idle report with automated wakeup? Or did it also have a problem?
I'm wondering if there might be a state machine tracking bug in their EC based on power button presses versus system initiated suspends?
I need to do more testing, but strangely enough I get different behavior when I boot ChimeraOS from the USB vs from the internal drive. At first I thought it was because of the SSDT override I did to remove the LID lines, but I removed the override and booted from the USB again and the same behavior happens.
So to clarify, this is what I am seeing now:
I boot via USB and the gamepad/asus events never disappear when I sleep/wake the device
The system sometimes goes to sleep with a blinking light (seems almost like every 3rd cycle on average)
In both scenarios the fan is either spinning really slowly or it's not spinning at all.
I'll need to install ChimeraOS to the internal NVME again and run some tests because these results are completely different than before.
Also, across multiple runs with and without DSDT overrides to set up the CSC3551 amp, I did notice something happen once: it threw an error saying the NVMe was not configured for s2idle.
2023-07-21 17:47:12,292 ERROR: ❌ NVME Micron Technology Inc is not configured for s2idle in BIOS
I did a fresh install of ChimeraOS onto my internal NVMe, and the issue where the keys would no longer work after sleeping is completely gone. I was running Windows for a little while, and I think Asus pushed a new EC update automatically that fixed the issue. That's great, I suppose!
I'll have another user who hasn't run Windows in a while boot into Windows and let it install all the updates to verify this.
So I guess the only issue that remains is the fact that half the time the device fails to sleep.
Apologies for the delay; I needed to verify some things to be certain. I ran the cycle 20 times, and each time the system went to sleep successfully when using the s2idle script.
When I run echo mem | sudo tee /sys/power/state the device sleeps 100% of the time. Sometimes the firmware will fail to load for the Cirrus amp though meaning there is no sound. If I run systemctl suspend it acts the same way as pressing the power button manually.
[ 7.894231] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: Firmware: 400a4 vendor: 0x2 v0.43.1, 2 algorithms
[ 7.895450] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: 0: ID cd v29.63.1 XM@94 YM@e
[ 7.895459] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: 1: ID f20b v0.1.0 XM@176 YM@0
[ 7.895465] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: spk-prot: C:\Users\dchunyi\Documents\Asus_ROG\Project\NR2301\Tuning\20221125\104317F3_221125_V1_A0.bin
[ 7.979710] snd_hda_codec_realtek hdaudioC1D0: bound i2c-CSC3551:00-cs35l41-hda.0 (ops cs35l41_hda_comp_ops [snd_hda_scodec_cs35l41])
[ 7.979994] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: DSP1: Firmware version: 3
[ 7.979996] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: DSP1: cirrus/cs35l41-dsp1-spk-prot-104317f3.wmfw: Fri 27 Aug 2021 14:58:19 W. Europe Daylight Time

Audio fails to work when you see this:

[ 189.903009] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: DSP1: Firmware: 0 vendor: 0x0 v0.0.0, 0 algorithms
[ 189.903024] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: DSP1: No algorithms
[ 189.903030] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Cannot Initialize Firmware. Error: -22
[ 189.903404] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: Firmware version: 3
[ 189.903409] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: cirrus/cs35l41-dsp1-spk-prot-104317f3.wmfw: Fri 27 Aug 2021 14:58:19 W. Europe Daylight Time
[ 190.383930] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: Firmware: 0 vendor: 0x0 v0.0.0, 0 algorithms
[ 190.383960] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: No algorithms
[ 190.383967] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Cannot Initialize Firmware. Error: -22
It appears that error is symlink related. I deleted the symlink and manually copied the file over as the proper name and it works. The last thing remaining seems to be related to pipewire. I need to let the audio stream go idle for about 20 seconds and then do something to trigger audio for it to work.
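The workaround was to replace the symlink with a real copy, so it can help to check what the requested firmware name actually resolves to before and after a deployment. Below is a sketch using lstat() and readlink(). The file name comes from the dmesg output above. The /lib/firmware/cirrus/ prefix in the usage note is an assumption about the distro layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Report whether `path` is a symbolic link. If it is and `target` is non-NULL,
 * store the NUL-terminated link target in `target` (at most len - 1 chars).
 * Returns 1 for a symlink, 0 for anything else, -1 if the path can't be read.
 */
static int firmware_symlink_target(const char *path, char *target, size_t len)
{
    struct stat st;

    if (lstat(path, &st) != 0)        /* lstat() does not follow the link */
        return -1;
    if (!S_ISLNK(st.st_mode))
        return 0;
    if (target != NULL && len > 0) {
        ssize_t n = readlink(path, target, len - 1);

        if (n < 0)
            return -1;
        target[n] = '\0';             /* readlink() does not NUL-terminate */
    }
    return 1;
}
```

For example, firmware_symlink_target("/lib/firmware/cirrus/cs35l41-dsp1-spk-prot-104317f3.wmfw", buf, sizeof buf) returns 1 and fills buf when the file is still a link, and returns 0 once it has been replaced by a regular copy.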
Not exactly. When I run echo mem | sudo tee /sys/power/state the system will sleep. When I run systemctl suspend it acts the same way as pressing the power button, meaning half the time it turns the display off with the fan running still.
As for the initial report, I must have already updated the EC before making it. I wasn't aware that there was an EC update (I don't think it's mentioned anywhere, just the 323 BIOS, which I don't have). I do know that a separate prompt occasionally pops up on Windows and updates something firmware-related. I'm still waiting on confirmation from someone else that things work for them after applying whatever updates Windows offers.
I switched to Windows for a few days after using Linux since launch on the Ally to investigate the sleep behavior because I was getting mixed reports about whether or not it worked on Windows (it worked fine).
As for the symlink situation it almost looks like it's a timing issue or something. I'm not entirely sure. It was a shot in the dark to manually manage the file and to my surprise it worked out.
Another ChimeraOS dev checked their Ally which still has the gamepad/asus key issues when sleep/resuming and they are on EC 3.13 which was likely the version I had recently.
it acts the same way as pressing the power button, meaning half the time it turns the display off with the fan running still.
It won't be possible to capture with the s2idle debugging tool, but can I please see a regular dmesg cycle from specifically when this happened with dynamic debugging first turned on for the pinctrl-amd driver?
I'm pretty sure I didn't set up the dynamic debugging correctly, but here is a log that looks like it has more information than before. I was experimenting with modprobe configs and boot params to enable dyndbg for pinctrl-amd and I'm not convinced that any of those methods worked.
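For reference, dynamic debug is driven by plain-text queries written to its control file, usually /sys/kernel/debug/dynamic_debug/control with debugfs mounted, as root. The sketch below makes two choices of its own. First, it matches on the source file name (pinctrl-amd.c) rather than a module name. Second, it assumes pinctrl-amd is often built into the kernel, which may be why a modprobe config had no effect. For enabling at boot, the kernel's dyndbg= parameter takes the same query string. The helper names are hypothetical.

```c
#include <stdio.h>

/* Usual location; requires debugfs mounted at /sys/kernel/debug and root. */
#define DDEBUG_CONTROL "/sys/kernel/debug/dynamic_debug/control"

/* Build a query enabling (+p) every pr_debug/dev_dbg site in one source file. */
static int ddebug_enable_file_query(char *buf, size_t len, const char *source_file)
{
    int n = snprintf(buf, len, "file %s +p", source_file);

    return (n < 0 || (size_t)n >= len) ? -1 : n;   /* -1 if truncated */
}

/* Write a query to the control file; returns 0 on success, -1 on error. */
static int ddebug_apply(const char *control_path, const char *query)
{
    FILE *f = fopen(control_path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fputs(query, f) >= 0;
    /* The kernel parses the query when the buffer is flushed, so check fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Typical use would be to build the query for "pinctrl-amd.c" and pass it to ddebug_apply(DDEBUG_CONTROL, query) before reproducing the power-button suspend. The same query form should also work for the amd-pmc driver once its source file name is substituted.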
When pressing the power button I can get any or all of these messages in the dmesg. I don't seem to ever see them when I use the /sys/power/state method.
[ 198.765970] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Cannot Load/Unload firmware during Playback. Retrying...
[ 177.798057] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Wake failed, re-enter hibernate: -42
[ 177.798257] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Wake failed, re-enter hibernate: -42
[ 177.939660] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Wake failed, re-enter hibernate: -42
[ 177.939861] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Wake failed, re-enter hibernate: -42
[ 178.081087] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Wake failed, re-enter hibernate: -42
[ 178.081286] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Wake failed, re-enter hibernate: -42
[ 178.082999] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Timed out waking device
[ 178.083579] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Timed out waking device
[ 133.311940] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Failed to set mailbox cmd 1 (status 0)
[ 133.319018] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Failed to set mailbox cmd 1 (status 0)
So a quick update regarding the EC and the gamepad/asus keys problems. A user with the 323 bios and a newer EC than what I have (EC 3.16) has the problems still. So when they get the chance they will be downgrading the bios to 322 to match mine to see if the issue remains.
This is a bit frustrating, but we're trying to figure out the true reason the problem "disappeared" for me. I think it's likely related to a configuration I changed on Windows with Armory Crate that worked around the issue. Hopefully, with trial and error, we'll find answers.
I still don't see anything in your logs for a power button press from the pinctrl-amd driver.
The most important message you should be looking for in your logs is:
[ 69.500476] amd_pmc AMDI0009:00: Last suspend didn't reach deepest state
I see that a few times, including again at the end of your logs:
[ 1517.820258] amd_pmc AMDI0009:00: Last suspend didn't reach deepest state
What this means is that something is keeping the APU from going into the deepest state. What you can do is turn on dynamic debugging for the amd-pmc driver and look for what bits are active at suspend time. Hopefully just one bit is different and that will be a hint at what's actually different about your failures.
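The two amd_pmc messages quoted in this thread can be told apart mechanically, which helps when scanning logs from many cycles. Below is a sketch that classifies a single dmesg line. The message strings are copied from the logs above, anything else is treated as unrelated, and the function and enum names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum pmc_result { PMC_UNRELATED, PMC_DEEPEST, PMC_NOT_DEEPEST };

/*
 * Classify one dmesg line from the amd_pmc driver. On success the reported
 * residency (microseconds) is stored in *residency_us when non-NULL.
 */
static enum pmc_result classify_pmc_line(const char *line, unsigned long long *residency_us)
{
    static const char ok[] = "Last suspend in deepest state for ";
    const char *p;

    if (strstr(line, "amd_pmc") == NULL)
        return PMC_UNRELATED;
    if (strstr(line, "Last suspend didn't reach deepest state") != NULL)
        return PMC_NOT_DEEPEST;
    if ((p = strstr(line, ok)) != NULL) {
        if (residency_us != NULL)   /* strtoull stops at the trailing "us" */
            *residency_us = strtoull(p + sizeof(ok) - 1, NULL, 10);
        return PMC_DEEPEST;
    }
    return PMC_UNRELATED;
}
```

Counting PMC_NOT_DEEPEST lines per boot gives a quick failure rate. The cycles that fail are the ones worth comparing with amd-pmc dynamic debugging turned on.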
Would you mind moving up to 6.4.y? Like I said the WLAN patches are there now, so hopefully it's just porting your asus-wmi changes.
I'd like to make sure that we are looking at something that can still be fixed. 6.3.y is EOL.
But it does appear to me that on the failed attempt, the big notable difference is that the amplifier driver has some failures. So that may explain the problem.
[ 89.199996] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Failed to read MBOX STS: -121
[ 89.200184] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Failed to read MBOX STS: -121
[ 93.132323] amd_pmc AMDI0009:00: Last suspend didn't reach deepest state
I suggest bringing this to the maintainers for that driver for comments.
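Kernel drivers conventionally return negated errno values, so the numbers in the cs35l41 messages in this thread can be decoded with the C library. This assumes the driver is passing such values through, and that the target is Linux, since errno numbering is OS- and architecture-specific. On that basis, -121 is EREMOTEIO, which I2C controller drivers commonly return when a transfer is not acknowledged. -42 is ENOMSG and -22 is EINVAL. The helper name below is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format a negative kernel return code as "<code>: <description>". */
static void describe_kernel_error(int ret, char *buf, size_t len)
{
    int err = ret < 0 ? -ret : ret;   /* kernel code returns -errno */

    snprintf(buf, len, "%d: %s", ret, strerror(err));
}
```

With glibc, -121 formats as "Remote I/O error", -42 as "No message of desired type", and -22 as "Invalid argument".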
I'm not ruling out the amp entirely, but that is a separate issue I'm tracking that occurs when sleep works or not and only sometimes. Let me do a few more sleep cycles to see if we can get more information.
Hmm...6.4.5 seems to be having way more issues than with 6.3.9. I'll need to do some more testing with this. Our new deployment wiped the "fix" I had for the firmware so the symlink is causing this problem again.
Unfortunately any component can keep a system from getting into the deepest state. So the amp does sound plausible to me.
Feel free to CC me in any discussion with the amp driver folks. These amps are used in a lot of other laptops too, but this is the first time I've heard of problems like you're describing.
So I've tested this patch set and this resolves all the audio related issues I was seeing. The system still has an issue where sometimes it fails to go to sleep though.
I don't have any issues with the system freezing or audio not working when using the patches. I'll do more thorough testing tomorrow with it to see if letting the system sleep for longer than a few seconds changes anything.
I don't have any of the dynamic debugs enabled, but the dmesg looks to be the same as before minus the cs35l41 mbox and firmware loading issues. I'll poke around with this tomorrow as well to see if I find any more leads.
Also, there is still an unknown variable that changes the Asus keyboard events, and we've had trouble replicating it on purpose. About five people have managed to get the Ally into the state where the Asus keys are always available. We've documented every single step we took to get to this state, even where the notes wouldn't make sense, with no luck.
Any suggestions on how exactly we could troubleshoot this? I've been looking into dumping the EC, but the methods that work on other devices don't work on this one. The system uses a BGA EC chip, the IT5125VG-192, which makes things complicated.
There are no Super I/O devices found when probing for 0x2E/0x4E.
This to me likely points to a mistake in the WMI driver. I'd suggest decoding the MOF file and double checking everything matches the driver.
I'm not all that familiar with how to decode or manage MOF files, but I'll do some research and figure it out. At a glance, it looks like the Asus WMI interface is defined in the DSDT, and I'm finding many of the offsets listed in the kernel's asus-wmi.h header. Some aren't there, and I'm having to figure out what is what.
You can see what I'm looking at by searching for IIA0 in this .dsl file.
I was able to use bmf2mof to create this which has human readable text. If you attempt to use bmfdec on this new file it says invalid input? The GUID here actually matches what I see in the asus-wmi.c file in the kernel.
As I'm cross examining my information, I'm seeing SMIF (Sleep Mode Information?) is set to 0x04 by ANVI. Which if you look at this it tells you that this is indeed sleep mode.
Yup, that's totally the kind of thing I thought might be missing. If that's the case add suspend/resume callbacks to the asus-wmi driver to notify that kind of thing.
I'm hoping this is the case. I did an EC update that was available on Windows and now my Keyboard Events disappear again when suspending/waking the device. There are a total of 3 different cases.
State 1: Sleep/Suspend results in the keyboard disappearing for good until you reboot (or cold boot).
State 2: Sleep/Suspend results in the keyboard disappearing every other cycle
State 3: Sleep/Suspend never has any issues. The keyboard is always available.
This has been consistent with multiple units.
I'll report back if I get back to state 3 while messing with the callbacks.
The issue seems to be directly related to the sleep states of the PCI XHCI adapter. I've spent a considerable amount of time going through the DSDT and issuing every variation of the available WMI ACPI calls. I even broke down the individual methods to understand roughly what they do, and there isn't anything related to a WMI callback.
I've forced S3 by hacking the DSDT to add support, and the N-Keys survive sleep cycles when using "platform" with /sys/power/pm_test. If I don't use platform, the system suspends and wakes itself up, and then the UEFI splash screen shows indefinitely.
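For anyone reproducing the pm_test experiment: /sys/power/pm_test selects a test level, and the kernel backs out at that level instead of actually entering the sleep state, then resumes. Writing "mem" to /sys/power/state starts the attempt, and "mem" maps to whatever /sys/power/mem_sleep selects (s2idle or deep). The pm_test file only exists with CONFIG_PM_DEBUG, and the writes need root. The sketch below performs the two writes from C. The base directory is a parameter so the logic can be tried against a scratch directory, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Values accepted by /sys/power/pm_test (present with CONFIG_PM_DEBUG). */
static const char *const pm_test_levels[] = {
    "none", "core", "processors", "platform", "devices", "freezer",
};

/* Write `value` to `dir`/`name`; sysfs reports errors on flush, so check fclose(). */
static int write_attr(const char *dir, const char *name, const char *value)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    FILE *f;
    int ok;

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Select a pm_test level, then request a suspend via "mem". */
static int run_pm_test(const char *power_dir, const char *level)
{
    size_t i, count = sizeof pm_test_levels / sizeof pm_test_levels[0];

    for (i = 0; i < count; i++)
        if (strcmp(level, pm_test_levels[i]) == 0)
            break;
    if (i == count)
        return -1;                        /* unknown level: touch nothing */
    if (write_attr(power_dir, "pm_test", level) != 0)
        return -1;
    return write_attr(power_dir, "state", "mem");
}
```

On the device this would be run_pm_test("/sys/power", "platform"). Writing "none" back to pm_test afterwards restores normal suspend.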
I think there is a driver on Windows that handles the D3 Cold transitions correctly based on some of the documentation I have found.
I noticed there are quirks for the Surface tablet that can handle a PCI reset to cut the power to the wifi device to get it working, is it possible to do something similar here for testing?
The N-Key (Asus keyboard) is connected to \_SB_.PCI0.GP17.XHC0.RHUB.PRT3. The gamepad is connected to \_SB_.PCI0.GP17.XHC0.RHUB.PRT2, and it doesn't disappear when you cycle sleep/resume.
@superm1 The N-Key (Asus Keyboard) disappears when the system goes to sleep and the EC flush and GPE events are logged. During a sleep cycle where the N-Key remains, there are no logs indicating that this has happened. I was looking at the Steam Deck's DSDT and I noticed it has a lot of GPE wakeup notifications that the Ally does not have. As a matter of fact it looks like the GPEs on the Ally are a bit of a mess in general by comparison.
Keyboard disappears when logs show this.
[ 74.922792] ACPI: EC: ACPI EC GPE status set
[ 74.922805] ACPI: EC: ACPI EC GPE dispatched
[ 74.923473] ACPI: EC: ACPI EC work flushed
[ 74.923475] ACPI: PM: Rearming ACPI SCI for wakeup
[ 74.923567] ACPI: EC: ACPI EC GPE status set
[ 74.923576] ACPI: PM: Rearming ACPI SCI for wakeup
[ 75.460058] ACPI: PM: Wakeup unrelated to ACPI SCI
Keyboard pops up again when the log has this:
[ 125.055022] ACPI: EC: interrupt blocked
[ 125.182659] ACPI: \_SB_.PCI0.GP19.XHC2: LPI: Constraint not met; min power state:D3hot current power state:D0
[ 128.139644] ACPI: PM: Wakeup unrelated to ACPI SCI
[ 128.143115] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SBRG.EC0.LID], AE_NOT_FOUND (20230331/psargs-330)
[ 128.143123] ACPI Error: Aborting method \_SB.PEP._DSM due to previous error (AE_NOT_FOUND) (20230331/psparse-529)
[ 128.143149] clocksource: 'acpi_pm' wd_nsec: 0 wd_now: 568ae wd_last: 5e4130 mask: ffffff
[ 128.143157] clocksource: Clocksource 'tsc' skewed 3060334370 ns (3060 ms) over watchdog 'acpi_pm' interval of 0 ns (0 ms)
[ 128.143371] ACPI: EC: interrupt unblocked
[ 128.143822] clocksource: Switched to clocksource acpi_pm
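Based on the two excerpts above, the lines that show up only when the N-Key drops are the EC GPE dispatch and work-flush messages. Below is a rough heuristic for flagging a cycle from its dmesg text. It assumes the correlation seen in these few cycles holds in general, which is an observation, not a confirmed cause. The function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Heuristic only: in the cycles reported here, the N-Key disappeared when
 * dmesg showed the EC GPE being dispatched and the EC work being flushed.
 */
static bool cycle_shows_ec_flush(const char *cycle_log)
{
    return strstr(cycle_log, "ACPI: EC: ACPI EC GPE dispatched") != NULL &&
           strstr(cycle_log, "ACPI: EC: ACPI EC work flushed") != NULL;
}
```

Running it over many cycles alongside a check of whether the N-Key device re-enumerated would show how tight the correlation really is.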
@superm1 I need to add some info to this: I requested some advice from some of the ASUS engineers I am in contact with and while I didn't get much information out of them I got this:
In FW 308, a new feature was introduced wherein the MCU would disconnect the USB when the system screen turns off (the first step when entering Modern Standby).
This was implemented to address the issue caused by XINPUT not supporting USB selective suspend, which caused the system to get stuck and be unable to enter Modern Standby.
However, the feature is not present in FW 305.
0x03 = suspend, 0x04 = resume. Following M000(arg) leads me down a rabbit hole of hex that looks like setting timers (and memory regions plus vars). \_SB.PCI0.SBRG.EC0.CSEE (arg) is the connection state of USB hub 0, 0xB7 = disconnect.
What I suspect is happening is that not enough time is given for this disconnect process to finish. I think this because if we poke it manually in userland and then suspend, the resume brings the hub back completely. Also, when I try a patch to the asus-wmi driver that makes this call plus a small msleep, the devices return fine. The call must be made in resume_early to ensure the hub is active before other drivers (like hid-asus) need it.
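Here is a userspace model of the ordering described above, to make the sequence concrete. A prepare-stage hook disconnects the hub (CSEE 0xB7), and a resume_early-stage hook reconnects it (CSEE 0xB8). Each call is followed by a wait, so the hub is back before hid-asus and other drivers resume. The 0xB7/0xB8 values come from this thread. Everything else is a hypothetical stand-in: the callback struct for the kernel's dev_pm_ops, fake_csee() for acpi_execute_simple_method(), and sleep_ms() for msleep(). The delay is shortened so the model runs quickly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

#define CSEE_USB_HUB_DISCONNECT 0xB7   /* values from the ASL analysis in this thread */
#define CSEE_USB_HUB_CONNECT    0xB8

/* Hypothetical stand-in for dev_pm_ops: only the two stages discussed here. */
struct pm_stage_ops {
    int (*prepare)(void);
    int (*resume_early)(void);
};

/* Record of CSEE calls made, so the ordering can be inspected. */
static int csee_calls[8];
static size_t n_csee_calls;

/* Stub for evaluating \_SB.PCI0.SBRG.EC0.CSEE(arg); a driver would call into ACPI. */
static int fake_csee(int arg)
{
    if (n_csee_calls < sizeof csee_calls / sizeof csee_calls[0])
        csee_calls[n_csee_calls++] = arg;
    return 0;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/* Shortened for the model; the thread reports 300-600 ms as unreliable and 1500 ms in use. */
static long hub_settle_ms = 10;

static int ally_prepare(void)
{
    int ret = fake_csee(CSEE_USB_HUB_DISCONNECT);

    if (ret == 0)
        sleep_ms(hub_settle_ms);       /* let the MCU finish detaching the hub */
    return ret;
}

static int ally_resume_early(void)
{
    int ret = fake_csee(CSEE_USB_HUB_CONNECT);

    if (ret == 0)
        sleep_ms(hub_settle_ms);       /* hub must be back before hid-asus resumes */
    return ret;
}

static const struct pm_stage_ops ally_pm_ops = {
    .prepare      = ally_prepare,
    .resume_early = ally_resume_early,
};
```

Putting the reconnect in resume_early rather than resume is the key design choice. It runs before the normal resume phase in which drivers like hid-asus resume, so they should not see a detach/attach event.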
I'm going to do a test of putting an msleep(2000) after the s2idle.c block:
/* Screen off */
if (lps0_dsm_func_mask > 0)
    acpi_sleep_run_lps0_dsm(acpi_s2idle_vendor_amd() ?
                                ACPI_LPS0_SCREEN_OFF_AMD :
                                ACPI_LPS0_SCREEN_OFF,
                            lps0_dsm_func_mask, lps0_dsm_guid);

if (lps0_dsm_func_mask_microsoft > 0)
    acpi_sleep_run_lps0_dsm(ACPI_LPS0_SCREEN_OFF,
                            lps0_dsm_func_mask_microsoft,
                            lps0_dsm_guid_microsoft);
Quite likely the Ally will still need the 0xB8 call to CSEE on early resume. That ensures the hub is enabled early enough to prevent drivers and userland from seeing a detach/attach event.
While I think I have solved the immediate issue by using prepare and resume_early in asus-wmi, I thought it prudent to write my findings here.
This is a great finding, and it certainly sounds plausible.
It's not the first time that we've seen bugs that "Linux is too fast" in the suspend sequence or resume sequence.
You can see an artificial delay is injected in amd-pmc driver for example on Cezanne. This is because Linux races with firmware. The proper fix would be in the firmware, but you never see the race on Windows so it's a tough case to make in fixing in firmware.
You'll notice that the methods for screen off, LPS0 entry, and Modern Standby entry don't really correspond well to actions. Linux runs all 3 back to back, whereas in Windows they mark distinct milestones in the suspend sequence. If ASUS actually expects a certain amount of time to pass between them, that definitely doesn't happen in Linux today.
I think your timing experiment will be enlightening but I don't think we can artificially slow it down for everyone without a spec to lean on. So I would ask if you could instead have one of the Asus drivers register an LPS0 hook for this case. If it finds this system then inject a delay into the process. You can again model how amd pmc does it, like I said that's exactly what it does.
I guess to add to my comment; is it timing between screen off and lps0 or is it timing between screen off command and actually suspending?
If it's the former then I think doing something in asus-wmi's PM ops callbacks is unfortunately the best bet.
If it's the latter then you should be able to register an LPS0 prepare() callback that just adds an msleep to the process. This should prevent the system from actually going into hardware sleep for that duration of time.
From what I can tell it's "between screen off command and actually suspending". If the testing so far is any indication.
I tried to find out how to add a prepare() hook but couldn't. The only way I could see to do this was in pci/quirks.c. Maybe I'm missing something?
/*
 * ASUS ROG Ally
 */
static void asus_rog_usb0_connect_suspend(struct pci_dev *dev)
{
    if (dmi_match(DMI_BOARD_NAME, "RC71L")) {
        pci_info(dev, "ASUS ROG Ally found PCI quirk for suspend\n");
        /* sleep required to ensure USB0 is disabled before sleep continues */
        if (ACPI_FAILURE(acpi_execute_simple_method(NULL, "\\_SB.PCI0.SBRG.EC0.CSEE", 0xB7)))
            pci_info(dev, "ASUS ROG Ally failed to set USB hub power off\n");
        else
            msleep(1000);
    }
}

static void asus_rog_usb0_connect_resume_early(struct pci_dev *dev)
{
    if (dmi_match(DMI_BOARD_NAME, "RC71L")) {
        pci_info(dev, "ASUS ROG Ally found PCI quirk for resume\n");
        /* required to ensure USB0 is enabled before drivers notice */
        if (ACPI_FAILURE(acpi_execute_simple_method(NULL, "\\_SB.PCI0.SBRG.EC0.CSEE", 0xB8)))
            pci_info(dev, "ASUS ROG Ally failed to set USB hub power on\n");
        else
            msleep(1000);
    }
}

DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_AMD, 0x15b9, asus_rog_usb0_connect_suspend);
DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD, 0x15b9, asus_rog_usb0_connect_resume_early);
Sorry, I've done so much today I'm forgetting things.
Yes, I created a PM hook with acpi_register_lps0_dev() and prepare(), adding an msleep of various lengths. Unfortunately, it did not work. It seems like the pause needs to come directly after the screen off, and I'm not sure whether there is a race with other things.
After this didn't work, I tried the PCI quirk above. The only thing that seems to work for us on Linux is making that same call in ACPI, but much earlier than where the screen-off step makes it.
With the acpi_register_lps0_dev() and prepare() approach could you tell whether it ran before or after amd-pmc? Can you add a debugging statement to your prepare() callback to confirm?
It would be nice to get confirmation of what's actually happening on the other end of that ACPI call (if ASUS will share it). That could help explain why it matters where the wait is injected. Or maybe it's possible to query a register or an ASL variable to confirm that something happened and has finished for this case.
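If such a register or ASL variable does turn up, the fixed msleep() could become a bounded poll: wait until the disconnect is confirmed, but never longer than a timeout. Below is a userspace sketch under that assumption. No readable hub-state source has been identified in this thread, so the `done` callback stands in for that hypothetical read. The timeout and interval are illustrative, and third_poll_succeeds() is a demo predicate only. In kernel code, read_poll_timeout() from <linux/iopoll.h> implements the same pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Monotonic clock in milliseconds (unaffected by wall-clock changes). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Call `done` every `interval_ms` until it returns true or `timeout_ms`
 * elapses. Returns true if the condition was met in time.
 */
static bool poll_until(bool (*done)(void *), void *ctx, long timeout_ms, long interval_ms)
{
    long long deadline = now_ms() + timeout_ms;
    struct timespec step = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };

    for (;;) {
        if (done(ctx))
            return true;
        if (now_ms() >= deadline)
            return false;
        nanosleep(&step, NULL);
    }
}

/* Demo predicate: becomes true on the third call (ctx points to a counter). */
static bool third_poll_succeeds(void *ctx)
{
    int *count = ctx;

    return ++*count >= 3;
}
```

Compared with a fixed 1500 ms sleep, a poll returns as soon as the state is confirmed and still fails cleanly if the MCU never finishes.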
Method (ECAC, 0, NotSerialized)
{
    MFUN = 0x30
    SFUN = One
    LEN = 0x10
    EROR = 0xFF
    CUNT = One
    While ((CUNT < 0x06))
    {
        ISMI (0x9C)
        If ((EROR != Zero))
        {
            CUNT += One
        }
        Else
        {
            Break
        }
    }
}
and many other things to trace through.
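Read as C, the ECAC loop above makes at most five ISMI (0x9C) attempts: CUNT starts at One and the loop exits once it reaches 6. The sketch below mirrors that control flow. The MFUN/SFUN/LEN setup is omitted. ismi is a hypothetical stub for the SMI trigger, and its return value plays the role of EROR. The two stub functions are demo helpers only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SMI stub: returns the value firmware would leave in EROR. */
typedef uint8_t (*ismi_fn)(uint8_t cmd, void *ctx);

/* Mirror of ECAC's retry loop; returns the number of ISMI attempts (1..5). */
static int ecac(ismi_fn ismi, void *ctx, uint8_t *eror_out)
{
    uint8_t eror = 0xFF;   /* EROR = 0xFF before the first attempt */
    int cunt = 1;          /* CUNT = One */
    int attempts = 0;

    while (cunt < 0x06) {
        attempts++;
        eror = ismi(0x9C, ctx);
        if (eror != 0)
            cunt += 1;     /* error: count it and retry */
        else
            break;         /* success */
    }
    if (eror_out != NULL)
        *eror_out = eror;
    return attempts;
}

/* Demo stubs for the sketch. */
static uint8_t ismi_always_fails(uint8_t cmd, void *ctx)
{
    (void)cmd;
    (void)ctx;
    return 0xFF;
}

static uint8_t ismi_fails_twice(uint8_t cmd, void *ctx)
{
    int *calls = ctx;

    (void)cmd;
    return ++*calls <= 2 ? 0xFF : 0x00;
}
```

Note that a persistently failing SMI leaves EROR non-zero, and the method returns without reporting it to the caller.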
So when the existing kernel patch calls this CSEE method it tries to do so very early in suspend with pm_op asus_hotk_prepare and early in resume with asus_hotk_resume_early. In these calls the msleep() is forced.
What I've found is that the MCU's behaviour can vary heavily with the length of the delay. At one point we were doing the remove and bring-back very early, with a very short delay (300-600ms, I think), to prevent devices getting lost, but it was unreliable. It was then changed to 1500ms to let the devices fully detach (which is what the MCU does) and then wait for reattach. It is now looking like even this is still not long enough.
The UUID above is ACPI_LPS0_DSM_UUID_MICROSOFT:
/* Microsoft platform agnostic UUID */
#define ACPI_LPS0_DSM_UUID_MICROSOFT "11e00d56-ce64-47ce-837b-1f898f9aa461"
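_DSM receives this UUID as a 16-byte buffer in the mixed-endian GUID layout. This is the layout ASL's ToUUID produces and the kernel's guid_parse() returns. The first three groups are byte-swapped, and the last two are copied in order. Below is a self-contained sketch of that conversion. parse_guid() and guid_pos are hypothetical names for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Offset in the 36-char string of the hex pair for each output byte. */
static const uint8_t guid_pos[16] = {
    6, 4, 2, 0,                 /* first group: 4 bytes, reversed */
    11, 9,                      /* second group: 2 bytes, reversed */
    16, 14,                     /* third group: 2 bytes, reversed */
    19, 21,                     /* fourth group: in order */
    24, 26, 28, 30, 32, 34,     /* fifth group: in order */
};

static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into the 16-byte GUID layout. */
static bool parse_guid(const char *s, uint8_t out[16])
{
    if (strlen(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-')
        return false;
    for (int i = 0; i < 16; i++) {
        int hi = hexval(s[guid_pos[i]]);
        int lo = hexval(s[guid_pos[i] + 1]);

        if (hi < 0 || lo < 0)
            return false;
        out[i] = (uint8_t)(hi << 4 | lo);
    }
    return true;
}
```

For ACPI_LPS0_DSM_UUID_MICROSOFT, the buffer starts 56 0d e0 11 64 ce ce 47 83 7b, which is handy when matching it against a raw dump of the ASL.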