On Ubuntu 22.04 amd64: if I suspend my laptop (HP Envy x360 with Ryzen 3700U), either by running systemctl suspend or by closing the lid and leaving it, the screen does not come back on and I have to reset using SysReq - Alt+REISUB.
This is not a new issue: it has never worked over the last 3 years, on kernels going back to 4.x. Before this I had an HP Envy with a Ryzen 2500U for some years, and I don't recall it working then either (there were more bugs then, which have since been fixed - this is the worst remaining bug).
Kernel:
Linux version 6.0.9-060009-generic (kernel@sita) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.2.0-9ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #202211161102 SMP PREEMPT_DYNAMIC Wed Nov 16 12:14:18 UTC 2022
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
Since the laptop automatically turns the screen off if I leave it, I have to reset it multiple times a day. To say it's frustrating is an understatement.
Can you please make sure your BIOS is fully up to date?
I'm not sure how good HP are at keeping up with AMD - I booted into Windows and tried the HP update utility, which said there are no updates.
I've built 6.1.20 without the patch to start with, to establish a baseline.
In this state it's interesting - it wakes up 'more' than before, with a screen attached via USB-C, I can see the desktop but when I try and interact with it, everything flashes very fast.
Also interesting, the laptop screen will still show the login, while the monitor shows the desktop - see how they get out of sync in this video:
That's a weird problem, but for now let's fixate on the suspend issue for this bug.
Mar 24 12:44:12 orangered kernel: amd_pmc AMD0004:00: SMU debugging info not supported on this platform
Mar 24 12:44:12 orangered kernel: amd_pmc AMD0004:00: SMU cmd unknown. err: 0xfe
If you don't have an updated BIOS available, I suspect you won't be able to get into s0i3 even with the NVME patch I provided you. We either have a mistake in amd_pmc for Picasso or your BIOS doesn't have support. To get me some more verbose suspend related logging can you please get me the log from this script?
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py
I looked at the Picasso spec and I think I found the problem. It doesn't support the SMU version command.
Have a try with this tree and use the script to capture a suspend run:
I'll try that next, the first time I ran the patched 6.1.20, it sort-of woke up after suspend (I was hoping to reproduce that and post a video and log, but haven't managed).
Both screens came back, I could move the cursor and type with the keyboard.
However, both were incredibly laggy. I ran glances and observed the CPU speed seemed to be correct, so something on a lower level was probably not right (I guess something was spamming events and using CPU up in that way).
On suspend everything seems pretty good - the fan turns off.
On wake, I can hit caps lock and see the light turn on and off (my test to see if things are really woken) - the monitors are all black though... looking at the end of the log a couple of things aren't happy, so it makes sense.
Thanks for the patience, I'll do that now + report back.
I should disclose that I do have a small ACPI override, gleaned from here:
#1230 (comment 580057)
It sets the "Hardware Reduced (V5)" field to 0.
The original comment on the other ticket mentions a 13" laptop, also an HP Envy x360, with a 3500U instead of a 3700U:
Hey @awatry, I'm on an Envy x360 13, sporting a 3500U, similar issues, but it turns out these laptops do have deep S3 sleep support; but HP's ACPI tables don't announce it to the OS.
I ended up decompiling parts of the tables, editing, overwriting and recompiling.
run acpidump -o tables to create a file with the currently running tables
acpixtract -a tables to extract the individual tables (we only care about the FACP table)
iasl -d facp.dat to disassemble the FACP table
edit facp.dsl with your preferred text editor: set the "Oem Revision" field at the top to a larger number, and set the "Hardware Reduced (V5)" field to 0
iasl -sa facp.dsl to assemble your modified table
Make the following folder structure: payload/kernel/firmware/acpi/
In the acpi folder add your compiled facp.aml
In the payload folder, run: find kernel | cpio -H newc --create > acpi_override.cpio.img
Move the resulting acpi_override.cpio.img to your /boot folder, then update grub so your initrd line looks something like this: initrd /boot/amd-ucode.img /boot/initramfs-linux.img /boot/acpi_override.cpio.img
Also change your Grub CMDLINE to force deep sleep as default for good measure: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_sleep_default=deep"
Hopefully you should then see deep sleep when you run cat /sys/power/mem_sleep.
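For anyone following the steps above, here is a minimal sketch of checking the result programmatically. It relies on /sys/power/mem_sleep listing the available modes with the active one in square brackets (for example `s2idle [deep]`). The function name `parse_mem_sleep` is made up for illustration.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (active_mode, all_modes) from the contents of /sys/power/mem_sleep."""
    active = None
    modes = []
    for token in text.split():
        # The currently selected mode is wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")
    if path.exists():
        active, modes = parse_mem_sleep(path.read_text())
        print(f"available: {modes}, active: {active}")
```

If the override and the mem_sleep_default=deep parameter took effect, the active mode reported here should be deep.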
I ran the script and have included the log and output; however, it dies before suspending because the wakealarm device /sys/devices/pnp0/00:01/rtc/rtc0/wakealarm isn't present (see the output).
❯ sudo python3 bin/amd_s2idle.py
[sudo] password for stu:
Location of log file (default s2idle_report-2023-03-27.txt)?
Debugging script for s2idle on AMD systems
💻 HP HP ENVY x360 Convertible 15-ds0xxx (103C_5335KV HP Envy) running BIOS 15.24 (F.24) released 04/19/2022
🐧 Ubuntu 22.04.2 LTS
🐧 Kernel 6.1.21+
🔋 Battery BAT0 (313-54-41-A SA04055XL) is operating at 100.00% of design
Checking prerequisites for s2idle
✅ Logs are provided via systemd
✅ AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx (family 17 model 18)
✅ LPS0 _DSM enabled
✅ ACPI FADT supports Low-power S0 idle
✅ HSMP driver `amd_hsmp` not detected (blocked: False)
✅ PMC driver `amd_pmc` loaded
✅ GPU driver `amdgpu` available
✅ System is configured for s2idle
✅ NVME Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 is configured for s2idle in BIOS
✅ GPIO driver `pinctrl_amd` available
How long should suspend cycles last in seconds (default 10)?
How long to wait in between suspend cycles in seconds (default 4)?
How many suspend cycles to run (default 1)?
Started at 2023-03-27 10:22:46.246803 (cycle finish expected @ 2023-03-27 10:23:00.246819)
Traceback (most recent call last):
  File "/home/stu/bin/amd_s2idle.py", line 1882, in <module>
    app.test_suspend(duration=duration, wait=wait, count=count)
  File "/home/stu/bin/amd_s2idle.py", line 1718, in test_suspend
    with open(wakealarm, "w") as w:
PermissionError: [Errno 13] Permission denied: '/sys/devices/pnp0/00:01/rtc/rtc0/wakealarm'
This time when it tried to resume, it displayed a glitched version of the login screen (something it does roughly every 4th failed resume). I forgot to grab a pic, but will do next time I see one.
The wakealarm thing is a distraction from the GPU bits, I can edit the script so it prompts me to suspend instead then manually attempt a wakeup 10 seconds afterwards.
I found why this is happening. I've force pushed a new set of patches to the branch (sha 080eeecf05). Can you please pull the branch and try again?
> The wakealarm thing is a distraction from the GPU bits, I can edit the script so it prompts me to suspend instead then manually attempt a wakeup 10 seconds afterwards.
Not having a working wakealarm is certainly surprising. Sure if you can modify the script that would allow it to gather more debugging data.
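A rough sketch of what such a modification could look like, not the actual change to amd_s2idle.py: try to arm the RTC wake alarm and, if the sysfs file can't be written, fall back to a manual wakeup. The helper names `arm_wakealarm` and `wakeup_plan` are hypothetical, and the sketch assumes the RTC accepts an absolute time in seconds since the epoch.

```python
import time


def arm_wakealarm(path: str, seconds: int) -> bool:
    """Try to program an RTC wake alarm `seconds` from now.

    Returns False instead of raising if the sysfs file is missing or
    not writable, so the caller can fall back to a manual wakeup.
    """
    wake_at = int(time.time()) + seconds  # absolute time, seconds since the epoch
    try:
        with open(path, "w") as w:
            w.write(f"{wake_at}\n")
    except OSError:  # covers PermissionError and FileNotFoundError
        return False
    return True


def wakeup_plan(path: str, seconds: int) -> str:
    """Return 'timer' if the alarm was armed, otherwise 'manual'."""
    return "timer" if arm_wakealarm(path, seconds) else "manual"
```

In the 'manual' case the caller would print a prompt asking the user to press a key roughly `seconds` after the machine suspends, then carry on collecting logs as usual.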
Pulled + Rebuilding now, so cutting back those options, I think I end up with:
quiet splash initrd=/boot/initrd.img amd_pmc.dyndbg=+p
Yup.
ACPI-wise I was only setting "Hardware reduced" to 0 - there were various issues before but this may not be needed now.
I expect hardware-reduced platforms at least used to have a hard time waking up from s0i3, but with the changes in the GPIO driver I expect this to be working again. If it's not, that might be our starting place.
OK so the amd_pmc and nvme stuff looks good. I'll run that by some people internally and send those out once I have a thumbs up.
Regarding the kfd failure, @fxkuehl any thoughts? We tried to stop masking the error code out of kfd_iommu_resume and it leads to an -EBUSY result from iommu init.
The IOMMU initialization problem only affected older APUs. This log shows a Raven, which didn't have this IOMMU init issue. So it is an unrelated problem.
This is likely to work around the problem. I'm not sure what you'd learn from that. You already know that IOMMU is failing to initialize in a KFD-specific code path. Another possible workaround is to disable the IOMMU. That should keep KFD working on Raven with a fallback that doesn't use the IOMMU.
As it stands today, I believe KFD failing to resume means that the rest of the GPU resume path fails. I want to confirm whether that's why we're ending up in a GPU recovery path that also fails, or whether KFD IOMMU resume is a victim of another problem.
> Another possible workaround is to disable the IOMMU.
I.e. amd_iommu=off on the kernel command line. Yeah, I think that's a better workaround if it works.
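When collecting logs for several configurations it helps to record which parameters the kernel actually booted with. A minimal sketch, assuming simple space-separated parameters (it does not handle quoted values containing spaces); `parse_cmdline` is an illustrative name, not part of any existing tool.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        params = parse_cmdline(proc.read_text())
        print("amd_iommu =", params.get("amd_iommu"))
```

With the workaround applied, the lookup should report off; without it the key is simply absent.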
I suspended, waited around 10 seconds and then resumed by pressing a key on the keyboard.
The screen remains black, though it does briefly change (this is the same as all the other times on waking up, there is an indication that something happens, but not actual output - I guess I should see if it's the same on HDMI as on USB C).
On the wakeup to black screen, I tried hitting caps lock to see if the system is running enough to control the caps lock light, and this time it didn't toggle (unlike the previous logs with the default iommu settings, where hitting caps lock toggles the capslock light).
I was able to reset the computer with SysReq-REISUB.
Only the standard logs here - the script refused to run, with a message that system doesn't meet the prerequisites for s2idle.
EDIT: ran again to confirm symptoms and uploaded new log -
Yes, suspend + resume (via keyboard or mouse) appears to be working properly.
I haven't stared at the logs yet, to see quite how properly.
Here are some logs - I had to patch amd_s2idle.py again, since reading /sys/kernel/debug/amd_pmc/smu_fw_info causes an "Invalid parameter" error (in python or the shell under this config).
With amd_iommu=off, I see that the IOMMU is still enabled but IOMMUv2 is disabled.
> Mar 31 11:27:53 orangered kernel: iommu: Default domain type: Translated
> Mar 31 11:27:53 orangered kernel: iommu: DMA domain TLB invalidation policy: lazy mode
> ...
> Mar 31 11:27:54 orangered kernel: AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
KFD was not enabled at all. I thought we'd be able to fall back to a mode that doesn't use IOMMU, but I was wrong:
> Mar 31 11:27:58 orangered kernel: kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
> Mar 31 11:27:58 orangered kernel: kfd kfd: amdgpu: Error initializing iommuv2
> Mar 31 11:27:58 orangered kernel: kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors
That still doesn't explain why suspend/resume fails in this case. The log did not capture anything from resume at all. The last messages I see are from suspend:
> Mar 31 11:29:48 orangered kernel: PM: suspend entry (s2idle)
> Mar 31 11:29:48 orangered kernel: Filesystems sync: 0.010 seconds
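This kind of check is easy to script when going through many logs. A small sketch that flags every "PM: suspend entry" line with no later "PM: suspend exit" line, which is the pattern seen here. The function name `unfinished_suspends` is made up, and the input is assumed to be journal lines in chronological order.

```python
from typing import Iterable, List


def unfinished_suspends(lines: Iterable[str]) -> List[str]:
    """Return suspend-entry lines that are never followed by a suspend exit."""
    pending = None
    unfinished = []
    for line in lines:
        if "PM: suspend entry" in line:
            # A new entry while one is still pending means the earlier
            # suspend never logged its exit (e.g. the machine was reset).
            if pending is not None:
                unfinished.append(pending)
            pending = line
        elif "PM: suspend exit" in line:
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished
```

Feeding it the output of something like `journalctl -k -b -1` (the previous boot's kernel log) would show whether that boot ended inside a suspend cycle.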
I noticed that - I can probably try again in a bit over an hour if confirmation is needed.
Just trying to wake it from the keyboard didn't work, though the hardware backlight on the laptop's keyboard does activate + enough of Linux is up that Alt+Sysreq+REISUB does restart it.
> Yes, suspend + resume (via keyboard or mouse) appears to be working properly. I haven't stared at the logs yet, to see quite how properly.
Yeah it looks good to me. I guess you have a viable workaround for now with those patches and CONFIG_HSA_AMD turned off. I shared the PMC ones with the PMC driver owner and he's going to batch submit them with some other stuff going out next week or so.
The NVME one I'll send out today.
> The log did not capture anything from resume at all. The last messages I see are from suspend:
Maybe the kernel panicked? Without a serial console I don't see a way to get info out. Maybe a TTY on an A-A XHCI debug cable could help, not sure if that would work in this context.
I've rebased my mlimonci/picasso-fixups branch on 6.1.22 and added some more verbose logging around the IOMMU driver in the handling of dev_state. Can you please update to this, re-enable CONFIG_HSA_AMD and capture a new log? Hopefully it will make it clearer what's going on.
cd690562d7b4 XXX: Add some more verbose logging around -EBUSY cases in IOMMU driver
OK, that's really helpful. I see an incongruity between suspend and resume that stands out. In suspend, under amdgpu_device_suspend, the suspend routine for KFD is skipped in s0ix. So it never frees the device.
In resume there is no similar check and IOMMU resume happens no matter what (see amdgpu_device_ip_resume and all callers down its chain).
I've added another commit to the branch that should skip the resume for kgd2kfd_resume_iommu in the s0ix case. Can you see if that helps (and provide a log even if it does help)?
If it doesn't help I have another idea too, but I'd like to see this first.
No probs, here are logs for that version. There's no change on the user visible side of things (screen is black, keyboard backlight doesn't illuminate).
I should really have made a branch at the point that 6.1.21+ was working with HSA_AMD=n, but never mind.
At the very least, I'll try and go back one patch and see if that makes a difference.
One thing I noticed: before the rebase, the laptop keyboard illuminated on attempting to wake, but afterwards it does not (I can't recall if the HSA flag makes a difference there).
(Edit: I'd erroneously written in one of the ticket updates above that it turned on after the rebase, when it did not.)
OK, I guess I do need the patches to successfully wake up. I tried building the kernel from a few different points: 6.1.22, before the KFD patches, and 6.1.21.
I'll see if I can get back to the patched version that worked with HSA_AMD=n, then go forward until the point it stops working.
Fortunately I mentioned the commit hash above so I have the 6.1.21 branch still.
Here is the 6.1.21 branch with just amd-pmc/nvme changes. I expect you can get this working with CONFIG_HSA_AMD turned off (since you did before).
Also - I suggest you use CONFIG_LOCALVERSION_AUTO in your kernel config. It will include the hash string in the package so you should be able to track it more easily (and we can better match results and code to a given log).
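As an illustration of why this helps: with CONFIG_LOCALVERSION_AUTO the release string carries a `-g<hash>` suffix (as in 6.1.22-06017-gcd690562d7b4), so a log collection script can record the commit automatically. A minimal sketch; `commit_from_release` is a hypothetical helper name.

```python
import platform
import re
from typing import Optional


def commit_from_release(release: str) -> Optional[str]:
    """Extract the abbreviated git hash from a kernel release string, if any."""
    # Matches the "-g<hex>" suffix appended for builds from a git tree.
    match = re.search(r"-g([0-9a-f]{7,40})\b", release)
    return match.group(1) if match else None


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    print(release, "->", commit_from_release(release))
```

Distribution kernels such as 6.0.9-060009-generic have no such suffix, in which case the helper returns None.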
A quick non-update: apologies for the slowdown, schools are off for Easter so it's harder to find the time with a small child around the house.
I'm scripting some of the manual log collection, so it should be more convenient + less error prone (e.g. getting an audible prompt so the machine can reset without resorting to REISUB when it's just the screen is black) + gathering logs of previous boots + of course easier to get a lot more logs.
Definitely haven't forgotten, as just using this laptop is a daily reminder.
If Aaron's patch doesn't help, then I'll go back up this thread and check which patches we applied to stop the black screen + if they are in the branch.
I've also ordered a USB debug cable - it will take a while to arrive + then need to find the time to get it working too :)
> If Aaron's patch doesn't help, then I'll go back up this thread and check which patches we applied to stop the black screen + if they are in the branch.
I've reconstructed the branch with 6.1.21 and the amd-pmc patches landing upstream for 6.4, nvme patch landing upstream for 6.4 and Aaron's patch.
Thanks. The good news is that it seems to work using amd_s2idle to suspend and unsuspend, with HSA_AMD off: the config has: # CONFIG_HSA_AMD is not set
I've included some logs from suspending / unsuspending a couple of times in this state -
amd_s2idle.log
It's felt a little inconsistent testing this, but I just realised that while amd_s2idle seems to unsuspend OK, systemctl suspend has been consistently not coming back up from suspend.
I'll grab a log from that, though it may not be interesting as it'll probably stop at the point of the suspend.
> Thanks. The good news is that it seems to work using amd_s2idle to suspend and unsuspend, with HSA_AMD off: the config has: # CONFIG_HSA_AMD is not set
OK so with the patch that Aaron posted (which is in that branch), you should hopefully be able to change nothing else but enable HSA_AMD now and get the same success.
> It's felt a little inconsistent testing this, but I just realised that while amd_s2idle seems to unsuspend OK, systemctl suspend has been consistently not coming back up from suspend.
OK... that's quite odd. Can you please check what you have in /lib/systemd/system-sleep? Maybe one of your hooks is causing that problem.
My USB debug cable "EXCLUWOR Windbg" arrived, no idea if it's in spec. I am updating an older laptop in the meantime in case it comes in useful; I can either run Ubuntu on the other machine or Windows + Putty.
I'd had a feeling something inconsistent was happening, and managed to catch the following log:
I suspended it with systemctl suspend, pressed a button, and nothing happened.
Then I continually tried connecting via ssh. Eventually I did get a connection. If I pressed keys on the keyboard, the screen wouldn't come on apart from a new variant of glitchiness showing some grey boxes.
I was able to do this a couple of times, but haven't managed to reproduce it yet after a few tries.
Interesting - I hadn't tried doing s2idle beyond about 10 tries, but mostly that's because the wakeup device doesn't exist, so it's a pain to manually count out 10 seconds then hit a button !
It's good to know that inconsistency is possible on the system, and not just something I'm doing.
I found something interesting - when the laptop is connected to the monitor (and powered) via USBC, it almost always resumes.
[This is the config I have been mostly testing with]
In other configs, it mostly does not (the default PSU is a barrel plug).
[This is how I use the laptop the rest of the time]
I'll get some logs of different combinations, if there is somewhere in /proc or /sys that can tell if power is coming from the external PSU/USBC/Battery it might be handy for scripting as it's a few logs to grab.
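One possible place to look is /sys/class/power_supply, where each supply normally exposes a `type` file (values such as Mains, USB or Battery) and, for external supplies, an `online` file. Below is a minimal sketch of reading those for scripting. The function name `power_sources` is made up, and how the barrel plug and the USB-C port are named and typed on this particular laptop is an assumption that needs checking on the machine itself.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def power_sources(root: str = "/sys/class/power_supply") -> Dict[str, Tuple[str, Optional[bool]]]:
    """Map each power supply name to (type, online); online is None if not reported."""
    result: Dict[str, Tuple[str, Optional[bool]]] = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for dev in sorted(base.iterdir()):
        type_file = dev / "type"
        if not type_file.is_file():
            continue
        kind = type_file.read_text().strip()
        online_file = dev / "online"
        # Batteries usually have no "online" attribute, so report None for them.
        online = None
        if online_file.is_file():
            online = online_file.read_text().strip() == "1"
        result[dev.name] = (kind, online)
    return result


if __name__ == "__main__":
    for name, (kind, online) in power_sources().items():
        print(f"{name}: type={kind} online={online}")
```

Recording this output next to each suspend log would make it easier to match a failed resume to the power configuration in use at the time.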
So can you confirm then that CONFIG_HSA_AMD being enabled works with the latest branch too? That would confirm that we've at least got all the right code for the flow in place save your issue where different power plugs are leading to different behavior.
Ah... so I was going to write about how I'd been booting into Windows occasionally and using HP's update tool, and how strange it is that the update isn't on there (which is all the case).
Thanks - it's the correct model, I ran the bios update, and now have a fun issue to sort out.
While the keyboard works before grub starts (I can enter the bios or change the boot order).
I don't know if there was a second part the bios update expected to boot into (there are various EFI files in HP/Bios) but won't look at that too much at the moment.
Is it hanging at the kernel? Or is the keyboard just not working? Try using a USB one. If the internal keyboard isn't working, I do have an idea but I'll need to see your acpidump after the BIOS update to confirm it.
That allowed booting windows and getting into GRUB (but WITH keyboard working).
I wasn't able to boot Linux from there, but noticed the other ticket mentioned turning secure boot off.
I turned off secure boot, and was able to do everything again, but actually boot into Linux this time.
Looks like it went up to F26 (see dmidecode output below).
Bit of a fun distraction, I'll stick with F26 + secure boot off while debugging this issue, but will probably try F24 and F25 in the next few days and see if that issue reverts.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: AMI
    Version: F.26
    Release Date: 10/28/2022
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 16 MB
    Characteristics:
        PCI is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        EDD is supported
        5.25"/1.2 MB floppy services are supported (int 13h)
        3.5"/720 kB floppy services are supported (int 13h)
        3.5"/2.88 MB floppy services are supported (int 13h)
        Print screen service is supported (int 5h)
        8042 keyboard services are supported (int 9h)
        Serial services are supported (int 14h)
        Printer services are supported (int 17h)
        ACPI is supported
        USB legacy is supported
        Smart battery is supported
        BIOS boot specification is supported
        Function key-initiated network boot is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 15.26
    Firmware Revision: 40.36
Anecdotally things seem better, in that most of the time when I come back to the laptop it wakes up. Before, I wasn't measuring how many times it failed to resume, as it was most of the time. Now it doesn't happen often, though it could be that I'm just leaving it on battery.
OK, with HSA_AMD enabled on 6.1.22-06017-gcd690562d7b4 on F.26, all power combinations come back to a black screen.
There is a slight difference: I have a sound that plays when a test completes (the idea is I can press a key to log the result), without HSA_AMD enabled I didn't hear the sound, and with it I do.
It doesn't really matter since the keyboard doesn't work at the black screen so I can't actually press a key to log a result.
Here are the logs with HSA_AMD enabled on 6.1.22-06017-gcd690562d7b4 on F.26 -
I suspect from the above result that the wrong branch was used.
I applied the S0ix patch, and re-tested.
OK! Well then we have good confirmation it helps. All of those patches we came up with for this bug are queued up for 6.4 now.
The remaining issue is tied specifically to AC based wakeup not returning. Remaining things we can look at:
Did you get your XHCI debug cable working? Could you get a log at that time of suspend/resume at all?
Can you please give an acpidump? I'm really surprised hardware this old is using the Microsoft GUID for uPEP _DSM. I want to look for a possible mismatch (which is something I might have found in another issue I'm tracking). If there is a possible mismatch like I'm suspecting then this patch might help: https://bugzilla.kernel.org/attachment.cgi?id=304132&action=diff
Note: to set expectations, unfortunately this is very likely a firmware issue that the manufacturer will need to look at using specialized debug hardware.