Lenovo Yoga Slim 7 Gen 8 14APU8 with AMD Ryzen 7 7840S hangs when trying to hibernate after suspend-resume cycle
Some hibernation regressions have been noticed in the amdgpu driver, but this issue is not a result of those regressions. I still have it with the latest 6.6.6 kernel + fixes from #2812 (closed), and on -next from December 13 2023, where the fixes landed and the hibernation regression revert is applied. The fixes are required to reach the s2idle state - otherwise the laptop does not sleep and wakes up immediately.
How the system behaves: after boot I can successfully hibernate and resume from hibernation. I can do it several times - no problem. But if I suspend to s2idle, resume, and then try to hibernate, I get a black screen, loud fans and no messages. I've tried to debug it - the freeze is at the device stage. More than that - it freezes in AMDGPU. To test this I modified the driver code to set the in_s0ix variable to always "true", so that amdgpu behaves as if it is going to suspend rather than hibernate and skips some steps. With this hack hibernation happens - devices freeze and the image is successfully written to disk. Of course I cannot resume from this image - the screen resumes full of garbage - but at least that indicates that the issue is somewhere in amdgpu.
Let's wait for rc6 then. I am afraid this is a different issue. The issue that was resolved is about not being able to hibernate at all. This problem is that after a suspend cycle the system loses the ability to hibernate. rc6 is soon - so I will update the issue to say whether it is gone or not.
Ah, it might be a different issue then. Anything in the journal from these attempts? Or the systemd pstore from a NULL pointer dereference or anything?
Unfortunately nothing. It hangs when suspending devices. I've managed to make 6.6.6 hibernate - but it does not resume from that image. To do so I've patched 2 files:
In amdgpu_smu.c, in the function static int smu_set_gfx_imu_enable(struct smu_context *smu),
I replaced if (amdgpu_in_reset(smu->adev) || adev->in_s0ix) with if (amdgpu_in_reset(smu->adev) || adev->in_suspend)
And in amdgpu_device.c, in the amdgpu_device_ip_suspend_phase2 function, there is a section that excludes PSP, GFX and MES from the action when doing s0ix. I've added a section excluding just PSP for the other suspend types - and it starts hibernating then. It resumes and even restores the picture and moves the cursor - but that's all it does, and it hangs completely after 10-15 seconds.
So it seems that this has something to do with PSP. But I still cannot understand what makes the difference between hibernating the system after a fresh boot (or after resume from hibernation) and hibernating after s2idle.
This patch alone kills hibernation even the first time, before any s2idle cycle. Same symptoms - screen black, laptop on and fans spinning.
But if I add this patch as well - it hibernates, but resumes to an unusable state. At first it looks perfect - the screen is back and the cursor moves, but the keyboard is dead, and after a few seconds the screen goes black, then shows the picture again, but the mouse is not moving this time.
And it also breaks hibernation - but the strange thing is that with this patch I get exactly the same behaviour while trying to hibernate as I get after resume with the previous one. So it seems that the key is in calling this function:
return smu_set_gfx_power_up_by_imu(smu);
In order to hibernate I need to avoid calling it if adev->in_suspend is set. But in that case I resume with a non-functional amdgpu. If I skip calling this function with the logic (in_suspend && !(in_s4)), the system aborts hibernation and gives me the same non-functional amdgpu immediately: for the first several seconds I can move the mouse, then the screen goes black for a couple of seconds, followed by a dead hang. If I skip calling it with the logic in_s0ix || in_s4, then it fails to hibernate in the usual way - the screen goes black, the fans keep working and nothing more happens.
I ran all of these tests without an s0ix cycle, and with your patch applied.
I've done several tests again - PSP was a wrong assumption. I've reverted that patch and have only my smu patch 3 applied, and it does the whole job. With the patch hibernate_fix3.patch applied the system behaves as follows:
Hibernate without s0ix cycle - it starts hibernating, the screen goes off, then on with the mouse moving but nothing else functioning, black again, on again but even the mouse is dead this time (exactly like the bad resume in case 2).
Hibernate after s0ix cycle - it starts hibernating, screen off, screen on (normal behavior) - then it hibernates properly. On resume it reads the image and wakes up with the screen exactly as it was when hibernating, the mouse moving and everything else dead, then screen off, screen on, dead freeze.
I have tried to ignore the return value of smu_set_gfx_power_up_by_imu(smu) - calling the function but returning 0 from the calling function. That did not help - so it seems that the issue happens within smu_set_gfx_power_up_by_imu itself.
@mark.herbert42 I don't think it's a correct flow to avoid IMU power up. The only thing that comes to mind is that maybe we have a mutex protection issue.
I just tried on a Phoenix laptop I have on hand (Framework 13") with the following patch added and I can't reproduce. I did hibernate; s2idle; hibernate and all 3 cycles came back correctly.
FWIW I tested on top of amd-staging-drm-next.
For sure, avoiding IMU power up is a bad idea, as it does not help. It allows hibernation - but kills resume. So it just indicates, more or less, the place where the hang happens.
It could also be a specific issue related to this Yoga's firmware, which is terribly buggy. Before suspend, hibernation works - so it may be that resume from suspend misses some of the steps and leaves the hardware in a different state than after a fresh boot. And that may not be the case with the Framework.
We can't skip IMU and PSP for hibernate, as we need them to power up the GFX and reload the firmware on resume from hibernation.
Hi @mark.herbert42, can we get any error from the driver when you hit this issue? If not, can you try to run a hibernate test that only tests the freezing of processes and suspending of devices, and check whether it works?
echo devices > /sys/power/pm_test
echo platform > /sys/power/disk
echo disk > /sys/power/state
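For repeated runs, the three commands above could be wrapped in a small function. This is only a sketch under assumptions: SYSFS_ROOT is a hypothetical override added here so the function can be dry-run against a scratch directory (on a real system it defaults to /sys and needs root), and the final write of "none" is an addition that puts pm_test back to normal so the next real hibernation is not a test run.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT is a hypothetical override for dry runs;
# the real control files live under /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

run_hibernate_devices_test() {
    power="$SYSFS_ROOT/power"

    # pm_test is only present when the kernel was built with PM debugging support.
    if [ ! -e "$power/pm_test" ]; then
        echo "pm_test not available under $power" >&2
        return 1
    fi

    # Stop the hibernation sequence once devices have been frozen.
    echo devices > "$power/pm_test" || return 1
    # Select the platform hibernation mode, as in the commands above.
    echo platform > "$power/disk" || return 1
    # Flush filesystems first, in case the machine hangs and needs a hard power-off.
    sync
    # Start the test hibernation and remember whether the write succeeded.
    echo disk > "$power/state"
    rc=$?
    # Addition (not in the original commands): restore normal pm_test behaviour.
    echo none > "$power/pm_test"
    return "$rc"
}
```

The function is only defined here, not called; on the affected laptop it would be invoked as root once before and once after an s2idle cycle to compare the two cases.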
Hi Tim, the issue is that when I try devices, I never get control of my laptop back. There is probably some trick to get the errors stored and available after I do a hard power off - as that's the only way I can see anything other than a black screen. But here I need some instructions from you on how to store the errors and how to find them. And maybe some extra debugging variables should be set as well.
These are the journalctl results when I first do s2idle and then try to test-hibernate. By the timestamp it should happen at 10:20, but the logs stop there and contain no mention at all of trying to hibernate.
Here is the dmesg if I try to test-hibernate before going to s2idle - a lot of messages and no hangs: log_before_s2idle.txt
I believe there is no difference between a normal hibernate and a devices test hibernate - it just hangs silently from the beginning, unless I remove this IMU step. Is there a way to switch on extra debugging info from the amdgpu driver - maybe that will give some more details. Maybe some kernel variables need to be activated, or the kernel even needs to be compiled with a different config. Waiting for instructions.
Adjusting the drm debug message level can give more logs, e.g. echo 0x1ff > /sys/module/drm/parameters/debug. But if these two commands can't get the pm logs into the kernel log, then I think the amdgpu driver can't output more logs either.
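A minimal sketch of turning on the debug output before a test run. The drm value is the one suggested above; writing to pm_debug_messages and pm_print_times under /sys/power is an assumption added here (they are generic PM debugging switches, not something requested in this thread), and SYSFS_ROOT is a hypothetical override used only to dry-run the function against a scratch directory.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT is a hypothetical override for dry runs;
# on a real system it defaults to /sys and the function needs root.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

enable_suspend_debug() {
    # DRM debug category bitmask, the value suggested above.
    echo 0x1ff > "$SYSFS_ROOT/module/drm/parameters/debug" || return 1

    # Assumption: also enable the PM core's own messages and per-device
    # timing, but only where the running kernel exposes these files.
    for knob in pm_debug_messages pm_print_times; do
        if [ -e "$SYSFS_ROOT/power/$knob" ]; then
            echo 1 > "$SYSFS_ROOT/power/$knob"
        fi
    done
}
```

With 0x1ff the kernel log becomes very verbose, so it is probably worth enabling this only right before the failing hibernate attempt.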
hibernate_test_debug1.txt
There are some amd debug messages before hibernation is entered - but still nothing that shows what's happening.
I will apply my IMU hack so that it does not get stuck in the process. It is not a solution, as the laptop cannot resume, but at least it will show some messages. Maybe after resume from s2idle the driver somehow misses some steps (or does extra ones?), and that kills the IMU logic.
Here is the log with the hack and all debug options.
The system does not hang but becomes (as expected) unusable. But the hack allows it to proceed further, and maybe that info will give some clue.