Unplug power supply, then put the machine to sleep (or echo 'freeze' > /sys/power/state).
The machine attempts to go to sleep but fails and remains non-operable. The display is off and a hard power cycle (long press of the power button) is required to restart.
Attached files:
Log files (for system lockups / game freezes / crashes)
Captured syslog (/var/log/syslog) during a failure and during a successful sleep, and isolated the relevant log messages.
Both logs were produced with the initcall_debug kernel argument:
Then eventually the system attempts to resume and a bunch of other failures occur.
I guess error -110 is ETIMEDOUT caused by the SMU not responding in time.
I discovered a quite useful piece of information.
OK, I started to think about the differences between running plugged in (AC power) and on battery power; there are several actions taken when connected to AC.
I used powertop to see what changes. In particular, when running on battery power, the following tunables are active:
Good Runtime PM for I2C Adapter i2c-4 (AMDGPU DM i2c hw bus 0)
Good Runtime PM for I2C Adapter i2c-5 (AMDGPU DM i2c hw bus 1)
Good Runtime PM for I2C Adapter i2c-3 (SMBus PIIX4 adapter port 1 at ff20)
Good Runtime PM for I2C Adapter i2c-1 (SMBus PIIX4 adapter port 0 at ff00)
Good Runtime PM for I2C Adapter i2c-6 (AMDGPU DM i2c hw bus 2)
Good Runtime PM for I2C Adapter i2c-2 (SMBus PIIX4 adapter port 2 at ff00)
(Good means power savings is enabled)
When the laptop runs on AC:
Bad Runtime PM for I2C Adapter i2c-4 (AMDGPU DM i2c hw bus 0)
Bad Runtime PM for I2C Adapter i2c-5 (AMDGPU DM i2c hw bus 1)
Bad Runtime PM for I2C Adapter i2c-3 (SMBus PIIX4 adapter port 1 at ff20)
Bad Runtime PM for I2C Adapter i2c-1 (SMBus PIIX4 adapter port 0 at ff00)
Bad Runtime PM for I2C Adapter i2c-6 (AMDGPU DM i2c hw bus 2)
Bad Runtime PM for I2C Adapter i2c-2 (SMBus PIIX4 adapter port 2 at ff00)
(Bad means power savings is disabled)
So:
/sys/bus/i2c/devices/i2c-3/device/power/control controls the SMBus PIIX4 adapter ports (all 3)
/sys/bus/i2c/devices/i2c-4/device/power/control controls the AMDGPU DM i2c bus (all 3)
I then tried to manually disable the 'auto' power savings for those devices and put the computer to sleep with this script:
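The script attachment itself isn't reproduced here; a minimal sketch of such a script, assuming the sysfs paths identified above (and needing root), might look like:

```shell
#!/bin/sh
# Disable runtime PM for the I2C adapters identified with powertop
# ('on' = power savings off, 'auto' = power savings on), then suspend.
# i2c-3 covers the SMBus PIIX4 ports, i2c-4 the AMDGPU DM buses.
for dev in i2c-3 i2c-4; do
    ctl="/sys/bus/i2c/devices/$dev/device/power/control"
    if [ -w "$ctl" ]; then
        echo on > "$ctl"
    fi
done
# Enter suspend-to-idle, as in the reproduction steps above.
if [ -w /sys/power/state ]; then
    echo freeze > /sys/power/state
fi
```

The guards make the script a no-op on machines without these device nodes.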
I tested entering suspend a few times with the script mentioned above, and it eventually fails with those settings enabled too.
From the logs, it seems the problem is still a timeout with the SMU:
Someone from the display side might need to comment on the context for how this warning happens and whether it's something we need to worry about. @siqueira can you look at that?
See if that helps out for your system, or any other behavior changes.
@lijo I was thinking about this type of failure, do you think changing amdgpu_get_power_dpm_force_performance_level to another value might help as a workaround? Or any other sysfs that could change behavior?
Nov 22 22:50:26 princeton kernel: [  195.291168] amd_pmc AMDI0005:00: SMU response timed out
Nov 22 22:50:26 princeton kernel: [  205.291278] amd_pmc AMDI0005:00: failed to talk to SMU
Then I started to instrument the code (adding some dev_info calls to see if I could spot something), and eventually I found something: it appears the poll time passed to readx_poll_timeout is too high (PMC_MSG_DELAY_MIN_US).
The value is normally set to 100 uSec, but when I changed it to 50 uSec, everything worked fine: the machine goes to suspend immediately even when running on battery. I ran 5 suspend/resume cycles and did not get a single failure/timeout.
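For reference, the change described amounts to something like this (against the amd-pmc driver source; the exact file layout and surrounding context may differ between kernel versions):

```diff
-#define PMC_MSG_DELAY_MIN_US	100
+#define PMC_MSG_DELAY_MIN_US	50
```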
This sleep time is used in the read_poll_timeout macro (include/linux/iopoll.h) that is calling usleep_range(tMin, tMax).
Maybe by reducing PMC_MSG_DELAY_MIN_US to 50 uSec, read_poll_timeout polls faster and doesn't miss the condition that terminates the poll loop? Could it be that the condition that stops the loop is cleared automatically? Could it be related to power management of the core, so that when running on battery the CPU is slower and somehow misses the condition change?
This is a very interesting result! Please continue to monitor and see if it really fixes things. If so would you mind sending a patch up to the mailing lists?
One possibility is that if the device has already entered the D3hot state, memory-mapped register access requests won't work. I checked amdgpu; there we have a tight polling loop with a delay of 1 us between each read. It seems better to have min as 1 usec and max as 2 s explicitly in that code.
@lijo so would you recommend lowering the poll delay all the way down to 2 usec? So usleep_range will end up using min=1, max=2 uSec?
So far I've been putting my machine to sleep on battery power and I have not encountered a single failure.
I have noticed, though, that after my change entering the sleep state takes a bit longer (perhaps 1 second at most) compared to the original code (powered with AC).
Let me know what you guys think is best and I'll submit a patch.
Hmm... well, I really meant 2 uSec, not 2 sec... but I am talking about the average delay between reads; the 2 seconds can still be the upper timeout after which read_poll_timeout reports a timeout error.
The read_poll_timeout sleep is controlled by a single argument, and this value (according to the doc) should be less than 20 ms. This value is used as the sleep in:
usleep_range((__sleep_us >> 2) + 1, __sleep_us);
But, yes, in the end my question is: shall I submit the patch with whatever I have now:
One possibility is that if the device has already entered the D3hot state, memory-mapped register access requests won't work
I think adding pm_debug_messages will add another data point on whether this is the right root cause. We should see from the order of events whether it was already in D3hot at this time. Still, if this is the root cause, I think the proper solution is to delay the D3hot call to "after" rather than mucking with timeouts more, because it's still inherently racy.
Yeah, it's the standard process indicated there. I think if we confirm D3hot timing was not the root cause, then go ahead and send up a patch. Otherwise, if it is, we might need to think about changing it so the PCI device goes into D3hot after the suspend function is called instead of before.
This log was captured using kernel 5.14.0-1007-oem (Ubuntu) with kernel args: pm_debug_messages and initcall_debug:
fail_with_extra_debug.log
This is the same log as above but with only pm_debug_messages (probably less noise), using the same kernel:
fail_with_pm_debug_messages.log
This log was captured using my modified version (kernel 5.14.0+, from the vanilla mainline Linux repo), with pm_debug_messages. You will also see some FABDEBUG> messages where I instrumented the driver:
success-mod.log
From the logs it seems like D3hot is not the problem here. The suspend_noirq of the gfx device (generic PCI suspend_noirq) runs after pmc's suspend_noirq.
What was the root cause? It seems that lowering the polling time is just masking the problem. Do we have two transactions intermixing such that the condition we are waiting for got cleared?
@rrangel Although the initial evidence supported this idea, I believe the analysis done by Fabrizio shows that there are no two intermixing transactions and that, as long as the response register is read quickly enough, everything works properly.
FWIW, the same type of interface that amdgpu uses has always used 1 usec polling. We might never have seen this if amd-pmc had started with the smaller polling interval.
Glad we ruled out the intermixing transactions. Since this is an AC vs DC problem, I wonder if one of the interconnects has transitioned to a low power state and is no longer relaying the transactions?
Instead of using the MMIO PMC BAR to access the state, can we try using the indirect SMN method?
@rrangel I wish I knew the root cause. This was definitely a lucky fix that does not really explain in full what is going on.
It is my understanding that amd_pmc_send_cmd() does something like this:
1. wait for a response (from register AMD_PMC_REGISTER_RESPONSE, condition: val != 0)
2. clear the response register
3. write the argument and the message
4. wait for a response
Why it needs to wait for a response before sending the command is beyond my understanding; you guys know the AMD-PMC device better than me.
When the laptop is powered by AC everything works fine; when powered by battery, the first wait times out (no matter how long the timeout is).
Anyway, I am willing to do further investigation if you guys want to get to the root of this. In the end, the problem is reproducible 100% of the time... but I need direction and perhaps documentation on the system. I have experience with embedded systems, not so much with PC architecture.
Why it needs to wait for a response before sending the command is beyond my understanding; you guys know the AMD-PMC device better than me.
To make sure that the previous command is done processing before sending another. IOW, to avoid the type of problem Raul is mentioning.
Do we have two transactions intermixing such that the condition we are waiting for got cleared?
From the described behavior it sounds like a good hypothesis.
However, the mailbox used here is only supposed to be used by amd-pmc on Linux (and the similar matching driver on Windows). amd_pmc_send_cmd is protected by a mutex, so there shouldn't be any multi-thread entry problems.