Unplug power supply, then put the machine to sleep (or echo 'freeze' > /sys/power/state).
The machine attempts to go to sleep but fails and remains non-operable. The display is off and a hard power cycle (long press of the power button) is required to restart.
Attached files:
Log files (for system lockups / game freezes / crashes)
Captured syslog (/var/log/syslog) during a failure and during a successful sleep, and isolated the relevant log messages.
Both logs were produced with the initcall_debug kernel argument:
Then eventually the system attempts to resume and a bunch of other failures occur.
I guess error -110 is ETIMEDOUT caused by the SMU not responding in time.
I discovered a quite useful piece of information.
OK, I started to think about the differences between running plugged in (AC power) and on battery power; there are several actions taken when connected to AC.
I used powertop to see what changes. In particular, when running on battery power, the following tunables are active:
Good Runtime PM for I2C Adapter i2c-4 (AMDGPU DM i2c hw bus 0)
Good Runtime PM for I2C Adapter i2c-5 (AMDGPU DM i2c hw bus 1)
Good Runtime PM for I2C Adapter i2c-3 (SMBus PIIX4 adapter port 1 at ff20)
Good Runtime PM for I2C Adapter i2c-1 (SMBus PIIX4 adapter port 0 at ff00)
Good Runtime PM for I2C Adapter i2c-6 (AMDGPU DM i2c hw bus 2)
Good Runtime PM for I2C Adapter i2c-2 (SMBus PIIX4 adapter port 2 at ff00)
(Good means power savings is enabled)
When the laptop runs on AC:
Bad Runtime PM for I2C Adapter i2c-4 (AMDGPU DM i2c hw bus 0)
Bad Runtime PM for I2C Adapter i2c-5 (AMDGPU DM i2c hw bus 1)
Bad Runtime PM for I2C Adapter i2c-3 (SMBus PIIX4 adapter port 1 at ff20)
Bad Runtime PM for I2C Adapter i2c-1 (SMBus PIIX4 adapter port 0 at ff00)
Bad Runtime PM for I2C Adapter i2c-6 (AMDGPU DM i2c hw bus 2)
Bad Runtime PM for I2C Adapter i2c-2 (SMBus PIIX4 adapter port 2 at ff00)
(Bad means power savings is disabled)
So:
/sys/bus/i2c/devices/i2c-3/device/power/control controls the SMBus PIIX4 adapter ports (all 3)
/sys/bus/i2c/devices/i2c-4/device/power/control controls the AMDGPU DM i2c bus (all 3)
I then tried to manually disable the 'auto' power savings for those devices and put the computer to sleep with this script:
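The script attachment itself isn't reproduced here; a minimal sketch of such a script, assuming the sysfs paths identified above (and needing root), might look like:

```shell
#!/bin/sh
# Disable runtime PM for the I2C adapters identified with powertop
# ('on' = power savings off, 'auto' = power savings on), then suspend.
# i2c-3 covers the SMBus PIIX4 ports, i2c-4 the AMDGPU DM buses.
for dev in i2c-3 i2c-4; do
    ctl="/sys/bus/i2c/devices/$dev/device/power/control"
    if [ -w "$ctl" ]; then
        echo on > "$ctl"
    fi
done
# Enter suspend-to-idle, as in the reproduction steps above.
if [ -w /sys/power/state ]; then
    echo freeze > /sys/power/state
fi
```

The guards make the script a no-op on machines without these device nodes.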
I tested entering suspend a few times with the script mentioned above, and it eventually fails with those settings enabled too.
From the logs, it seems the problem is still a timeout with the SMU:
Someone from the display side might need to comment on the context for how this warning happens and whether it's something we need to worry about. @siqueira can you look at that?
See if that helps out for your system, or any other behavior changes.
@lijo I was thinking about this type of failure, do you think changing amdgpu_get_power_dpm_force_performance_level to another value might help as a workaround? Or any other sysfs that could change behavior?
Nov 22 22:50:26 princeton kernel: [  195.291168] amd_pmc AMDI0005:00: SMU response timed out
Nov 22 22:50:26 princeton kernel: [  205.291278] amd_pmc AMDI0005:00: failed to talk to SMU
Then I started to instrument the code (adding some dev_info calls to see if I could spot something), and eventually I found something: it appears the poll time passed to readx_poll_timeout is too high (PMC_MSG_DELAY_MIN_US).
The value is normally set to 100 uSec, but when I changed it to 50 uSec, everything worked fine: the machine goes to suspend immediately even when running on battery. I ran 5 suspend/resume cycles and did not get a single failure/timeout.
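For reference, the change described amounts to something like this (against the amd-pmc driver source; the exact file layout and surrounding context may differ between kernel versions):

```diff
-#define PMC_MSG_DELAY_MIN_US	100
+#define PMC_MSG_DELAY_MIN_US	50
```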
This sleep time is used in the read_poll_timeout macro (include/linux/iopoll.h) that is calling usleep_range(tMin, tMax).
Maybe by reducing PMC_MSG_DELAY_MIN_US to 50 uSec, read_poll_timeout polls faster and doesn't miss the condition that terminates the poll loop? Could it be that the condition that stops the loop is cleared automatically? Could it be related to power management of the core, so that when running on battery the CPU is slower and somehow misses the condition change?
This is a very interesting result! Please continue to monitor and see if it really fixes things. If so would you mind sending a patch up to the mailing lists?
One possibility is that if the device has already entered the D3hot state, memory-mapped register access requests won't work. I checked amdgpu; there we have a tight polling loop with a delay of 1 us between each read. It seems better to have min as 1 usec and max as 2 s explicitly in that code.
@lijo so would you recommend lowering the poll delay all the way down to 2 usec? So usleep_range will end up using min=1, max=2 uSec?
So far I've been putting my machine to sleep on battery power and I have not encountered a single failure.
I have noticed, though, that after my change entering the sleep state takes a bit longer (perhaps 1 second at most) compared to the original code (powered with AC).
Let me know what you guys think is best and I'll submit a patch.
Hmm... well, I really meant 2 uSec, not 2 sec... but I am talking about the average delay between reads; the 2 seconds can still be the upper timeout after which read_poll_timeout reports a timeout error.
The read_poll_timeout sleep is controlled by a single argument, and this value (according to the doc) should be less than 20 ms. This value is used as the sleep in:
usleep_range((__sleep_us >> 2) + 1, __sleep_us);
But, yes, in the end my question is: shall I submit the patch with whatever I have now:
One possibility is that if the device has already entered the D3hot state, memory-mapped register access requests won't work
I think adding pm_debug_messages will add another data point on whether this is the right root cause. We should see from the order of events whether it was already in D3hot at this time. Still, if this is the root cause, I think the proper solution is to delay the D3hot call to "after" rather than mucking with timeouts more, because it's still inherently racy.
Yeah, it's the standard process indicated there. I think if we confirm D3hot timing was not the root cause, then go ahead and send up a patch. Otherwise, if it is, we might need to think about changing it so the PCI device goes into D3hot after the suspend function is called instead of before.
This log was captured using kernel 5.14.0-1007-oem (Ubuntu) with kernel args: pm_debug_messages and initcall_debug:
fail_with_extra_debug.log
This is the same log as above but with only pm_debug_messages (probably less noise), using the same kernel:
fail_with_pm_debug_messages.log
This log was captured using my modified version (kernel 5.14.0+, from the vanilla mainline Linux repo), with pm_debug_messages. You will also see some FABDEBUG> messages where I instrumented the driver:
success-mod.log
From the logs it seems like D3hot is not the problem here. The suspend_noirq of the gfx device (generic PCI suspend_noirq) runs after pmc's suspend_noirq.
What was the root cause? It seems that lowering the polling time is just masking the problem. Do we have two transactions intermixing such that the condition we are waiting for got cleared?
@rrangel Although the initial evidence supported this idea, I believe the analysis done by Fabrizio shows that there are no two intermixing transactions and that, as long as the response register is read quickly enough, everything works properly.
FWIW, the same type of interface that amdgpu uses has always used 1 usec polling. We might never have seen this if amd-pmc had started with the smaller polling interval.
Glad we ruled out the intermixing transactions. Since this is an AC vs DC problem, I wonder if one of the interconnects has transitioned to a low power state and is no longer relaying the transactions?
Instead of using the MMIO PMC BAR to access the state, can we try using the indirect SMN method?
@rrangel I wish I knew the root cause. This was definitely a lucky fix that does not really explain in full what is going on.
It is my understanding that amd_pmc_send_cmd() does something like this:
1. wait for a response (from register AMD_PMC_REGISTER_RESPONSE, condition: val != 0)
2. clear the response register
3. write the argument and the message
4. wait for a response
Why it needs to wait for a response before sending the command is beyond my understanding; you guys know the AMD-PMC device better than me.
When the laptop is powered by AC everything works fine; when powered by battery, the first wait times out (no matter how long the timeout is).
Anyway, I am willing to do further investigation if you guys want to get to the root of this. In the end, the problem is reproducible 100% of the time... but I need direction and perhaps documentation on the system. I have experience with embedded systems, not so much with PC architecture.
Why it needs to wait for a response before sending the command is beyond my understanding; you guys know the AMD-PMC device better than me.
To make sure that the previous command is done processing before sending another. IOW, to avoid the type of problem Raul is mentioning.
Do we have two transactions intermixing such that the condition we are waiting for got cleared?
From the described behavior it sounds like a good hypothesis.
However, the mailbox used here is only supposed to be used by amd-pmc on Linux (and the similar matching driver on Windows). amd_pmc_send_cmd is protected by a mutex, so there shouldn't be any multi-thread entry problems.