Suspend/Resume latency tracking
Why?
Suspend/resume is a very common scenario on laptops. If the device takes longer to resume than it takes to open the lid, it should be considered too slow. If it takes more than a few seconds, users are likely to assume the device has hard-hung and force a hard reboot.
How?
The initcall_debug kernel cmdline parameter lets us see in dmesg how long i915 takes to suspend and resume, e.g.:
[ 120.875750] i915 0000:00:02.0: pci_pm_suspend+0x0/0x150 returned 0 after 1142753 usecs
[ 120.895224] i915 0000:00:02.0: pci_pm_suspend_late+0x0/0x40 returned 0 after 18355 usecs
[ 120.927083] i915 0000:00:02.0: pci_pm_suspend_noirq+0x0/0x2a0 returned 0 after 0 usecs
[ 122.727330] i915 0000:00:02.0: pci_pm_resume_noirq+0x0/0x110 returned 0 after 19456 usecs
[ 122.758654] i915 0000:00:02.0: i915_pm_resume_early+0x0/0x20 [i915] returned 0 after 1346 usecs
[ 123.059408] i915 0000:00:02.0: pci_pm_resume+0x0/0xa0 returned 0 after 237756 usecs
We can leverage that data in CI to make sure that we don't take too long to wake up.
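As an illustration of how such a line could be consumed, here is a minimal sketch of a parser for the dmesg format shown above. It is not existing IGT code: struct pm_trace and parse_pm_trace() are hypothetical names, and the sketch assumes the line contains the "i915 <device>: " prefix exactly as in the sample output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed initcall_debug trace. */
struct pm_trace {
	char func[64];        /* callback name, e.g. "pci_pm_suspend" */
	int ret;              /* value the callback returned */
	unsigned long usecs;  /* callback duration in microseconds */
};

/*
 * Parse one initcall_debug line for the i915 device.
 * Returns 1 on success, 0 if the line does not look like an i915 PM trace.
 * Assumption: the line has the "i915 <device>: <symbol>+off/size ..." shape
 * seen in the sample dmesg output.
 */
static int parse_pm_trace(const char *line, struct pm_trace *out)
{
	const char *dev = strstr(line, "i915 ");
	const char *msg, *ret;

	if (!dev)
		return 0;

	/* The callback name follows the "driver device: " prefix. */
	msg = strstr(dev, ": ");
	if (!msg)
		return 0;
	msg += 2;

	/* The name ends at '+', the start of the symbol+offset/size notation. */
	if (sscanf(msg, "%63[^+ ]", out->func) != 1)
		return 0;

	/* Search for the keyword so an optional " [i915]" module tag is skipped. */
	ret = strstr(msg, " returned ");
	if (!ret)
		return 0;

	return sscanf(ret, " returned %d after %lu usecs",
		      &out->ret, &out->usecs) == 2;
}
```

A caller would feed each log line to parse_pm_trace() and ignore lines for which it returns 0.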
Proposed course of action:
- measure how much extra log output this generates for our runs; hopefully it is kilobytes rather than megabytes and won't overwhelm our machines
- write a test (i915_pm_*) that:
  - logs /sys/power/mem_sleep, etc.
  - goes to the end of /dev/kmsg
  - calls igt_suspend_autoresume(SUSPEND_STATE_MEM, SUSPEND_TEST_NONE);
  - gets and parses the traces from /dev/kmsg
  - asserts that all i915 _suspend_ and _resume_ traces returned 0
  - asserts that we managed to go to sleep and resume sensibly fast
  - (exercise this scenario twice: with at least one screen on and all screens off)
- logs
- gather data and set sensible thresholds
- set our target (1s? 500ms?)
- (optional) make igt_suspend_autoresume() do that for every true suspend/resume
  - how to report issues here? or do we want this just to collect data?
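
The two assertions in the plan (every trace returned 0, and suspend/resume were fast enough) could be sketched roughly as follows. This is an illustrative sketch, not IGT code: struct pm_sample, struct pm_totals, pm_sum() and pm_within_budget() are hypothetical names. Summing the callback durations and applying one budget to the suspend total and the resume total separately is an assumption. The sum covers only the time spent in the traced callbacks, not the wall-clock suspend/resume time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one already-parsed i915 PM trace. */
struct pm_sample {
	const char *func;     /* callback name from the trace */
	int ret;              /* value the callback returned */
	unsigned long usecs;  /* callback duration in microseconds */
};

struct pm_totals {
	unsigned long suspend_usecs;  /* sum over all *_suspend* callbacks */
	unsigned long resume_usecs;   /* sum over all *_resume* callbacks */
	bool all_ok;                  /* true if every callback returned 0 */
};

/* Accumulate per-phase durations and check the return values. */
static struct pm_totals pm_sum(const struct pm_sample *s, size_t n)
{
	struct pm_totals t = { 0, 0, true };

	for (size_t i = 0; i < n; i++) {
		if (s[i].ret != 0)
			t.all_ok = false;

		/* Classify by name, matching the _suspend_/_resume_ traces. */
		if (strstr(s[i].func, "_suspend"))
			t.suspend_usecs += s[i].usecs;
		else if (strstr(s[i].func, "_resume"))
			t.resume_usecs += s[i].usecs;
	}
	return t;
}

/* Assumption: the same budget applies to each phase independently. */
static bool pm_within_budget(const struct pm_totals *t,
			     unsigned long budget_usecs)
{
	return t->all_ok &&
	       t->suspend_usecs <= budget_usecs &&
	       t->resume_usecs <= budget_usecs;
}
```

With the six sample traces from the dmesg excerpt, the suspend callbacks add up to 1161108 usecs and the resume callbacks to 258558 usecs, so a 1 s budget would already fail on the suspend side. That is the kind of result the data-gathering step is meant to surface before thresholds are fixed.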