[CI][DRMTIP] igt@suspend - incomplete

added CI platform: ICL platform: KBL platform: SKL priority::medium severity::normal + 1 deleted label

CI Bug Log said:

A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@(suspend|s3) - incomplete -}
{+ GUC: igt@(suspend|s3) - incomplete +}

New failures caught by the filter:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_191/fi-kbl-guc/igt@i915_suspend@forcewake.html
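For readers unfamiliar with how such a filter selects tests, here is a minimal sketch. It assumes the `(suspend|s3)` part of the filter name is an alternation matched anywhere in the IGT test name; the real CI Bug Log matching rules are not shown in this thread, and the helper name `is_suspend_test` and the negative example are made up for illustration.

```python
import re

# Assumption: only the alternation from the filter name matters, and it may
# match anywhere in the test name (binary name or subtest name).
SUSPEND_PATTERN = re.compile(r"suspend|s3")


def is_suspend_test(test_name):
    """Return True if the test name mentions suspend or s3."""
    return SUSPEND_PATTERN.search(test_name) is not None


# The first name comes from the failure URL above; the second is an
# unrelated name used only as a negative example.
for name in ["igt@i915_suspend@forcewake", "igt@gem_exec_basic@basic"]:
    print(name, is_suspend_test(name))
```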

Jon Ewins said:

Further investigation will be deferred until after the upcoming GuC version update.

Daniele Ceraolo Spurio @dceraolo said:

This is not happening with the new FW, so closing.

Martin Peres @mupuf said:

(In reply to Daniele Ceraolo Spurio from comment 3)

This is not happening with the new FW, so closing.

It has happened 9 times in the last 14 drmtip runs, and has never gone unseen for more than 4 runs in a row. This means you did not follow the 10x rule from the bug assessment process. Please follow all the steps carefully and do not skip directly to closing the issue.
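To make the arithmetic concrete, the check can be sketched roughly as follows. This reads "10x" as "the issue must stay unseen for ten times the longest earlier gap between failures", which is an assumption about the bug assessment process, not its documented wording. The run history and function names are also invented for the example.

```python
def longest_clean_gap(history):
    """Longest streak of clean runs that was later ended by a failure."""
    longest = current = 0
    for failed in history:
        if failed:
            longest = max(longest, current)
            current = 0
        else:
            current += 1
    return longest


def trailing_clean_runs(history):
    """Number of clean runs since the most recent failure."""
    count = 0
    for failed in reversed(history):
        if failed:
            break
        count += 1
    return count


def safe_to_close(history, factor=10):
    """Assumed rule: clean streak must reach factor x the longest past gap."""
    gap = max(longest_clean_gap(history), 1)  # require at least `factor` runs
    return trailing_clean_runs(history) >= factor * gap


# Hypothetical history (True = failure seen): 9 failures in 14 runs,
# with a longest clean gap of 4 runs.
history = [True] + [False] * 4 + [True] * 8 + [False]
print(safe_to_close(history))
```

Under this reading, a longest gap of 4 runs would call for 40 consecutive clean runs before closing.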

Sujaritha Sundaresan @sujaritha said:

I'm currently following this bug. In the last look through the CI results I can see that this is still occurring but I haven't been able to identify the exact issue yet.

Sujaritha Sundaresan @sujaritha said:

I have been able to reproduce this bug on an ICL with the gem_ctx_isolation@vcs0-s3 and i915_suspend@forcewake tests. On these runs I completely lose the DUT after the failed test run. The next step will be to get some serial logs for this.

Sujaritha Sundaresan @sujaritha said:

This issue is seen repeatedly on the following five tests: kms_vblank@pipe-c/b-continuation-suspend (pipes C and B), gem_workarounds@suspend-resume-context, gem_ctx_isolation@vcs0-s3, and i915_suspend@forcewake. Locally, I can see this issue on all of these tests without GuC as well.

Jon Ewins said:

As the issue was reproduced on SKL and ICL with and without GuC, changing the i915 feature selection from firmware/guc to power/suspend/resume.

Jon Ewins said:

Local test result confirmed, but the CI evidence of being seen only on our -guc machines is compelling. Issue on kms tests might be a new regression. Re-adding firmware/guc to i915/feature.

Don Hiatt said:

Suja and I have been working on trying to duplicate this.

On ICL, the i915_suspend test just appears to hang (see below)

gta@ubt-18:~/ril-src/igt-gpu-tools$ sudo ./build/tests/i915_suspend
IGT-Version: 1.24-g5a6c6856 (x86_64) (Linux: 5.3.0+ x86_64)
Starting subtest: fence-restore-tiled2untiled
[cmd] rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Sep 27 22:42:17 2019
checking the first canary object
checking the second canary object
Subtest fence-restore-tiled2untiled: SUCCESS (7.957s)
Starting subtest: fence-restore-untiled
[cmd] rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Sep 27 22:42:39 2019
checking the first canary object
checking the second canary object

Subtest fence-restore-untiled: SUCCESS (6.978s)
Starting subtest: debugfs-reader
[cmd] rtcwake: assuming RTC uses UTC ... <
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Sep 27 22:43:02 2019 <------- seems to hang here?

However, with a serial port connected it turns out that the dut does not die after all, as we still have an interactive console and can see kernel messages.
It seems that the netdev isn't waking up, which is why the test appears to hang and you can't ssh into it again.

Also, looking at the running processes, the test appears to be running.

Lastly, we're seeing "PM: Cannot get swap device, try swapon -a" and
"PM: Cannot get swap writer" on the console. I wonder if the test is trying to hibernate and is expecting swap space?

I have the console going and it looks like the machine is not really dead.
The serial port is still interactive but the network appears dead, that is why you don’t see any output on your terminal, nor
can you ssh into the dut.

From the serial console, the test is still running.

The error on the serial console seems to imply it is expecting the machine to have a swap space enabled. Perhaps that is
the reason the test just appears to hang. We now know the device does come out of suspend, only that the network isn’t
restarted.

Don Hiatt said:

(In reply to Don Hiatt from comment 10)

I have the console going and it looks like the machine is not really dead.
The serial port is still interactive but the network appears dead, that is
why you don’t see any output on your terminal, nor
can you ssh into the dut.

From the serial console, the test is still running.

The error on the serial console seems to imply it is expecting the machine
to have a swap space enabled. Perhaps that is
the reason the test just appears to hang. We now know the device does come
out of suspend, only that the network isn’t
restarted.

Sorry, this was a cut and paste repeat of what I was saying.

Don Hiatt uploaded an attachment:

Attachment 145560, "dmesg from serial console":
dmesg.txt

Don Hiatt said:

After enabling swap on the dut, the tests are passing.
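A pre-flight check for this condition could look roughly like the sketch below, which lists active swap devices by parsing `/proc/swaps` on the DUT. The thread only reports that the tests passed once swap was enabled. Whether the tests strictly require swap is not established here, and the function names are made up for illustration.

```python
import os


def active_swap_devices(proc_swaps_text):
    """Return the swap device names listed in /proc/swaps-style text."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header; every later line is one device.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def read_proc_swaps(path="/proc/swaps"):
    """Read the kernel's list of active swap areas (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    devices = active_swap_devices(read_proc_swaps())
    if devices:
        print("swap enabled on:", ", ".join(devices))
    else:
        print("no active swap; consider 'swapon -a' before running the tests")
```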

Sujaritha Sundaresan @sujaritha said:

This bug has not been seen for about a week now on any of the platforms it was previously seen on. I will continue to track this bug and update if there are any changes.

Sujaritha Sundaresan @sujaritha said:

This issue was recently seen again on the gem_eio@in-flight-suspend and kms_pipe_crc_basic@suspend-read-crc-pipe-b tests. Initially, the incomplete tests were successful after enabling swap on GuC devices. After assessing the new logs, it looks like neither of these issues is GuC specific: the same issues are seen on non-GuC systems as well. This particular bug log appears to be capturing general issues that happen to be seen on GuC systems.

Sujaritha Sundaresan @sujaritha said:

This issue is possibly being seen again primarily on TGL systems.

assigned to @sujaritha

This is not a current issue, certainly not in its original form, and it is masking potential regressions.

closed

Submitted by Martin Peres `@mupuf`
