======================================
Software
======================================
kernel version : 4.16.0-rc5-drm-intel-qa-ww11-commit-307515c+
hostname : CFL-2
architecture : x86_64
os version : Ubuntu 17.10
os codename : artful
kernel driver : i915
bios revision : 118.9
bios release date : 01/12/2018
ksc : 1.5
hardware acceleration : disabled
swap partition : enabled on (/dev/sda2)
======================================
Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm : 2.4.91
intel-gpu-tools (tag) : intel-gpu-tools-1.21-211-g1bb3995e
intel-gpu-tools (commit) : 1bb3995e
======================================
Hardware
======================================
motherboard model : CoffeeLakeClientPlatform
motherboard id : CoffeeLakeSUDIMMRVP
form factor : Desktop
manufacturer : IntelCorporation
cpu family : Other
cpu family id : 6
cpu information : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram : 15.57 GB
max memory ram : 32 GB
cpu thread : 12
cpu core : 6
cpu model : 158
cpu stepping : 10
socket : Other
hard drive : 447GiB (480GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected : eDP-1 DP-1 DP-2
======================================
Firmware
======================================
dmc fw loaded : yes
dmc version : 1.4
guc fw loaded : fetch SUCCESS, load SUCCESS
guc version wanted : wanted 9.39, found 9.39
guc version found : wanted 9.39, found 9.39
On fi-skl-lmem and fi-hsw-4770r the EDID read from the monitor is consistently corrupted in the same way. So I think the EDID memory itself is corrupted; we could change the monitor or try to fix and reflash the EDID.
Not a customer impacting bug on ICL - DRM is successfully getting the proper EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM has the necessary EDID info, so no user impact on the functionality of i915.
The current implementation is rather naive: it walks /dev/i2c-* and tries to read out an EDID from each device:
	while ((dirent = readdir(dir))) {
		/* Only look at i2c adapter nodes. */
		if (strncmp(dirent->d_name, "i2c-", 4) == 0) {
			sprintf(full_name, "/dev/%s", dirent->d_name);
			fd = open(full_name, O_RDWR);
			igt_assert_neq(fd, -1);
			/* Count every adapter that returns a valid EDID. */
			if (i2c_edid_is_valid(fd))
				ret++;
			close(fd);
		}
	}
The drmModeRes part is equally unsophisticated.
I think we should change the implementation a bit and add extra logging around this, i.e. use /sys/class/drm/card?-/i2c- to do the readouts, and compare EDIDs on a per-connector basis, printing out both in case one is corrupted. This may tell us something about the particular screens or connectors that are being troublesome.
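A minimal sketch of the per-connector comparison proposed above. The function names are hypothetical and are not existing IGT helpers; the sketch assumes the caller has already read one 128-byte base block over i2c and one from DRM for the same connector. The only EDID facts it relies on are the fixed 8-byte header and the rule that the bytes of a block sum to 0 modulo 256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Every EDID base block starts with this fixed 8-byte header. */
static const unsigned char edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* True if the block has the fixed header and its bytes sum to 0 mod 256. */
static bool edid_block_is_valid(const unsigned char *edid)
{
	unsigned char sum = 0;

	assert(edid);
	if (memcmp(edid, edid_header, sizeof(edid_header)) != 0)
		return false;
	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += edid[i];
	return sum == 0;
}

/* Print one block as 8 rows of 16 hex bytes. */
static void edid_dump(FILE *out, const char *label, const unsigned char *edid)
{
	fprintf(out, "%s:\n", label);
	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		fprintf(out, "%02x%c", edid[i], (i % 16 == 15) ? '\n' : ' ');
}

/*
 * Hypothetical helper: compare the i2c and DRM copies for one connector.
 * Returns true when both are valid and identical. Otherwise it logs which
 * side is invalid (or that the contents differ) and dumps both raw blocks.
 */
static bool edid_compare_for_connector(FILE *out, const char *connector,
				       const unsigned char *i2c_edid,
				       const unsigned char *drm_edid)
{
	bool i2c_ok = edid_block_is_valid(i2c_edid);
	bool drm_ok = edid_block_is_valid(drm_edid);
	bool same = memcmp(i2c_edid, drm_edid, EDID_BLOCK_SIZE) == 0;

	if (i2c_ok && drm_ok && same)
		return true;

	fprintf(out, "%s: i2c %s, drm %s, contents %s\n", connector,
		i2c_ok ? "valid" : "INVALID", drm_ok ? "valid" : "INVALID",
		same ? "match" : "differ");
	edid_dump(out, "i2c", i2c_edid);
	edid_dump(out, "drm", drm_edid);
	return false;
}
```

Calling edid_compare_for_connector(stderr, "DP-1", i2c_buf, drm_buf) once per connector would turn the current count mismatch into a log entry naming the connector and showing both raw blocks.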
Not a customer impacting bug on ICL - DRM is successfully getting the proper
EDIDs, it's just the direct i2c EDID gathering method that is failing. DRM
has the necessary EDID info, so no user impact on the functionality of i915.
Indeed on ICL we only see:
DEBUG: i2c edids:1 drm edids:2 vga outputs:0
Also, the bug has not been seen on ICL in 3 weeks (since CI_DRM_6085). Prior to that, the reproduction rate was about 1 in every 5 CI_DRM runs. We are now at CI_DRM_6225, which is more than 5x10=50 runs since the last occurrence, so I am taking the ICL tag out.
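As a rough sanity check of that reasoning, the sketch below computes the chance of seeing no failure at all in a given number of runs. It assumes every CI_DRM build ran the test on ICL and that failures are independent with a per-run probability of 1/5; both are simplifications, and the function name is made up for illustration.

```c
#include <assert.h>

/*
 * Probability that `runs` independent runs all pass when each run
 * fails with probability `fail_rate` (assumed constant).
 */
static double prob_all_pass(double fail_rate, unsigned int runs)
{
	double p = 1.0;

	assert(fail_rate >= 0.0 && fail_rate <= 1.0);
	for (unsigned int i = 0; i < runs; i++)
		p *= 1.0 - fail_rate;
	return p;
}
```

Under those assumptions, prob_all_pass(0.2, 50) is on the order of 1e-5, and for the 6225 - 6085 = 140 builds since the last occurrence it is smaller still, so a silent streak this long is very unlikely if the old failure rate still held.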
On everything else we have some occurrences happening the other way around:
(i915_pm_rpm:1086) DEBUG: i2c edids:1 drm edids:0 vga outputs:0
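To sort CI logs by the direction of the mismatch, something like the following could be used. It is only a sketch: the enum and function names are hypothetical, and the line format is taken from the two DEBUG lines quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum edid_mismatch {
	EDID_COUNTS_MATCH,
	EDID_DRM_HAS_MORE,	/* e.g. "i2c edids:1 drm edids:2" (the ICL case) */
	EDID_I2C_HAS_MORE,	/* e.g. "i2c edids:1 drm edids:0" */
	EDID_LINE_UNPARSED,
};

/* Classify one log line containing the "i2c edids:.. drm edids:.." counts. */
static enum edid_mismatch classify_edid_line(const char *line)
{
	int i2c, drm, vga;
	const char *p;

	assert(line);
	/* Skip any prefix such as "(i915_pm_rpm:1086) DEBUG: ". */
	p = strstr(line, "i2c edids:");
	/* The vga count is parsed only to validate the full line format. */
	if (!p || sscanf(p, "i2c edids:%d drm edids:%d vga outputs:%d",
			 &i2c, &drm, &vga) != 3)
		return EDID_LINE_UNPARSED;

	if (drm > i2c)
		return EDID_DRM_HAS_MORE;
	if (i2c > drm)
		return EDID_I2C_HAS_MORE;
	return EDID_COUNTS_MATCH;
}
```

The first direction is the one described as not user-impacting on ICL; the second is the one where DRM itself is missing an EDID.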
The patches made by Oleg will allow us to get more details on this by tying the i2c devices to connectors on the kernel side, and by having the test log more information about the failure: which connector, which part failed (the readout? a mismatch?), plus a dump of the raw values.
Keeping the bug priority high, as not having your monitor's EDID read out correctly by DRM is a serious problem for users.
With the history a bit de-cluttered now that ICL is seemingly fixed, we can take a look
at the new landscape of failures. Two machines fail the most consistently
(every idle run).
This one seems to have a VGA dongle connected, but it does not have native VGA.
Because of that, the test fails to realize it's VGA and does not ignore it.
The dongle is the most naive one possible, just a few pins connected through
resistors, so the EDID that DRM sees is faked by the DpVga HW to reflect
the default VGA modes. There is nothing on the other side when using i2c
directly. The test has to be fixed so that it is aware of DpVga.
This one has an HDMI dummy connected to non-native (LSPcon) HDMI on the board.
The suspicious thing here is the modes: they are the default VGA ones.
The HDMI dummy may be faulty and fail to talk i2c, with the LSPcon HW
faking an EDID. I would advise replacing the dongle.