[2020.08.08] i915 GPU hang report on 5.8.0-2-MANJARO kernel
This is another case of a GPU hang on the same PC (same hardware and Linux distribution). It is part of an ongoing 1.5-month run of PC freezes and GPU hangs.
Since the previous report, #2315 (closed), these package updates have been installed:
grep --text -iE 'installed|upgraded|removed' '/var/log/pacman.log' | tail -n 50
...
[2020-08-08T14:10:38+0000] [ALPM] upgraded systemd-libs (246-1 -> 246.1-1)
[2020-08-08T14:10:38+0000] [ALPM] upgraded systemd (246-1 -> 246.1-1)
[2020-08-08T14:10:39+0000] [ALPM] upgraded lib32-systemd (246-1 -> 246.1-1)
[2020-08-08T14:10:39+0000] [ALPM] upgraded libbytesize (2.3-1 -> 2.4-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded linux-firmware (20200803.r1680.9bc3789-1 -> 20200807.r1689.c331aa9-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded pamac-common (9.5.6-3 -> 9.5.7-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded pamac-cli (9.5.6-3 -> 9.5.7-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded pamac-gtk (9.5.6-3 -> 9.5.7-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded pamac-snap-plugin (9.5.6-3 -> 9.5.7-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded pamac-tray-appindicator (9.5.6-3 -> 9.5.7-1)
[2020-08-08T14:10:40+0000] [ALPM] upgraded systemd-sysvcompat (246-1 -> 246.1-1)
[2020-08-08T15:09:39+0000] [ALPM] removed linux419 (4.19.138-1)
[2020-08-08T15:10:51+0000] [ALPM] installed linux419 (4.19.138-1)
[2020-08-08T23:47:46+0000] [ALPM] upgraded linux58 (5.8.0-1 -> 5.8.0-2)
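To narrow the update list to the packages most likely to affect the graphics stack, a filter like the following may help. It is only a minimal sketch: the function name gfx_package_changes and the package-name pattern (kernel packages, linux-firmware, mesa, xf86-video-intel) are assumptions for illustration and are not part of the original report.

```shell
# Hypothetical helper: read pacman.log text on stdin and keep only the
# ALPM install/upgrade/remove entries for graphics-related packages.
# The package list in the pattern is an assumption; adjust it as needed.
gfx_package_changes() {
    grep --text -E '\[ALPM\] (installed|upgraded|removed) (linux[0-9]*|linux-firmware|mesa|xf86-video-intel) '
}
```

Possible usage: gfx_package_changes < /var/log/pacman.log | tail -n 20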
Further issue: #2332 (closed)
My PC has experienced more than 100 PC freezes and GPU hangs during the last 6 weeks, on every kernel family (4.19, 5.4, 5.7, 5.8-rc) available in the distro. 4.19 appears more stable and is usually (but far from always) able to reset the GPU and keep working without a reboot. The newer the kernel, the sooner the GPU hangs without a successful software reset (which the 4.19 kernel can perform), or the PC freezes.
A PC freeze or GPU hang usually happens while semi-transparency, fade-in/out, or blur effects are in action.
I have a feeling that a rapid series of GPU hangs leads to a PC freeze. If only one or two GPU hangs happen at once, the PC may or may not freeze.
Steps to reproduce the issue in this ticket
Entered a GUI user session. Opened the Opera web browser. Went to cs-online.club, joined a game, bought a smoke grenade, threw it near me, and the picture in the tab froze. Judging by the sound, a new game round started, but the picture in the tab stayed frozen. I took photos and then pressed the hotkey to collect error data.
journalctl excerpt:
Aug 08 23:54:26.029573 org_kde_powerdevil[919]: powerdevil: Enforcing inhibition from ":1.61" "/usr/lib/opera/opera" with cookie 2 and reason "Playing audio"
Aug 08 23:54:26.029597 org_kde_powerdevil[919]: powerdevil: By the time we wanted to enforce the inhibition it was already gone; discarding it
Aug 08 23:54:26.985678 xdg-desktop-portal-kde[1015]: xdp-kde-background: GetAppState called: no parameters
Aug 08 23:54:28.661211 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Aug 08 23:54:28.661522 kernel: i915 0000:00:02.0: [drm] opera[1186] context reset due to GPU hang
Aug 08 23:54:28.661679 kernel: i915 0000:00:02.0: [drm:__i915_request_reset.cold [i915]] context opera[1186]: guilty 1, banned
Aug 08 23:54:28.661830 kernel: i915 0000:00:02.0: [drm:__i915_request_reset.cold [i915]] client opera[1186]: gained 4 ban score, now 4
Aug 08 23:54:28.671359 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85df3cff, in opera [1186]
Aug 08 23:54:28.671641 kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Aug 08 23:54:28.671665 kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Aug 08 23:54:28.671682 kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Aug 08 23:54:28.671722 kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Aug 08 23:54:28.671742 kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Aug 08 23:54:28.671761 kernel: GPU crash dump saved to /sys/class/drm/card0/error
Aug 08 23:54:28.673352 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 19 created
Aug 08 23:54:28.910020 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 1 created
Aug 08 23:54:28.910332 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 2 created
Aug 08 23:54:28.916654 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 1 created
Aug 08 23:54:28.916961 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 2 created
Aug 08 23:54:28.929992 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 1 created
Aug 08 23:54:28.930317 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 2 created
Aug 08 23:54:28.933321 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 3 created
Aug 08 23:54:28.933642 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 4 created
Aug 08 23:54:28.939997 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 5 created
Aug 08 23:54:28.940304 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 6 created
Aug 08 23:54:28.949983 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 7 created
Aug 08 23:54:28.950308 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 8 created
Aug 08 23:54:29.031536 org_kde_powerdevil[919]: powerdevil: Enforcing inhibition from ":1.62" "/usr/lib/opera/opera" with cookie 3 and reason "Playing audio"
Aug 08 23:54:29.031554 org_kde_powerdevil[919]: powerdevil: Added interrupt session
Aug 08 23:54:29.032751 org_kde_powerdevil[919]: powerdevil: Can't contact ck
Aug 08 23:54:29.073317 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 9 created
Aug 08 23:54:29.073622 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 10 created
Aug 08 23:54:29.113325 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 11 created
Aug 08 23:54:29.113640 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 12 created
Aug 08 23:54:35.195713 org_kde_powerdevil[919]: powerdevil: Releasing inhibition with cookie 3
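Because the hang frequency matters here, it may be useful to pull just the hang events out of a long journal. The sketch below assumes the journal line format shown in the excerpt above (journalctl -o short-precise --no-hostname); the function name list_gpu_hangs is hypothetical.

```shell
# Hypothetical helper: read journal text on stdin and print one line per
# "GPU HANG" kernel message in the form: timestamp ecode process[pid].
# Assumes the short-precise, no-hostname layout seen in the excerpt.
list_gpu_hangs() {
    sed -n -E 's/^([A-Za-z]+ [0-9]+ [0-9:.]+) kernel: .*GPU HANG: ecode ([^,]+), in ([^ ]+) \[([0-9]+)\].*$/\1 \2 \3[\4]/p'
}
```

Possible usage: journalctl -b -o short-precise --no-hostname --dmesg | list_gpu_hangs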
How often do the steps listed above trigger the issue
The frequency of PC freezes (cause unknown; a series of sequential GPU hangs is suspected) and of GPU hangs logged in the systemd journal is close to the highest possible. It can happen on the logon screen without any user activity or during GUI session actions: in the first, 5th, or 40th minute. The average is about 1-2 minutes. It is not tied to one exact action; it happens unexpectedly during many types of typical user activity, such as:
-) on the logon screen (without any user action, not even touching the mouse; seen about 5-6 times);
-) moving desktop icons;
-) opening the start menu;
-) opening a context menu;
-) moving the cursor in a text editor with the keyboard navigation keys;
-) browsing the system settings window;
-) typing text in a GUI terminal emulator;
-) installing updates in a GUI app or GUI terminal emulator;
-) selecting text line by line in a text editor, or canceling a selection in the Opera browser;
-) opening or browsing in the Opera web browser: viewing a list of GitLab commits, filling in the description of an issue ticket on gitlab.freedesktop.org, watching YouTube videos (not fullscreen, and without touching the keyboard or mouse for at least the last 1-2 minutes), extremely fast freeze/crash while browsing maps.google.com or maps.ya.ru;
-) LiveCD GUI sessions;
etc.
Platform (CPU): Intel Core i5-8250U
System architecture (uname -m): x86_64
Kernel version (uname -r): 5.8.0-2-MANJARO
Linux distribution: Manjaro Linux (desktop environment: KDE)
Machine or motherboard model: Hystou Fanless Mini PC P03B-i5-8250U
Display connector: factory-made cable with connectors: HDMI (connected to PC) - DVI-D (connected to monitor)
Also:
The Compositor section of KDE System Settings has these defaults:
In this case I used these settings:
The GPU settings file /etc/X11/xorg.conf.d/20-intel.conf is empty.
Error data was gathered in the current hung GUI user session (without switching to tty2 text mode) using the script collect_GPU_crash_data.zip, which collects:
# Collect main data
sudo cp /sys/class/drm/card0/error ...
sudo dmesg
journalctl -b -o short-precise --no-hostname --dmesg
journalctl -b -o short-precise --no-hostname
cat /proc/cmdline
# Collect supplementary data
xrandr --verbose
sudo dmidecode -t bios -t system -t baseboard -t chassis -t processor
mhwd -l -d
cp /etc/X11/xorg.conf.d/20-intel.conf ...
sudo lspci -vvv -G
sudo lspci -vvv -G -H1
sudo lspci -vvv -G -H2
lscpu
lsmod
modinfo i915
modinfo drm
modinfo drm_kms_helper
modinfo intel_gtt
modinfo i2c_algo_bit
sudo systool -v -m i915
sudo systool -v -m drm
sudo systool -v -m drm_kms_helper
sudo systool -v -m intel_gtt
sudo systool -v -m i2c_algo_bit
uname -m
uname -r
tty
inxi -CIGMxxx --no-host
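The first step in the list above, copying the crash dump, can be sketched as follows. This is not the attached script: the function name save_crash_dump, its two arguments, and the output file name card0_error are hypothetical, and the timestamped directory name only imitates the naming of the archives attached to this report.

```shell
# Hypothetical helper (not the attached script): copy a GPU crash dump into
# a new timestamped directory and print that directory's path.
#   $1  source file        (default: /sys/class/drm/card0/error)
#   $2  destination root   (default: current directory)
save_crash_dump() {
    src=${1:-/sys/class/drm/card0/error}
    dest_root=${2:-.}
    dir="$dest_root/$(date +%Y.%m.%d_-%H.%M.%S)_gpu_crash"
    mkdir -p "$dir" || return 1
    cp "$src" "$dir/card0_error" || return 1
    printf '%s\n' "$dir"
}
```

The report's own command list uses sudo for this copy, so the function would probably need to run in a root shell to read the real error file.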
The /sys/class/drm/card0/error file alone:
0_content_of__sys_class_drm_card0_error.zip
All gathered data (including the error file above) is in the archive:
2020.08.08_-23.55.03_collected_data_of_GPU_crash-_GPU_hang.zip
The same script also gathered data on the next boot, before the GPU had hung:
2020.08.08_-23.55.47_collected_data_of_GPU_crash-_the_next_boot_while_GPU_not_hanged_yet.zip