[2020.08.13-2] i915 GPU hang report on 5.8.1-2-MANJARO kernel
This is part of my ongoing two-month run of PC freezes and GPU hangs, now more than 200 cases. Not a single day passes without a GPU hang or a PC freeze.
A PC freeze or GPU hang usually happens while semi-transparency, fade in/out, or blur effects are in action. I have a feeling that a rapid series of GPU hangs leads the PC to freeze. If only one or two GPU hangs happen 'at once', then the PC may or may not freeze.
I have posted >30 reports of the GPU hang issue; by now they are daily reports. The website's captcha engine can no longer tell whether I am a human or a bot and shows me its tasks to complete. Switching to the 4.19 kernel lowers the frequency of PC freezes, but the PC is still almost unusable. Is there any chance of starting to investigate the cause of the problem? Can an investigation be planned, or can a refusal to investigate be posted?
Package updates since the previous report #2341 (closed):
grep --text -iE 'installed|upgraded|removed' '/var/log/pacman.log' | tail -n 100
...
<no updates since the previous ticket>
Further ticket: #2343 (closed)
How the issue in this ticket happened
Minimize/maximize a window 3-4 times. The picture freezes. The taskbar clock freezes at 02:55:58 (HH:MM:SS). I was able to run (via a hot key) the script that collects error data.
journalctl -b -o short-precise --no-hostname --dmesg
excerpt:
Aug 13 02:55:58.380526 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 13 02:55:58.403703 kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:113] for [PLANE:47:cursor A] state 000000009c31e416
Aug 13 02:55:58.403905 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 13 02:55:59.070371 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 5 created
Aug 13 02:55:59.070800 kernel: i915 0000:00:02.0: [drm:i915_gem_context_create_ioctl [i915]] HW context 6 created
Aug 13 02:56:13.904601 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:0:00000000
Aug 13 02:56:13.905655 kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Aug 13 02:56:13.905760 kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Aug 13 02:56:13.905843 kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Aug 13 02:56:13.905921 kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Aug 13 02:56:13.906017 kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Aug 13 02:56:13.906096 kernel: GPU crash dump saved to /sys/class/drm/card0/error
Aug 13 02:56:13.906175 kernel: i915 0000:00:02.0: [drm] Resetting bcs0 for stopped heartbeat on bcs0
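To put a number on how many hangs were logged in a boot, the same journalctl output could be passed through a small filter. This is only a sketch: the function name count_gpu_hangs is hypothetical and is not part of the attached collection script, and it assumes the kernel message keeps the "GPU HANG: ecode" wording seen in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not from the attached script): count the
# "GPU HANG: ecode" lines in kernel log text read from stdin.
count_gpu_hangs() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'GPU HANG: ecode' || true
}

# Example use on the current boot's kernel messages:
# journalctl -b -o short-precise --no-hostname --dmesg | count_gpu_hangs
```

Counting only the "GPU HANG" line avoids double counting the follow-up lines that the kernel prints after each hang.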
How often GPU hangs or PC freezes happen
The frequency of PC freezes (cause unknown; a series of sequential GPU hangs is suspected) and of GPU hangs logged in the systemd journal is near the highest possible. It can happen on the logon screen without any user activity or during GUI session actions, in the first, 5th, or 40th minute. The average is about 2-3 minutes. It is not tied to one specific action; it is a general, unexpected event that has happened during many kinds of typical user activity, such as:
-) on the logon screen (without any user action, not even touching the mouse; seen about 7-8 times);
-) moving desktop icons;
-) opening the start menu;
-) opening a context menu;
-) moving the cursor in a text editor with the keyboard navigation keys;
-) browsing the system settings window;
-) typing text in a terminal emulator (GUI);
-) installing updates in a GUI app or GUI terminal emulator;
-) selecting text line by line in a text editor, or canceling a selection in the Opera browser;
-) opening or browsing in the Opera web browser: viewing a list of GitLab commits, filling in the description of an issue ticket on this gitlab.freedesktop.org, watching YouTube videos (not fullscreen, and without touching the keyboard or mouse for at least the last 1-2 minutes), extremely fast freeze/crash while browsing maps.google.com, maps.ya.ru;
-) LiveCD GUI sessions;
etc.
Platform (CPU): Intel Core i5-8250U
System architecture (uname -m): x86_64
Kernel version (uname -r): 5.8.1-2-MANJARO
Linux distribution: Manjaro Linux (desktop environment: KDE)
Machine or motherboard model: Hystou Fanless Mini PC P03B-i5-8250U
Display connector: factory-made cable, HDMI (connected to PC) to DVI-D (connected to monitor)
Error data was gathered in the currently hung GUI user session (without switching to the tty2 text console) using the script collect_GPU_hang_data.zip, which collects:
# Collect main data
sudo cp /sys/class/drm/card0/error ...
sudo dmesg
journalctl -b -o short-precise --no-hostname --dmesg
cat /proc/cmdline
# Collect supplementary data
xrandr --verbose
sudo dmidecode -t bios -t system -t baseboard -t chassis -t processor
mhwd -l -d
cp /etc/X11/xorg.conf.d/20-intel.conf ...
sudo lspci -vvv -G
sudo lspci -vvv -G -H1
sudo lspci -vvv -G -H2
lscpu
lsmod
modinfo i915
modinfo drm
modinfo drm_kms_helper
modinfo intel_gtt
modinfo i2c_algo_bit
sudo systool -v -m i915
sudo systool -v -m drm
sudo systool -v -m drm_kms_helper
sudo systool -v -m intel_gtt
sudo systool -v -m i2c_algo_bit
uname -m
uname -r
tty
inxi -CIGMxxx --no-host
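For readers who do not want to unpack the archive, the general shape of such a collector can be sketched as below. This is not the content of collect_GPU_hang_data.zip: the collect function, the output directory, and the file names are all hypothetical and only illustrate running each listed command and saving its output to a separate file.

```shell
#!/bin/sh
# Hypothetical helper (not the attached script): run a command and save
# its stdout and stderr into "$outdir/$name.txt". A failing command has
# its exit status appended to the file and does not abort the collection.
collect() {
    outdir=$1
    name=$2
    shift 2
    mkdir -p "$outdir" || return 1
    "$@" > "$outdir/$name.txt" 2>&1 || echo "exit status: $?" >> "$outdir/$name.txt"
}

# Example calls (directory and file names are illustrative only):
# collect /tmp/gpu_hang_data cmdline cat /proc/cmdline
# collect /tmp/gpu_hang_data uname_r uname -r
# collect /tmp/gpu_hang_data lsmod lsmod
```

Writing one file per command keeps the archive easy to browse, and recording failures instead of stopping matters because the session may already be partly hung when the script runs.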
The /sys/class/drm/card0/error file alone:
0_content_of__sys_class_drm_card0_error.zip
All gathered data (including the error file above) is in the archive: