[2020.08.13-3] i915 GPU hang report on 5.8.1-2-MANJARO kernel
This report is part of an ongoing two-month run of PC freezes and GPU hangs, now more than 200 cases. Not a single day passes without a GPU hang or a PC freeze.
A PC freeze or GPU hang usually happens while semi-transparency, fade in/out, or blur effects are in action. I have a feeling that a rapid series of GPU hangs leads the PC to freeze. If only one or two GPU hangs happen 'at once', the PC may or may not freeze.
I have posted more than 30 reports of this GPU hang issue; the reports are now daily. The website's captcha engine can no longer tell whether I am a human or a bot and shows me its tasks to complete. Switching to the 4.19 kernel lowers the frequency of PC freezes, but the PC is still almost unusable. Is there any chance that an investigation into the cause of the problem could start? Could it be planned, or could a refusal to investigate be posted?
Since the previous report #2342 (closed), these package updates were installed:
grep --text -iE 'installed|upgraded|removed' '/var/log/pacman.log' | tail -n 100
...
<no updates since the previous ticket>
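To narrow the pacman log check to the period since an earlier ticket, a date filter can be put on top of the grep above. This is only a sketch: the function name `pacman_changes_since` is hypothetical, and it assumes every pacman.log line starts with a `[YYYY-MM-DD...` timestamp.

```shell
# Hypothetical helper: list package changes logged on or after a given date.
# Assumption: each pacman.log line starts with "[YYYY-MM-DD", so characters
# 2..11 of the line hold the date and can be compared as plain strings.
pacman_changes_since() {
    since="$1"                          # cutoff date, YYYY-MM-DD
    log="${2:-/var/log/pacman.log}"     # log path, defaults to the system log
    grep --text -iE 'installed|upgraded|removed' "$log" |
        awk -v since="$since" 'substr($0, 2, 10) >= (since "")'
}
```

For example, `pacman_changes_since 2020-08-12` would print only the changes logged on or after that date (the date here is just an illustration); empty output means nothing was installed, upgraded, or removed in that period.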
How the issue in this ticket happened
This is the same boot as in the previous ticket mentioned above. In this case, the GPU hang followed a right click on a taskbar icon. The picture froze, and the taskbar clock froze at 02:56:44 (HH:MM:SS). I was able to run (via a hot key) the script that collects the error data.
journalctl -b -o short-precise --no-hostname --dmesg
excerpt:
Aug 13 02:56:44.503941 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 13 02:56:44.753657 kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:115] for [PLANE:47:cursor A] state 00000000c4c11822
Aug 13 02:56:44.753934 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 13 02:56:59.984479 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:0:00000000
Aug 13 02:56:59.985554 kernel: i915 0000:00:02.0: [drm] Resetting bcs0 for stopped heartbeat on bcs0
Aug 13 02:57:44.870516 kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:115] for [PLANE:47:cursor A] state 000000000cf3c50a
Aug 13 02:57:44.870971 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 13 02:57:44.877052 kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:115] for [PLANE:47:cursor A] state 000000003d3b4d11
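The hang and reset lines are easy to lose among the plane-update debug messages. A small filter, sketched below, keeps only those lines. The function name `gpu_hang_events` is hypothetical, and the patterns are taken only from the two messages visible in the excerpt above, so other hang-related messages may be missed.

```shell
# Hypothetical helper: keep only GPU hang and engine reset lines
# from kernel log text supplied on stdin.
# The patterns match the two messages seen in the excerpt above:
#   "[drm] GPU HANG: ecode ..."
#   "[drm] Resetting bcs0 for stopped heartbeat on bcs0"
gpu_hang_events() {
    grep -E 'GPU HANG: ecode|Resetting [a-z]+[0-9]+ for'
}

# Intended use (not executed here):
#   journalctl -b -o short-precise --no-hostname --dmesg | gpu_hang_events
```

Piping the output through `wc -l` gives a rough count of hang-related events for the current boot.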
How often GPU hangs or PC freezes happen
The frequency of PC freezes (cause unknown; a series of sequential GPU hangs is suspected) and of GPU hangs logged in the systemd journal is close to the highest possible. A hang can happen on the logon screen without any user activity or during a GUI session, in the first, 5th, or 40th minute. The average is about 2-3 minutes. It is not tied to one specific action; it is a general, unexpected failure that has happened during many kinds of typical user activity, such as:
-) on the logon screen (without any user action, not even touching the mouse; seen about 7-8 times);
-) moving desktop icons;
-) opening the start menu;
-) opening a context menu;
-) moving the cursor in a text editor with the keyboard navigation keys;
-) browsing the system settings window;
-) minimizing/maximizing a window several times in a row;
-) Alt+Tab between open windows several times in a row;
-) typing text in a terminal emulator (GUI);
-) installing updates in a GUI app or GUI terminal emulator;
-) selecting text line by line in a text editor, or canceling a selection in the Opera browser;
-) opening or browsing in the Opera web browser: viewing a list of GitLab commits, filling in the description of an issue ticket on this gitlab.freedesktop.org, watching YouTube videos (not fullscreen, without touching the keyboard or mouse for at least the last 1-2 minutes), extremely fast freeze/crash while browsing maps.google.com or maps.ya.ru, running through the smoke of a smoke grenade in the cs-online.club in-browser 3D game;
-) LiveCD GUI sessions;
etc.
Platform (CPU): Intel Core i5-8250U
System architecture (uname -m): x86_64
Kernel version (uname -r): 5.8.1-2-MANJARO
Linux distribution: Manjaro Linux (desktop environment: KDE)
Machine or motherboard model: Hystou Fanless Mini PC P03B-i5-8250U
Display connector: factory-made cable with connectors HDMI (connected to PC) - DVI-D (connected to monitor)
Error data was gathered in the current hung GUI user session (without switching to tty2 text mode) with the script collect_GPU_hang_data.zip, which collects:
# Collect main data
sudo cp /sys/class/drm/card0/error ...
sudo dmesg
journalctl -b -o short-precise --no-hostname --dmesg
cat /proc/cmdline
# Collect supplementary data
xrandr --verbose
sudo dmidecode -t bios -t system -t baseboard -t chassis -t processor
mhwd -l -d
cp /etc/X11/xorg.conf.d/20-intel.conf ...
sudo lspci -vvv -G
sudo lspci -vvv -G -H1
sudo lspci -vvv -G -H2
lscpu
lsmod
modinfo i915
modinfo drm
modinfo drm_kms_helper
modinfo intel_gtt
modinfo i2c_algo_bit
sudo systool -v -m i915
sudo systool -v -m drm
sudo systool -v -m drm_kms_helper
sudo systool -v -m intel_gtt
sudo systool -v -m i2c_algo_bit
uname -m
uname -r
tty
inxi -CIGMxxx --no-host
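The first step of the list above, copying the error state, could be guarded so that an empty state is not mistaken for a captured hang. This is a sketch and not the content of collect_GPU_hang_data.zip: the function name `save_error_state` is hypothetical, and it assumes that the i915 driver reports an empty state with the text "No error state collected" and that the caller has read permission (for example, runs as root).

```shell
# Hypothetical helper: copy the i915 error state only if one was recorded.
# Assumption: an empty state file starts with "No error state collected".
# Returns 0 on success, 1 if no error state is present, 2 if unreadable.
save_error_state() {
    src="$1"    # e.g. /sys/class/drm/card0/error
    dst="$2"    # destination file chosen by the caller
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 2
    fi
    if head -n 1 "$src" | grep -q '^No error state collected'; then
        echo "no error state recorded in $src" >&2
        return 1
    fi
    cp "$src" "$dst"
}
```

The separate return codes let a calling script tell "nothing to collect" apart from a permission problem.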
/sys/class/drm/card0/error file alone:
0_content_of__sys_class_drm_card0_error.zip
All gathered data (including the error file above) is in the archive: