[2020.08.12] i915 GPU hang report on 5.8.1-2-MANJARO kernel
This is an ongoing, two-month-long run of PC freezes and GPU hangs, now more than 200 cases. There has not been a single day without a GPU hang or PC freeze.
A PC freeze or GPU hang usually happens while semi-transparency, fade in/out, or blur effects are in action. I have a feeling that a rapid series of GPU hangs leads the PC to freeze. If only one or two GPU hangs happen 'at once', then the PC may or may not freeze.
I have posted more than 30 reports of this GPU hang issue; they are daily reports by now. Switching to the 4.19 kernel lowers the frequency of PC freezes, but the PC is still almost unusable. Is there any chance that an investigation of the cause could start? Can it either be planned, or can a refusal to investigate be posted?
Since the previous report #2332 (closed), these package updates have been installed:
grep --text -iE 'installed|upgraded|removed' '/var/log/pacman.log' | tail -n 100
...
[2020-08-12T01:02:55+0000] [ALPM] upgraded nodejs (14.7.0-1 -> 14.8.0-1)
[2020-08-12T01:02:55+0000] [ALPM] upgraded opera (70.0.3728.95-1 -> 70.0.3728.106-1)
[2020-08-12T15:46:09+0000] [ALPM] upgraded linux58 (5.8.1-1 -> 5.8.1-2)
Further ticket: #2334 (closed)
How the issue in this ticket happened
I was browsing an authoritative web site in the Opera web browser. The last action was clicking on a link, after which the picture froze. About 1.5 minutes later I was able to run (via a hot key) the script that collects error data. The taskbar clock froze at 18:20:40 (HH:MM:SS format).
journalctl -b -o short-precise --no-hostname --dmesg
excerpt:
Aug 12 18:20:50.797140 kernel: ------------[ cut here ]------------
Aug 12 18:20:50.797234 kernel: WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:4488 default_wake_function+0x16/0x30
Aug 12 18:20:50.797302 kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse hid_logitech_hidpp joydev mousedev input_leds hid_logitech_dj hid_generic usbhid intel_xhci_usb_role_switch roles snd_usb_audio rfkill snd_usbmidi_lib snd_hwdep squashfs snd_rawmidi snd_seq_device x86_pkg_temp_thermal intel_powerclamp coretemp mc kvm_intel snd_pcm i915 iTCO_wdt intel_pmc_bxt ee1004 iTCO_vendor_support loop kvm snd_timer snd i2c_algo_bit soundcore irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel nls_iso8859_1 drm_kms_helper intel_rapl_msr intel_wmi_thunderbolt nls_cp437 crypto_simd cryptd cec glue_helper vfat fat rapl r8169 rc_core intel_cstate i2c_i801 intel_gtt syscopyarea intel_uncore processor_thermal_device realtek sysfillrect pcspkr i2c_smbus sysimgblt intel_rapl_common libphy fb_sys_fops intel_pch_thermal intel_soc_dts_iosf wmi int3403_thermal int340x_thermal_zone i2c_hid bmc150_accel_i2c bmc150_accel_core hid industrialio_triggered_buffer kfifo_buf int3400_thermal
Aug 12 18:20:50.800426 kernel: industrialio evdev mac_hid acpi_thermal_rel drm sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas crc32c_intel xhci_hcd
Aug 12 18:20:50.800540 kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.1-2-MANJARO #1
Aug 12 18:20:50.800608 kernel: Hardware name: Default string Default string/Default string, BIOS 5.12 11/10/2018
Aug 12 18:20:50.800694 kernel: RIP: 0010:default_wake_function+0x16/0x30
Aug 12 18:20:50.800784 kernel: Code: e8 3f 87 3e 00 eb 99 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f7 c2 fe ff ff ff 75 09 48 8b 7f 08 e9 0a f9 ff ff <0f> 0b 48 8b 7f 08 e9 ff f8 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
Aug 12 18:20:50.800850 kernel: RSP: 0018:ffffb728c0003e58 EFLAGS: 00010082
Aug 12 18:20:50.800926 kernel: RAX: ffffffffb2ae4c60 RBX: ffffb728c0497d30 RCX: ffffb728c0003e70
Aug 12 18:20:50.800989 kernel: RDX: 00000000ffffff92 RSI: 0000000000000003 RDI: ffffb728c0497d30
Aug 12 18:20:50.801057 kernel: RBP: ffff9633af7a6d68 R08: 0000000000009465 R09: 0000000000000001
Aug 12 18:20:50.801116 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000046
Aug 12 18:20:50.801174 kernel: R13: ffff9633af7a6d60 R14: ffffb728c0003e70 R15: ffff963390332928
Aug 12 18:20:50.801232 kernel: FS: 0000000000000000(0000) GS:ffff9633c1a00000(0000) knlGS:0000000000000000
Aug 12 18:20:50.801303 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 12 18:20:50.801377 kernel: CR2: 00001833fc0e0e18 CR3: 000000044d20a002 CR4: 00000000003606f0
Aug 12 18:20:50.801437 kernel: Call Trace:
Aug 12 18:20:50.801496 kernel: <IRQ>
Aug 12 18:20:50.801575 kernel: autoremove_wake_function+0xe/0x30
Aug 12 18:20:50.801636 kernel: __i915_sw_fence_complete+0x156/0x1b0 [i915]
Aug 12 18:20:50.801693 kernel: ? i915_sw_fence_complete+0x20/0x20 [i915]
Aug 12 18:20:50.801751 kernel: ? i915_sw_fence_complete+0x20/0x20 [i915]
Aug 12 18:20:50.801808 kernel: call_timer_fn+0x2d/0x160
Aug 12 18:20:50.801888 kernel: ? i915_sw_fence_complete+0x20/0x20 [i915]
Aug 12 18:20:50.801950 kernel: __run_timers+0x130/0x290
Aug 12 18:20:50.802007 kernel: run_timer_softirq+0x2b/0x50
Aug 12 18:20:50.802079 kernel: __do_softirq+0x10f/0x352
Aug 12 18:20:50.802138 kernel: asm_call_on_stack+0x12/0x20
Aug 12 18:20:50.802196 kernel: </IRQ>
Aug 12 18:20:50.802283 kernel: do_softirq_own_stack+0x5f/0x80
Aug 12 18:20:50.802344 kernel: irq_exit_rcu+0xcb/0x120
Aug 12 18:20:50.802403 kernel: sysvec_apic_timer_interrupt+0x46/0xe0
Aug 12 18:20:50.802461 kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Aug 12 18:20:50.802548 kernel: RIP: 0010:cpuidle_enter_state+0xb6/0x420
Aug 12 18:20:50.802621 kernel: Code: 50 a0 e1 4c e8 4b 67 8d ff 49 89 c7 0f 1f 44 00 00 31 ff e8 7c 75 8d ff 80 7c 24 0f 00 0f 85 06 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e9 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
Aug 12 18:20:50.802720 kernel: RSP: 0018:ffffffffb4003e40 EFLAGS: 00000246
Aug 12 18:20:50.802803 kernel: RAX: ffff9633c1a00000 RBX: ffff9633c1a36800 RCX: 000000000000001f
Aug 12 18:20:50.802867 kernel: RDX: 0000000000000000 RSI: ffffffffb3d73bca RDI: ffffffffb3d5396f
Aug 12 18:20:50.802939 kernel: RBP: ffffffffb40ca1a0 R08: 000000393506759b R09: 0000000000000018
Aug 12 18:20:50.803001 kernel: R10: 000000000000188a R11: 000000000000337a R12: 0000000000000008
Aug 12 18:20:50.803059 kernel: R13: ffff9633c1a36800 R14: 0000000000000008 R15: 000000393506759b
Aug 12 18:20:50.803116 kernel: cpuidle_enter+0x29/0x40
Aug 12 18:20:50.803174 kernel: do_idle+0x1fb/0x2c0
Aug 12 18:20:50.803232 kernel: cpu_startup_entry+0x19/0x20
Aug 12 18:20:50.803289 kernel: start_kernel+0x843/0x868
Aug 12 18:20:50.803348 kernel: secondary_startup_64+0xb6/0xc0
Aug 12 18:20:50.803406 kernel: ---[ end trace 613366cc4b7bd602 ]---
Aug 12 18:20:50.816109 kernel: [drm:drm_atomic_state_default_clear [drm]] Clearing atomic state 000000003cac32a9
Aug 12 18:20:50.816288 kernel: [drm:__drm_atomic_state_free [drm]] Freeing atomic state 000000003cac32a9
Aug 12 18:20:53.597296 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Aug 12 18:20:53.606043 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in kwin_x11 [814]
...
Aug 12 18:21:31.002770 kernel: i915 0000:00:02.0: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:51:pipe A] with [PLANE:47:cursor A] visible 1 -> 1, off 0, on 0, ms 0
Aug 12 18:22:17.437229 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Aug 12 18:22:17.443099 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:efdfbffe, in kwin_x11 [814]
Aug 12 18:22:17.449364 kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state 000000003b9e0cd0
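The GPU HANG lines are easy to miss among the drm debug messages in the journal. A minimal sketch of how they could be pulled out is shown here. It assumes the `short-precise` line layout seen in the excerpt above; the function names `list_gpu_hangs` and `count_gpu_hangs` are hypothetical and are not part of the collection script attached to this report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumption: input uses the journalctl
# "short-precise" layout shown in the excerpt above).

# Read kernel log text on stdin and print one line per GPU hang:
#   <timestamp> <ecode> <process name>
list_gpu_hangs() {
    sed -n -E 's/^([A-Za-z]{3} +[0-9]+ [0-9:.]+) kernel: .*GPU HANG: ecode ([0-9a-fA-F:]+), in ([^ ]+) \[[0-9]+\].*$/\1 \2 \3/p'
}

# Read kernel log text on stdin and print the number of GPU hang lines.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_gpu_hangs() {
    grep -c 'GPU HANG: ecode' || true
}

# Example usage (not executed here):
#   journalctl -b -o short-precise --no-hostname --dmesg | list_gpu_hangs
#   journalctl -b -o short-precise --no-hostname --dmesg | count_gpu_hangs
```

Printing the timestamp next to the ecode makes it possible to see how close together the hangs are, which is relevant to the suspicion that a rapid series of hangs precedes a full freeze.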
How often GPU hangs or PC freezes happen
The frequency of PC freezes (cause unknown; a series of sequential GPU hangs is suspected) and of GPU hangs logged in the systemd journal is near the highest possible. It can happen on the logon screen without any user activity or during GUI session actions: in the first, 5th, or 40th minute. The average is about 2-3 minutes. It is not tied to one specific action; it is a general, unexpected event that has happened during many types of typical user activity, such as:
-) on the logon screen (without any user action, not even touching the mouse; seen about 7-8 times);
-) moving desktop icons;
-) opening the start menu;
-) opening a context menu;
-) moving the cursor in a text editor via the keyboard navigation keys;
-) browsing the system settings window;
-) typing text in a terminal emulator (GUI);
-) installing updates in a GUI app or GUI terminal emulator;
-) selecting text line by line in a text editor, or canceling a selection in the Opera browser;
-) opening or browsing in the Opera web browser: viewing a list of gitlab commits, filling in the description of an issue ticket on gitlab.freedesktop.org, watching youtube videos (not fullscreen, and without touching the keyboard or mouse for at least the last 1-2 minutes); extremely fast freeze/crash while browsing maps.google.com, maps.ya.ru;
-) LiveCD GUI sessions;
etc.
Platform (CPU): Intel Core i5-8250U
System architecture (uname -m): x86_64
Kernel version (uname -r): 5.8.1-2-MANJARO
Linux distribution: Manjaro Linux (desktop environment: KDE)
Machine or motherboard model: Hystou Fanless Mini PC P03B-i5-8250U
Display connector: factory-made cable with connectors: HDMI (connected to PC) - DVI-D (connected to monitor)
Error data was gathered in the current hung GUI user session (without switching to tty2 text mode) with the script collect_GPU_crash_data.zip, which collects:
# Collect main data
sudo cp /sys/class/drm/card0/error ...
sudo dmesg
journalctl -b -o short-precise --no-hostname --dmesg
cat /proc/cmdline
# Collect supplementary data
xrandr --verbose
sudo dmidecode -t bios -t system -t baseboard -t chassis -t processor
mhwd -l -d
cp /etc/X11/xorg.conf.d/20-intel.conf ...
sudo lspci -vvv -G
sudo lspci -vvv -G -H1
sudo lspci -vvv -G -H2
lscpu
lsmod
modinfo i915
modinfo drm
modinfo drm_kms_helper
modinfo intel_gtt
modinfo i2c_algo_bit
sudo systool -v -m i915
sudo systool -v -m drm
sudo systool -v -m drm_kms_helper
sudo systool -v -m intel_gtt
sudo systool -v -m i2c_algo_bit
uname -m
uname -r
tty
inxi -CIGMxxx --no-host
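The attached script itself is not reproduced in this report. As a rough sketch only, a list of commands like the one above could be collected into a timestamped directory as follows. The helper names `make_outdir` and `run_and_save`, the one-file-per-command layout, and the directory name pattern are assumptions for illustration, not the contents of collect_GPU_crash_data.zip.

```shell
#!/usr/bin/env bash
# Sketch only: the real collection script is not shown in the report.
# make_outdir and run_and_save are hypothetical helper names.

# Create a timestamped output directory under the given base directory
# and print its path. The name pattern is an assumption.
make_outdir() {
    local base="$1"
    local dir
    dir="$base/$(date +%Y.%m.%d_-%H.%M.%S)_collected_data_of_GPU_hang"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Run a command and save its stdout and stderr to <outdir>/<name>.txt,
# followed by its exit status. A failing command is recorded but does
# not stop the rest of the collection.
run_and_save() {
    local outdir="$1" name="$2"
    shift 2
    local status=0
    {
        printf '$ %s\n' "$*"
        "$@" 2>&1 || status=$?
        printf '[exit status: %s]\n' "$status"
    } > "$outdir/$name.txt"
    return 0
}

# Example usage (not executed here):
#   out="$(make_outdir "$HOME")"
#   run_and_save "$out" cmdline cat /proc/cmdline
#   run_and_save "$out" journal journalctl -b -o short-precise --no-hostname --dmesg
#   run_and_save "$out" lsmod lsmod
```

Recording the exit status and continuing after a failure matters in a hung session, where some commands may fail or be unavailable while others still return useful data.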
/sys/class/drm/card0/error file alone:
0_content_of__sys_class_drm_card0_error.zip
All gathered data (including the error file above) is in the archive:
2020.08.12_-18.22.28_collected_data_of_GPU_hang-_GPU_hanged.zip
All data gathered on the next boot, before any GPU hang had happened:
2020.08.12_-18.24.44_collected_data_of_GPU_hang-_the_next_boot_with_GPU_not_hanged__yet.zip