[drm] GPU HANG: ecode 9:0:0x85dffffd, in chrome [18418], reason: hang on rcs0, action: reset
Submitted by muradm
Assigned to Intel 3D Bugs Mailing List
Link to original bug (#108717)
Description
Created attachment 142445 cat /sys/class/drm/card0/error
A month ago I moved to a ThinkPad X1 Carbon 6th Gen (20KH006MRT) with a fresh Arch Linux install. Since then I have been battling with the GPU.
The GPU hangs periodically (at least once a day, sometimes more often) while Google Chrome is running with hardware acceleration. The result is one of the following, in no particular order:
- Chrome's GPU process may crash on the first hang; GNOME then crashes anyway within a few hours
- GNOME may crash to a black text-mode screen, where I can still switch to another terminal and reboot
- Everything crashes to a black screen (no text cursor) and the host stops responding to anything, including the network; a hard power cycle is then needed.
This happens regardless of whether an external monitor is attached via HDMI.
I think I have read every article and wiki page available on the subject, and tried many configurations of i915 and other things.
Yesterday I switched from the mainline 4.18 kernel to the testing 4.19 kernel in order to get the latest of everything. Just now the same hang happened again, as in the first case above (Chrome's GPU process crashed).
journalctl (omitting other errors) =>
Nov 13 01:15:22 muradm-aln1 kernel: [drm] GPU HANG: ecode 9:0:0x85dffffd, in chrome [18418], reason: hang on rcs0, action: reset
Nov 13 01:15:22 muradm-aln1 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 13 01:15:22 muradm-aln1 kernel: [drm] Please file a new bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 13 01:15:22 muradm-aln1 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 13 01:15:22 muradm-aln1 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 13 01:15:22 muradm-aln1 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 13 01:15:22 muradm-aln1 kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
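For anyone who wants to count or compare hangs across a long journal, the hang line above can be picked apart mechanically. The following is only a rough sketch: the regular expression is modelled on the single message shown in this report, the helper name `parse_hang_line` and the field names `first`, `second` and `code` are made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the one hang message quoted in this report.
# The names "first", "second" and "code" are placeholders; the report
# itself does not say what the three ecode fields mean.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<first>\d+):(?P<second>\d+):(?P<code>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang_line(line):
    """Return a dict describing the first hang message in `line`, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field converted
    return info


if __name__ == "__main__":
    sample = (
        "Nov 13 01:15:22 muradm-aln1 kernel: [drm] GPU HANG: ecode "
        "9:0:0x85dffffd, in chrome [18418], reason: hang on rcs0, action: reset"
    )
    print(parse_hang_line(sample))
```

Feeding it each line of `journalctl -k` output and grouping by `process` or `reason` would show whether every hang comes from Chrome on rcs0 or whether other clients are affected as well.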
Dump attached as well.
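In case it helps others who hit the same hang, here is a small sketch of how the error state can be copied out of sysfs and compressed for attaching. The function name `save_error_state` and the default output file name are made up for illustration, and reading the sysfs file normally requires root.

```python
import gzip
import shutil
from pathlib import Path


def save_error_state(source="/sys/class/drm/card0/error",
                     dest="i915-error-state.txt.gz"):
    """Copy the i915 error state to a gzip-compressed file and return its path.

    `dest` is a hypothetical default name; choose whatever suits the report.
    """
    with Path(source).open("rb") as fin, gzip.open(dest, "wb") as fout:
        # Stream in chunks, since the error state can be fairly large.
        shutil.copyfileobj(fin, fout)
    return Path(dest)


if __name__ == "__main__":
    print("saved", save_error_state())
```

The dump should be saved soon after the hang, since the captured state does not survive a reboot.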
OS: Arch Linux x86_64
Kernel: 4.19.1-arch1-1-ARCH
Host: 20KH006MRT ThinkPad X1 Carbon 6th
DE: GNOME 3.30.1
CPU: Intel i7-8550U (8) @ 4.000GHz
GPU: Intel UHD Graphics 620
Some related packages:
local/libdrm 2.4.96-1
local/libva 2.3.0-1
local/libva-intel-driver 2.2.0-1
local/libva-utils 2.3.0-1
local/linux 4.19.1.arch1-1 (base)
local/linux-api-headers 4.17.11-1
local/linux-firmware 20181026.1cb4e51-1 (base)
local/mesa 18.2.4-1
local/mesa-demos 8.4.0-1
local/qt5-wayland 5.11.2-1 (qt qt5)
local/util-linux 2.33-2 (base base-devel)
local/vulkan-icd-loader 1.1.85+2969+5abee6173-1
local/vulkan-intel 18.2.4-1
local/wayland 1.16.0-1
local/wayland-protocols 1.16-1
local/xorg-bdftopcf 1.1-1 (xorg xorg-apps)
local/xorg-server 1.20.3-1 (xorg)
local/xorg-server-common 1.20.3-1 (xorg)
local/xorg-server-xwayland 1.20.3-1 (xorg)
local/xorgproto 2018.4-1
cat /etc/modprobe.d/i915.conf
options i915 modeset=1 enable_guc=3 enable_fbc=1 fastboot=1
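When trying many i915 configurations it is easy to lose track of which options are actually set. As a rough sketch, the `options` line above can be turned into a dictionary; the helper name `parse_modprobe_options` is made up for illustration, and the parser only handles simple `key=value` tokens and `#` comments.

```python
def parse_modprobe_options(text, module="i915"):
    """Collect key=value options for `module` from modprobe.d-style text."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "options" and parts[1] == module:
            for token in parts[2:]:
                key, _, value = token.partition("=")
                options[key] = value
    return options


if __name__ == "__main__":
    conf = "options i915 modeset=1 enable_guc=3 enable_fbc=1 fastboot=1"
    print(parse_modprobe_options(conf))
```

The values the loaded module is really using can be read from the files under /sys/module/i915/parameters/, which is a useful cross-check that the configuration file was picked up at boot.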
dmesg | grep drm
== (up to a point of hang) ==============
[ 2.654949] fb: switching to inteldrmfb from EFI VGA
[ 2.654994] [drm] Replacing VGA console driver
[ 2.657309] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.657310] [drm] Driver supports precise vblank timestamp query.
[ 2.659687] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[ 2.666245] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
[ 2.677443] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[ 3.224056] [drm] Initialized i915 1.6.0 20180719 for 0000:00:02.0 on minor 0
[ 3.674308] fbcon: inteldrmfb (fb0) is primary device
[ 3.674318] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[ 4.145904] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 31.447100] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 3377.147569] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 3389.843556] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 3391.847593] [drm] HuC: Loaded firmware i915/kbl_huc_ver02_00_1810.bin (version 2.0)
[ 3391.858472] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[ 3392.079989] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[ 3413.745747] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Attachment 142445, "cat /sys/class/drm/card0/error":
drmi915klcrash.dmp.gz