Intel Iris Plus G7 (rev 07) hang & reset under load
Bug description:
Under GPU load, X hangs and then resets. After the reset the system responds, but is barely usable because of massive input lag; only a reboot restores normal operation.
System environment:
-- chipset:
$ sudo lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Iris Plus Graphics G7 (rev 07) (prog-if 00 [VGA controller])
Subsystem: Lenovo Iris Plus Graphics G7
Flags: bus master, fast devsel, latency 0, IRQ 126
Memory at 601c000000 (64-bit, non-prefetchable) [size=16M]
[virtual] Memory at 4000000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [100] Process Address Space ID (PASID)
Capabilities: [200] Address Translation Service (ATS)
Capabilities: [300] Page Request Interface (PRI)
Kernel driver in use: i915
-- system architecture:
$ grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
-- xf86-video-intel:
$ dpkg -l xserver-xorg-video-intel | grep ^ii
ii xserver-xorg-video-intel 2:2.99.917+git20190815-1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
-- xserver:
$ dpkg -l xserver-xorg | grep ^ii
ii xserver-xorg 1:7.7+20 amd64 X.Org X server
-- mesa:
$ dpkg -l mesa* | grep ^ii
ii mesa-utils 8.4.0-1+b1 amd64 Miscellaneous Mesa GL utilities
ii mesa-va-drivers:amd64 19.3.1-4 amd64 Mesa VA-API video acceleration drivers
ii mesa-va-drivers:i386 19.3.1-4 i386 Mesa VA-API video acceleration drivers
-- libdrm:
$ dpkg -l libdrm* | grep ^ii
ii libdrm-amdgpu1:amd64 2.4.100-4 amd64 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm-amdgpu1:i386 2.4.100-4 i386 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm-common 2.4.100-4 all Userspace interface to kernel DRM services -- common files
ii libdrm-intel1:amd64 2.4.100-4 amd64 Userspace interface to intel-specific kernel DRM services -- runtime
ii libdrm-intel1:i386 2.4.100-4 i386 Userspace interface to intel-specific kernel DRM services -- runtime
ii libdrm-nouveau2:amd64 2.4.100-4 amd64 Userspace interface to nouveau-specific kernel DRM services -- runtime
ii libdrm-nouveau2:i386 2.4.100-4 i386 Userspace interface to nouveau-specific kernel DRM services -- runtime
ii libdrm-radeon1:amd64 2.4.100-4 amd64 Userspace interface to radeon-specific kernel DRM services -- runtime
ii libdrm-radeon1:i386 2.4.100-4 i386 Userspace interface to radeon-specific kernel DRM services -- runtime
ii libdrm2:amd64 2.4.100-4 amd64 Userspace interface to kernel DRM services -- runtime
ii libdrm2:i386 2.4.100-4 i386 Userspace interface to kernel DRM services -- runtime
-- kernel:
$ dpkg -l linux-image-5.5* | grep ^ii
ii linux-image-5.5.0-rc5-amd64 5.5~rc5-1~exp1 amd64 Linux 5.5-rc5 for 64-bit PCs (signed)
$ uname -a
Linux Nesbitt 5.5.0-rc5-amd64 #1 SMP Debian 5.5~rc5-1~exp1 (2020-01-06) x86_64 GNU/Linux
-- Linux distribution:
$ cat /etc/debian_version
bullseye/sid
-- Machine or mobo model: Lenovo Yoga C940-14IIL
-- Display connector:
$ xrandr | grep " connected"
eDP-1 connected primary 3840x2160+0+0 (normal left inverted right x axis y axis) 309mm x 174mm
Reproducing steps:
Play a game in Steam (currently easy to trigger with RimWorld). The crash happens anywhere between 5 and 15 minutes into play.
Additional info: from dmesg:
[ 706.762555] perf: interrupt took too long (2524 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 878.147587] perf: interrupt took too long (3241 > 3155), lowering kernel.perf_event_max_sample_rate to 61500
[ 1091.099151] perf: interrupt took too long (4106 > 4051), lowering kernel.perf_event_max_sample_rate to 48500
[ 1325.609741] perf: interrupt took too long (5182 > 5132), lowering kernel.perf_event_max_sample_rate to 38500
[ 1602.725312] perf: interrupt took too long (6736 > 6477), lowering kernel.perf_event_max_sample_rate to 29500
[ 1612.531293] perf: interrupt took too long (8769 > 8420), lowering kernel.perf_event_max_sample_rate to 22750
[ 1612.916769] Asynchronous wait on fence i915:cinnamon[1250]:1643a timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
[ 1613.840431] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1616.132812] i915 0000:00:02.0: GPU HANG: ecode 11:1:0x86d7fffd, in cinnamon [1250], stopped heartbeat on rcs0
[ 1616.132872] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1616.132873] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1616.132873] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1616.132873] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 1616.132899] GPU crash dump saved to /sys/class/drm/card0/error
[ 1616.237695] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[ 1616.240203] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 1616.246717] i915 0000:00:02.0: Resetting chip for stopped heartbeat on rcs0
[ 1616.449034] i915 0000:00:02.0: Failed to reset chip
[ 1619.235971] perf: interrupt took too long (12434 > 10961), lowering kernel.perf_event_max_sample_rate to 16000
[ 1624.484008] broken atomic modeset userspace detected, disabling atomic
[ 1626.701976] show_signal_msg: 15 callbacks suppressed
[ 1626.702056] steam[1992]: segfault at 17 ip 00000000ea71a81f sp 00000000ff96d270 error 6 in steamclient.so[ea281000+190a000]
[ 1626.702196] Code: 47 14 89 44 24 2c 89 41 08 8d 04 1d b4 30 00 00 e8 b6 01 c9 ff 89 c6 8b 00 85 c0 0f 84 6a 01 00 00 25 ff ff ff 1f 89 c5 31 c0 <f0> 0f b1 2f 85 c0 89 c1 89 c2 0f 84 88 00 00 00 25 ff ff ff 1f 39
[ 1626.984687] steam[1992]: segfault at 17 ip 00000000ea71a81f sp 00000000ff96d270 error 6 in steamclient.so[ea281000+190a000]
[ 1626.985066] Code: 47 14 89 44 24 2c 89 41 08 8d 04 1d b4 30 00 00 e8 b6 01 c9 ff 89 c6 8b 00 85 c0 0f 84 6a 01 00 00 25 ff ff ff 1f 89 c5 31 c0 <f0> 0f b1 2f 85 c0 89 c1 89 c2 0f 84 88 00 00 00 25 ff ff ff 1f 39
[ 1635.525918] perf: interrupt took too long (15684 > 15542), lowering kernel.perf_event_max_sample_rate to 12750
[ 1784.579838] perf: interrupt took too long (22575 > 19605), lowering kernel.perf_event_max_sample_rate to 8750
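Most of the log above is perf throttling noise, so a small filter may make the graphics-related lines easier to spot. This is only a sketch: `i915_events` is a hypothetical helper name, and the patterns are simply taken from the messages in this log, so other hangs may need different ones.

```shell
#!/bin/sh
# Sketch: keep only graphics-related kernel messages from a dmesg dump on stdin.
# i915_events is a hypothetical helper name; the patterns come from the log above.
i915_events() {
    grep -E 'i915|GPU HANG|GPU crash dump|atomic modeset'
}
```

Typical use would be `sudo dmesg | i915_events`, or `i915_events < saved-dmesg.txt` on a saved copy of the log.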
The file /sys/class/drm/card0/error is zero bytes after the reboot. I'll try to check its size after an actual crash, before rebooting.
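Since dmesg reports the crash dump as saved to /sys/class/drm/card0/error at hang time, but the file was empty after the reboot, one option is to copy it somewhere persistent right after the hang. The following is a minimal sketch of that idea, not an official tool: `save_gpu_error` and the output file name are made up for illustration, and it assumes the node is readable by the calling user (it probably needs root). It reads the content rather than relying on the size shown by `ls`, on the assumption that the reported size may not reflect what a read returns.

```shell
#!/bin/sh
# Sketch: copy the i915 error state somewhere persistent before rebooting.
# save_gpu_error and the output file name are hypothetical, not part of any tool.
save_gpu_error() {
    src=${1:-/sys/class/drm/card0/error}   # path taken from the dmesg message
    dest_dir=${2:-.}                        # directory for the saved copy

    if [ ! -r "$src" ]; then
        echo "cannot read $src (try as root)" >&2
        return 1
    fi

    out="$dest_dir/i915-error-$(date +%Y%m%d-%H%M%S).txt"
    # Read the content instead of trusting the size reported by ls/stat.
    cat "$src" > "$out" || return 1

    if [ ! -s "$out" ]; then
        rm -f "$out"
        echo "no error state in $src" >&2
        return 2
    fi
    echo "$out"    # print the path of the saved copy
}
```

After sourcing the file in a root shell, `save_gpu_error` with no arguments would write the copy into the current directory and print its path; that file could then be attached to this report.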