i915 GPU hangs ecode 9:1:0x00000000

changed the description

added GPU hang label

I've had the same issue since upgrading to Linux 5.4. I can reproduce it by opening a document in Xournal++ and scrolling through it until the GPU hangs.

Previous kernels do not have this issue.

I'm seeing this with 5.4.2-300.fc31.x86_64 on Intel(R) Core(TM) i7-6500U

00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07) (prog-if 00 [VGA controller])
    Subsystem: Hewlett-Packard Company Device [103c:81a0]

It seems to be a really hard crash: no TTY, no SSH. It happened within the first hour of running 5.4.2, but I haven't reproduced it a second time yet.

Downstream bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1780800

dmesg.txt

Happened again. This time I was able to capture /sys/class/drm/card0/error via ssh.
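Since the machine can stay reachable over SSH while the display is frozen, it helps to save the error state to disk right away; the sysfs file lives in memory and is gone after a reboot. A minimal Python sketch, assuming the `card0` path printed in the kernel log and a destination directory of your choice. `save_crash_dump` is a hypothetical helper, the file-name pattern only mirrors attachment names used later in this thread, and reading the sysfs file usually requires root.

```python
import time
from pathlib import Path

# Path printed by the kernel: "GPU crash dump saved to /sys/class/drm/card0/error".
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_crash_dump(dest_dir, src=ERROR_STATE, now=None):
    """Hypothetical helper: copy the i915 error state into dest_dir.

    The copy gets a timestamped name so several hangs can be kept side by side.
    read_bytes() reads until EOF, which matters because sysfs files may report
    a size of 0 in stat().
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # time.localtime(None) means "now"; tests can pass a fixed epoch value.
    stamp = time.strftime("%Y.%m.%d-%H_%M_%S", time.localtime(now))
    data = Path(src).read_bytes()
    dest = dest_dir / f"sys_class_drm_card0_error-{stamp}.log"
    dest.write_bytes(data)
    return dest
```

Usage over SSH, as root: `python3 -c 'from dumpsave import save_crash_dump; print(save_crash_dump("/root/gpu-dumps"))'`, where `dumpsave.py` is whatever file you saved the sketch in.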

drmcard0error.txt

dmesg.txt

It does look like a hard crash; however, the system still responds to the power button (pressing it suspends the machine). I can't switch to another TTY, and my laptop's fan gets very loud.

Here's another GPU crash dump in case it would be useful.

GPU Crash Dump

Kernel messages in journalctl

The crashes I had before were of several types:

No external screens connected to the laptop: the screen froze, but the power button worked, so I could power off.
External screens connected, only one froze (using displaylink-evdi). Input (mouse/keyboard) kept working on the other screens but stopped as soon as I moved the mouse to the frozen screen. I could only recover by power cycling.
External screens connected, all screens froze, no input whatsoever. However, this one recovered by itself; its crash dump is the one attached to the issue.

Had another hang after updating to kernel 5.4.2. I'm running Wayland, by the way.

The hang happened with multiple screens plugged in. The screen that was playing video froze. When I killed the video process with kill -9, the screen resumed; when I spawned the same process again, the screen froze again.

gpudump2.txt

Dec 07 09:53:33 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 09:53:33 prometheus kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
...
Dec 07 09:55:45 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:49 prometheus kernel: Asynchronous wait on fence i915:sway[1309]:318f9c timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 09:55:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:55 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:57 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:59 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:01 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:03 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:05 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:07 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:09 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:11 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:13 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:15 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:17 prometheus systemd-logind[944]: Power key pressed.

Edit: Two other hangs:

Dec 07 11:22:11 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:4d01a timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 11:22:25 prometheus kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 07 11:22:25 prometheus kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 07 11:22:25 prometheus kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 07 11:22:25 prometheus kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 07 11:22:25 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 07 11:22:25 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 07 11:22:25 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
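The `GPU HANG` line has the same shape in every log quoted in this thread: `GPU HANG: ecode A:B:0xC`, optionally followed by `, in PROCESS [PID]`, then `, hang on ENGINE`. A minimal Python sketch for pulling those fields out of a journal or dmesg capture, assuming only the two variants seen here (other kernel versions may word the message differently). `GpuHang` and `parse_hang` are hypothetical names, and calling the first field `gen` is an interpretation, not something the log states.

```python
import re
from typing import NamedTuple, Optional

# Matches both variants quoted in this thread:
#   "GPU HANG: ecode 9:1:0x00000000, hang on rcs0"
#   "GPU HANG: ecode 4:1:0x9ffdfeff, in jalv.gtk [2914], hang on rcs0"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<gen>\d+):(?P<engines>[0-9a-f]+):(?P<code>0x[0-9a-f]{8})"
    r"(?:, in (?P<process>.+?) \[(?P<pid>\d+)\])?"
    r", hang on (?P<engine>\w+)"
)


class GpuHang(NamedTuple):
    gen: int                # first ecode field (assumed: GPU generation)
    engines: str            # second ecode field, kept as raw text
    ecode: int              # third ecode field as an integer
    process: Optional[str]  # only present in some kernel messages
    pid: Optional[int]
    engine: str             # e.g. "rcs0"


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return the fields of a 'GPU HANG' kernel line, or None for other lines."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    pid = m.group("pid")
    return GpuHang(
        gen=int(m.group("gen")),
        engines=m.group("engines"),
        ecode=int(m.group("code"), 16),
        process=m.group("process"),
        pid=int(pid) if pid else None,
        engine=m.group("engine"),
    )
```

Feeding it every line of `journalctl -k` output and keeping the non-`None` results gives a quick list of hangs per boot, which makes it easier to tell whether reports share the same ecode.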

crashdump3.txt

For this one I tried unplugging and re-plugging the external screens, which didn't help. I also couldn't collect a crash dump, and the machine didn't react to the power button either, so I had to cut power.

Dec 07 12:02:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:53 prometheus kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:54 prometheus kernel: [drm:intel_mst_disable_dp [i915]] *ERROR* failed to update payload -22
Dec 07 12:03:01 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a674 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:07 prometheus kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
Dec 07 12:03:09 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:11 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:13 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:15 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:17 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:19 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a674 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:19 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:19 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:21 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:23 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:25 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:27 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:29 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:31 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:33 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:35 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:37 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:39 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:41 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:43 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:45 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:47 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:49 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:51 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:55 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:57 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

Had 2 more crashes, so I ended up downgrading the kernel to 5.3.13.

Edit: well, it was stable until now, but I'm getting the hang with 5.3.13 as well.

dumpX.txt

I am having the same issue with kernel 5.3.12, xorg-server 1.20.5 and xf86-video-intel 1:2.99.917 on an Arch-based system.

My processor is an i7-7700HQ; the machine also has a mobile GTX 1050 Ti, but it is not enabled.

i915-rcs0-hang.dump

Same issue here:
Linux darkstar.example.net 5.4.2 #1 SMP Wed Dec 4 18:12:20 CST 2019 x86_64 Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz GenuineIntel GNU/Linux
mesa-19.2.7

[ 224.178210] i915 0000:00:02.0: GPU HANG: ecode 4:1:0x9ffdfeff, in jalv.gtk [2914], hang on rcs0
[ 224.178212] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 224.178213] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 224.178213] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 224.178214] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 224.178215] GPU crash dump saved to /sys/class/drm/card0/error
[ 224.178279] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 236.208817] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 244.208824] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 1142.191816] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 1150.191820] i915 0000:00:02.0: Resetting chip for hang on rcs0

I can't attach the file, so here is the head of the crash dump:

GPU HANG: ecode 4:1:0x9eedfeff, in jalv.gtk [2658], hang on rcs0
Kernel: 5.4.2 x86_64
Driver: 20190822
Time: 1575544661 s 161104 us
Boottime: 131 s 172133 us
Uptime: 118 s 421572 us
Epoch: 4294792001 jiffies (1000 HZ)
Capture: 4294798017 jiffies; 1278428 ms ago, 6016 ms after epoch
Active process (on ring rcs0): jalv.gtk [2658]
Reset count: 0
Suspend count: 0
Platform: GM45
Subplatform: 0x0
PCI ID: 0x2a42
PCI Revision: 0x07
PCI Subsystem: 17aa:213a
IOMMU enabled?: 0
GT awake: yes
RPM wakelock: yes
PM suspended: no
EIR: 0x00000000
IER: 0x02028053
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0x00000000
CCID: 0x00000000
fence[0] = ebf000009b90ad
fence[1] = 22760000227500d
fence[2] = 1f8100001f7f00d
fence[3] = 2a5400002a1500d
fence[4] = 229d0000229801d
fence[5] = 1f3f00001f3d00d
fence[6] = 22a3000022a000d
fence[7] = 9a8000009a800d
fence[8] = 227d0000227d00d
fence[9] = 25bd0000257e00d
fence[10] = 229e0000229e00d
fence[11] = 1f9300001f9201d
fence[12] = 1f4400001f4300d
fence[13] = 229f0000229f00d
fence[14] = 2301000022c200d
fence[15] = 1f4a00001f4901d
rcs0 command stream:
  IDLE?: no
  START: 0x00005000
  HEAD: 0x06e03d98 [0x00003d28]
  TAIL: 0x00000110 [0x00003d98, 0x00003db0]
  CTL: 0x00003001
  MODE: 0x00000040
  HWS: 0x00002000
  ACTHD: 0x00000000 0093e314
  IPEIR: 0x00000000
  IPEHR: 0x60020100
  INSTDONE: 0xfeefffff
  SC_INSTDONE: 0xbfffffd8
  batch: [0x00000000_0093e000, 0x00000000_00943000]
  BBADDR: 0x00000000_0093e313
  BB_STATE: 0x000000a0
  INSTPS: 0x0011f02e
  INSTPM: 0x00000000
  FADDR: 0x00000000 0093e4c0
  ring->head: 0x00003d10
  ring->tail: 0x00000110
  hangcheck timestamp: 0ms (4294792001; epoch)
  engine reset count: 0
  Active context: jalv.gtk[2658] hw_id 0, prio 0, guilty 0 active 0
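The ecode in this dump can be cross-checked against its own register values. One reading, which is an assumption here rather than something the dump states, is that the three ecode fields are GPU generation : mask of hung engines : IPEHR XOR INSTDONE of the hung engine. The numbers in this thread are consistent with it: the Gen4 GM45 reports `4:...`, the Skylake and Kaby Lake machines report `9:...`, and here 0x60020100 ^ 0xfeefffff = 0x9eedfeff, exactly the code in the `GPU HANG` line. A minimal Python sketch of that check; `field` and `recompute_ecode` are hypothetical helpers that assume a single-engine dump.

```python
import re


def field(dump: str, name: str) -> int:
    """Return the first 'NAME: 0x...' hex value in an i915 error-state dump."""
    # The (?<!\w) guard stops "INSTDONE" from matching inside "SC_INSTDONE".
    m = re.search(rf"(?<!\w){re.escape(name)}: (0x[0-9a-fA-F]+)", dump)
    if m is None:
        raise KeyError(name)
    return int(m.group(1), 16)


def recompute_ecode(dump: str) -> int:
    # Assumption: third ecode field = IPEHR ^ INSTDONE for the hung engine.
    return field(dump, "IPEHR") ^ field(dump, "INSTDONE")
```

For a saved dump, `print(f"{recompute_ecode(open('dump.txt').read()):#010x}")` should print the same value as the third field of its `GPU HANG` line if the assumption holds.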

I get these hangs probably 3-4 times a day. I'm using a ThinkPad X1 Carbon 6th gen with Intel UHD Graphics 620 and an Intel i7-8550U (8) @ 4.000GHz, running kernel 5.4.1 on NixOS with X server 1.20.6.

Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 09 16:39:30 sol kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 09 16:39:30 sol kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 09 16:39:30 sol kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 09 16:39:30 sol kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 09 16:39:30 sol kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:36 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:44 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:46 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:46 sol systemd[1]: Started PCSC-Lite daemon.
Dec 09 16:39:46 sol gpg-agent[1871]: scdaemon[2526]: pcsc_list_readers failed: unknown PC/SC error code (0x8010002e)
Dec 09 16:39:48 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:50 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

... message repeats until I press power button.
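In the logs above, after the failed resets the driver keeps printing `Resetting rcs0 for hang on rcs0` roughly every two seconds until the machine is powered off. A minimal Python sketch that measures those gaps in a saved journal excerpt (for example from `journalctl -k -b -1`, the previous boot's kernel messages). It assumes the `Mon DD HH:MM:SS` prefix used in this thread and an English month locale; `reset_gaps` is a hypothetical name, and the fixed year only exists so timestamps can be subtracted.

```python
from datetime import datetime
from typing import List

RESET_MSG = "Resetting rcs0 for hang on rcs0"
# Placeholder year: the syslog prefix has none, and only differences matter here.
ASSUMED_YEAR = "2019"


def reset_gaps(lines: List[str]) -> List[float]:
    """Seconds between consecutive rcs0 reset messages in a journal excerpt."""
    stamps = []
    for line in lines:
        if RESET_MSG in line:
            # The first 15 characters are e.g. "Dec 09 16:39:44".
            stamp = datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")
            stamps.append(stamp)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

A long run of equal gaps right up to the power-key event suggests the engine never came back, as opposed to a few isolated hangs the driver recovered from.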

See my post: since applying the reverts of the indicated patches, the hang has happened to me only once, and it was more of a brief 'cut' than a total graphics hang; the system kept working afterwards and returned to normal. I have now been working for 15 hours without it happening. (Before I made these changes to the 5.4.2 sources, the system crashed 'totally' every 30 minutes under high load.)

#713 (closed)

mentioned in issue #713 (closed)

added Community label

I'm also having this problem (2 or 3 times a day): the X session hangs with kernel 5.4.1. In my case, it's sometimes a hard hang where all I can do is hit the hardware reset button, but most of the time I can ssh in from another system, run systemctl restart display-manager, and everything goes back to normal (for a while).

In case it's related, I should note that I upgraded to kernel 5.4.1 after hitting the "kernel BUG at fs/ext4/inode.c:2721" issue (#509) with 5.3 kernels. When the patch from https://www.spinics.net/lists/stable/msg340095.html was mentioned in that bug report, I applied it to kernel 5.3.11 and ran it for over 14 days without problems, but then the bug happened again, so the patch seemed to help without fixing the problem completely. I then updated to kernel 5.4.1 to check whether it had a better fix; so far I haven't seen the ext4 bug again, but I'm seeing this one.

I attached two captures of /sys/class/drm/card0/error from two different hangs on the same machine, in case they help. I always have two monitors connected and no other graphics card.

sys_class_drm_card0_error-2019.12.04-13_06.log

sys_class_drm_card0_error-2019.12.10-08_53.log

marked this issue as a duplicate of #673 (closed)

closed
