Had multiple hangs in the last two days, latest:
gpudump.txt
Dec 06 13:46:06 prometheus kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 06 13:46:06 prometheus kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 06 13:46:06 prometheus kernel: Please file a new bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 06 13:46:06 prometheus kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 06 13:46:06 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 06 13:46:06 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 06 13:46:06 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
journalctl --since=yesterday | grep crash
Dec 05 20:28:06 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 05 20:28:06 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 06 10:20:31 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 06 10:20:31 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 06 13:46:06 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 06 13:46:06 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
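For anyone who wants a tally of hangs rather than raw grep output, here is a minimal sketch that pulls the timestamp of every "GPU crash dump saved" message out of journal text. The function name `crash_dump_events` is made up for illustration, and the pattern assumes the short journal timestamp format seen in this thread.

```python
import re

# Journal short timestamp ("Dec 06 13:46:06"), then the hostname, then the
# kernel's "GPU crash dump saved to <path>" message.
DUMP_RE = re.compile(
    r"(\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"GPU crash dump saved to (/sys/class/drm/card\d+/error)"
)


def crash_dump_events(journal_text):
    """Return a list of (timestamp, dump_path) pairs, one per saved dump."""
    return DUMP_RE.findall(journal_text)


sample = (
    "Dec 05 20:28:06 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error\n"
    "Dec 06 10:20:31 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error\n"
    "Dec 06 13:46:06 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error\n"
)
events = crash_dump_events(sample)
print(len(events), "hangs:", [ts for ts, _ in events])
```

In practice the text would come from something like `journalctl -k --since=yesterday` instead of the hard-coded sample.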
I've had the same issue since upgrading to Linux 5.4. I can reproduce it by opening a document in xournal++ and scrolling through it until the GPU hangs.
It looks like a hard crash, but the system still responds to the interrupt from the power button and suspends. I can't switch to another tty, and my laptop's fan gets very loud.
Here's another GPU crash dump in case it would be useful.
The crashes I had before were of several types:
No external screens connected to the laptop: the screen froze, but the power button worked, so I could power off.
External screens connected, only one froze (using displaylink-evdi): inputs (mouse/keyboard) kept working on the other screen, but stopped working once I moved the mouse to the frozen screen. I could only recover with a power cycle.
External screens connected, all screens froze, no input whatsoever: this one recovered by itself, and it is the one whose crash dump is attached to the issue.
Had another hang after updating to kernel 5.4.2
I'm running Wayland, btw.
The hang happened while multiple screens were plugged in. The screen that had a video running froze. When I killed the video process with kill -9, the screen resumed. When I spawned the same process again, the screen froze again.
Dec 07 09:53:33 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 09:53:33 prometheus kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 09:53:33 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
...
Dec 07 09:55:45 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:49 prometheus kernel: Asynchronous wait on fence i915:sway[1309]:318f9c timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 09:55:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:55 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:57 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:55:59 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:01 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:03 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:05 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:07 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:09 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:11 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:13 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:15 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 09:56:17 prometheus systemd-logind[944]: Power key pressed.
Edit:
Two other hangs:
Dec 07 11:22:11 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:4d01a timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 11:22:25 prometheus kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 07 11:22:25 prometheus kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 07 11:22:25 prometheus kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 07 11:22:25 prometheus kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 07 11:22:25 prometheus kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 07 11:22:25 prometheus kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 07 11:22:25 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
For this one I tried unplugging and replugging the external screens, which didn't help. I also couldn't collect a crash dump, and the machine didn't react to the power button either, so I had to cut power.
Dec 07 12:02:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:53 prometheus kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:53 prometheus kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 07 12:02:54 prometheus kernel: [drm:intel_mst_disable_dp [i915]] *ERROR* failed to update payload -22
Dec 07 12:03:01 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a674 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:05 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:07 prometheus kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
Dec 07 12:03:09 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:11 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:13 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:15 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:17 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:19 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a674 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:19 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:19 prometheus kernel: Asynchronous wait on fence i915:sway[1345]:7a670 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Dec 07 12:03:21 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:23 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:25 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:27 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:29 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:31 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:33 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:35 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:37 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:39 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:41 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:43 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:45 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:47 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:49 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:51 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:53 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:55 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 07 12:03:57 prometheus kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Same issue here, Linux darkstar.example.net 5.4.2 #1 (moved) SMP Wed Dec 4 18:12:20 CST 2019 x86_64 Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz GenuineIntel GNU/Linux
mesa-19.2.7
[ 224.178210] i915 0000:00:02.0: GPU HANG: ecode 4:1:0x9ffdfeff, in jalv.gtk [2914], hang on rcs0
[ 224.178212] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 224.178213] Please file a new bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 224.178213] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 224.178214] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 224.178215] GPU crash dump saved to /sys/class/drm/card0/error
[ 224.178279] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 236.208817] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 244.208824] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 1142.191816] i915 0000:00:02.0: Resetting chip for hang on rcs0
[ 1150.191820] i915 0000:00:02.0: Resetting chip for hang on rcs0
I can't attach the file, so here is the head of the crash dump:
GPU HANG: ecode 4:1:0x9eedfeff, in jalv.gtk [2658], hang on rcs0
Kernel: 5.4.2 x86_64
Driver: 20190822
Time: 1575544661 s 161104 us
Boottime: 131 s 172133 us
Uptime: 118 s 421572 us
Epoch: 4294792001 jiffies (1000 HZ)
Capture: 4294798017 jiffies; 1278428 ms ago, 6016 ms after epoch
Active process (on ring rcs0): jalv.gtk [2658]
Reset count: 0
Suspend count: 0
Platform: GM45
Subplatform: 0x0
PCI ID: 0x2a42
PCI Revision: 0x07
PCI Subsystem: 17aa:213a
IOMMU enabled?: 0
GT awake: yes
RPM wakelock: yes
PM suspended: no
EIR: 0x00000000
IER: 0x02028053
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0x00000000
CCID: 0x00000000
fence[0] = ebf000009b90ad
fence[1] = 22760000227500d
fence[2] = 1f8100001f7f00d
fence[3] = 2a5400002a1500d
fence[4] = 229d0000229801d
fence[5] = 1f3f00001f3d00d
fence[6] = 22a3000022a000d
fence[7] = 9a8000009a800d
fence[8] = 227d0000227d00d
fence[9] = 25bd0000257e00d
fence[10] = 229e0000229e00d
fence[11] = 1f9300001f9201d
fence[12] = 1f4400001f4300d
fence[13] = 229f0000229f00d
fence[14] = 2301000022c200d
fence[15] = 1f4a00001f4901d
rcs0 command stream:
IDLE?: no
START: 0x00005000
HEAD: 0x06e03d98 [0x00003d28]
TAIL: 0x00000110 [0x00003d98, 0x00003db0]
CTL: 0x00003001
MODE: 0x00000040
HWS: 0x00002000
ACTHD: 0x00000000 0093e314
IPEIR: 0x00000000
IPEHR: 0x60020100
INSTDONE: 0xfeefffff
SC_INSTDONE: 0xbfffffd8
batch: [0x00000000_0093e000, 0x00000000_00943000]
BBADDR: 0x00000000_0093e313
BB_STATE: 0x000000a0
INSTPS: 0x0011f02e
INSTPM: 0x00000000
FADDR: 0x00000000 0093e4c0
ring->head: 0x00003d10
ring->tail: 0x00000110
hangcheck timestamp: 0ms (4294792001; epoch)
engine reset count: 0
Active context: jalv.gtk[2658] hw_id 0, prio 0, guilty 0 active 0
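As a reading aid for the dump header above, here is a small sketch that reproduces its timing arithmetic: the gap between the Capture and Epoch jiffies counters converted to milliseconds at the stated HZ, and the Time field rendered as a date. The helper names are hypothetical, and treating Time as seconds since the Unix epoch is an assumption.

```python
from datetime import datetime, timezone


def ms_after_epoch(capture_jiffies, epoch_jiffies, hz):
    """Convert the jiffies gap between capture and epoch to milliseconds."""
    return (capture_jiffies - epoch_jiffies) * 1000 // hz


def wall_clock(time_s):
    """Interpret the Time field as Unix seconds (assumption) and return UTC."""
    return datetime.fromtimestamp(time_s, tz=timezone.utc)


# Values copied from the header: Epoch, Capture, HZ and Time.
print(ms_after_epoch(4294798017, 4294792001, 1000))  # 6016, matching "6016 ms after epoch"
print(wall_clock(1575544661).isoformat())
```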
I get these hangs probably 3-4x a day. I'm using a Thinkpad X1C 6th gen with Intel UHD Graphics 620 and Intel i7-8550U (8) @ 4.000GHz. I'm running kernel 5.4.1 on NixOS, X server 1.20.6.
Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Dec 09 16:39:30 sol kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 09 16:39:30 sol kernel: Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 09 16:39:30 sol kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 09 16:39:30 sol kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Dec 09 16:39:30 sol kernel: GPU crash dump saved to /sys/class/drm/card0/error
Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:30 sol kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:30 sol kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Dec 09 16:39:36 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:44 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:46 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:46 sol systemd[1]: Started PCSC-Lite daemon.
Dec 09 16:39:46 sol gpg-agent[1871]: scdaemon[2526]: pcsc_list_readers failed: unknown PC/SC error code (0x8010002e)
Dec 09 16:39:48 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Dec 09 16:39:50 sol kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
See my post: since I applied the reverts of the indicated patches, the hang has happened to me only once, and then as a brief 'cut' rather than a total graphics hang; the system kept working and returned to normal afterwards. I have now been working for 15 hours without it happening again. [Before I changed the 5.4.2 sources, the system crashed 'totally' every 30 minutes under high load.]
I'm also having this problem (2 or 3 times a day) where the X session hangs with kernel 5.4.1. In my case, sometimes it's a hard hang where all I can do is hit the hardware reset button, but most times I can ssh in from another system, run systemctl restart display-manager, and everything goes back to normal (for a while).
Just in case there's a relation, I should note that I upgraded to kernel 5.4.1 after having the kernel BUG at fs/ext4/inode.c:2721 aka #509 issue with 5.3 kernels. When the patch from https://www.spinics.net/lists/stable/msg340095.html was mentioned in that bug report, I applied it to kernel 5.3.11 and ran it for over 14 days without problems, but then it happened again, so that patch seemed to help but not fix that problem completely. I then updated to kernel 5.4.1 to check if it had a better fix for that problem and so far I didn't see it, but I'm seeing this one.
I attached two captures of /sys/class/drm/card0/error from two different hangs on the same machine, in case it helps. I always have two monitors connected and no other graphics card.
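Since the kernel keeps the error state in a single sysfs file, it is worth copying it somewhere safe before rebooting. A minimal sketch, assuming the default path from the kernel message; the function name and output file naming are made up here, and reading the file typically requires root.

```python
import time
from pathlib import Path


def save_crash_dump(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file and return its path."""
    data = Path(src).read_bytes()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"gpu-error-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

If the session can still be reached over ssh after a hang, calling `save_crash_dump()` from there before restarting anything gives a file that can be attached to the report.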