GPU Hang
Hello,
During normal use, my system suddenly started hanging. I was able to log in via SSH from a second machine and check dmesg, which showed that the Intel integrated GPU had apparently hung. I then switched to a virtual text console and ran pkill compton, which brought the whole system back to life. Here is the dmesg output:
[10720.872190] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
[10720.872191] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[10720.872192] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[10720.872192] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[10720.872192] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[10720.872193] GPU crash dump saved to /sys/class/drm/card0/error
[10720.873196] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[10720.873922] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[10720.874101] i915 0000:00:02.0: Resetting chip for hang on rcs0
[10720.875853] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[10720.876589] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[10720.878335] [drm] GuC communication enabled
[10720.878375] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled
[10720.878376] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.0 authenticated:yes
[10724.711869] Asynchronous wait on fence i915:compton[1677]:48bd0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
[10726.845096] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
(that line repeats about 100 times, roughly once every 2 seconds)
...
[10788.712015] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[10790.845464] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[10790.845605] i915 0000:00:02.0: Resetting chip for hang on rcs0
[10790.847102] [drm] GuC communication enabled
[10790.847199] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled
[10790.847200] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.0 authenticated:yes
[10798.738804] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[10800.872164] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[10801.938836] Asynchronous wait on fence i915:compton[1677]:48bd4 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
[10802.792183] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
(the block above appears 3 more times)
...
[10990.739604] i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
[10990.739739] i915 0000:00:02.0: Resetting chip for hang on rcs0
[10990.741718] [drm] GuC communication enabled
[10990.741760] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled
[10990.741761] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.0 authenticated:yes
At that point it started working again; I was able to restart compton and the system was back to normal. I saved the crash dump mentioned in the log above (/sys/class/drm/card0/error); it is attached as error.txt.
If you need more information, I'm happy to supply it.
CPU: Intel(R) Core(TM) i7-8750H CPU
WM/compositor: i3, with Compton as the compositor
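If it helps with triage, the reset interval can be checked from a saved dmesg log with a small Python sketch like the one below. It assumes the log was saved to a plain text file (dmesg.txt and rcs0_resets.py are hypothetical names) and it only matches the "Resetting rcs0 for hang on rcs0" lines quoted above; the helper names are my own, not part of any i915 tooling.

```python
import re
import sys

# Matches dmesg lines such as:
# [10726.845096] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
# "Resetting chip for hang on rcs0" lines are deliberately NOT matched.
RESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+i915 .*Resetting rcs0 for hang on rcs0")


def reset_timestamps(lines):
    """Return the kernel timestamps (seconds since boot) of rcs0 engine resets."""
    stamps = []
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            stamps.append(float(m.group(1)))
    return stamps


def gaps(stamps):
    """Seconds elapsed between consecutive resets, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 rcs0_resets.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        ts = reset_timestamps(f)
    print(f"{len(ts)} rcs0 resets")
    print("gaps (s):", gaps(ts))
```

Run against the full log, most gaps should come out at around 2 seconds, with longer gaps where a full chip reset happened in between.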