Engine reset failed on 0:0 -> GPU hang
I can reproduce this issue reliably while playing Minecraft. It started suddenly after a system update, and now the hang occurs about 5 minutes into every session.
This is an Acer Swift 3.
uname -a: Linux spelling-is-fun 6.8.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 16 Mar 2024 17:15:35 +0000 x86_64 GNU/Linux
lspci -vnn -d :*:0300:
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6] (rev 0c) (prog-if 00 [VGA controller])
Subsystem: Acer Incorporated [ALI] Alder Lake-P GT2 [Iris Xe Graphics] [1025:1612]
Flags: bus master, fast devsel, latency 0, IRQ 161, IOMMU group 1
Memory at 601f000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
I/O ports at 3000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915, xe
Kernel logs show:
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: Engine reset failed on 0:0 (rcs0) because 0x00000000
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:84dffffb, in Render thread [3775]
Mar 23 09:12:30 spelling-is-fun kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 23 09:12:30 spelling-is-fun kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mar 23 09:12:30 spelling-is-fun kernel: Please see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
Mar 23 09:12:30 spelling-is-fun kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 23 09:12:30 spelling-is-fun kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mar 23 09:12:30 spelling-is-fun kernel: GPU crash dump saved to /sys/class/drm/card1/error
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: Resetting chip for GuC failed to reset engine mask=0x1
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] *ERROR* GT0: rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] Render thread[3775] context reset due to GPU hang
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.20.0
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
Mar 23 09:12:30 spelling-is-fun kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
Mar 23 09:12:30 spelling-is-fun plasmashell[1438]: QRhiGles2: Context is lost.
Mar 23 09:12:30 spelling-is-fun plasmashell[1438]: Graphics device lost, cleaning up scenegraph and releasing RHI
Reading the error dump after a reboot only shows "No error state collected." If my system crashes, how am I supposed to read the dump?
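As far as I understand, the dump at /sys/class/drm/card1/error is held in kernel memory, so it is gone after a reboot and has to be copied while the machine is still up (over SSH, or by something already running in the background). Below is a minimal sketch of such a background copier, not a tested tool. The names `has_dump`, `save_dump`, and `watch` are my own, the destination directory is arbitrary, and I am assuming the script runs as root, since the error node is normally not readable by regular users.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional

# Text the i915 error node returns when no hang has been recorded.
NO_DUMP = "No error state collected"


def has_dump(error_file: Path) -> bool:
    """Return True if the error node holds something other than the placeholder."""
    with error_file.open("rb") as f:
        head = f.read(len(NO_DUMP))
    return bool(head) and head.decode("utf-8", "replace") != NO_DUMP


def save_dump(error_file: Path, dest_dir: Path) -> Optional[Path]:
    """Copy the dump to a timestamped file in dest_dir; return None if there is no dump."""
    if not has_dump(error_file):
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"i915-error-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    # sysfs files report a size of 0, so read until EOF instead of trusting stat().
    with error_file.open("rb") as src, dest.open("wb") as out:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())  # push to disk in case the machine locks up next
    return dest


def watch(error_file: Path, dest_dir: Path, interval: float = 5.0) -> Path:
    """Poll the error node until a dump appears, save it, and return its path."""
    while True:
        saved = save_dump(error_file, dest_dir)
        if saved is not None:
            return saved
        time.sleep(interval)
```

Started before launching the game with something like `watch(Path("/sys/class/drm/card1/error"), Path("/var/tmp/gpu-dumps"))`, this would only help if the kernel stays alive long enough after the hang for one poll and one write to complete.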