GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5,rc6,rc7,5.6.2
Mar 09 12:24:33 thinkpad kernel: Asynchronous wait on fence 0000:00:02.0:kwin_x11[1756]:7c6d8 timed out (hint:intel_atomic_commit_ready+0x0/0x54)
Mar 09 12:24:37 thinkpad kernel: i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
Mar 09 12:24:37 thinkpad kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Mar 09 12:24:37 thinkpad kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Mar 09 12:24:37 thinkpad kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Mar 09 12:24:37 thinkpad kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Mar 09 12:24:37 thinkpad kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Mar 09 12:24:37 thinkpad kernel: GPU crash dump saved to /sys/class/drm/card0/error
Mar 09 12:24:37 thinkpad kernel: i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
Mar 09 12:24:52 thinkpad kernel: i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
Mar 09 12:24:52 thinkpad kernel: i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
Mar 09 12:25:07 thinkpad kernel: i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
Mar 09 12:25:07 thinkpad kernel: i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
Mar 09 12:25:22 thinkpad kernel: i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
Mar 09 12:25:22 thinkpad kernel: i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
Mar 09 12:25:33 thinkpad kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[1550]:431f16 timed out (hint:intel_atomic_commit_ready+0x0/0x54)
See below for the contents of /sys/class/drm/card0/error from several different crashes, all of which follow the same pattern.
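Each report in this thread depends on grabbing /sys/class/drm/card0/error before a reboot discards it. Here is a minimal Python sketch that copies it to a timestamped file. It assumes root access and that an empty capture reads back as "No error state collected". The function name, the `dest_dir` parameter and the output filename pattern are hypothetical choices, not part of i915.

```python
import time
from pathlib import Path
from typing import Optional

# Location reported by the kernel ("GPU crash dump saved to ..."); reading it usually needs root.
ERROR_PATH = Path("/sys/class/drm/card0/error")

# Assumption: the text i915 returns when no hang has been captured.
EMPTY_MARKER = b"No error state collected"


def save_crash_dump(src: Path = ERROR_PATH, dest_dir: Path = Path(".")) -> Optional[Path]:
    """Copy the captured GPU error state into a timestamped file.

    Returns the path written, or None when there is nothing to save.
    """
    # Read until EOF instead of trusting the size sysfs reports for the file.
    data = src.read_bytes()
    if not data or data.startswith(EMPTY_MARKER):
        return None
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / time.strftime("i915-error-%Y%m%d-%H%M%S.txt")
    dest.write_bytes(data)
    return dest
```

For example, run `save_crash_dump(dest_dir=Path("/root/gpu-hangs"))` as root shortly after a hang is logged.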
- Jason A. Donenfeld changed the description
- Binitha Scaria added Community platform: CFL labels
- Author
@ickle - I managed to reproduce and have attached a crash log.
- Jason A. Donenfeld changed title from coffee lake refresh hang on 5.6-rc5 to GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5
- Jason A. Donenfeld changed the description
- Author
I just got this on 5.6-rc6:
[38644.227357] Asynchronous wait on fence 0000:00:02.0:Xorg[984]:14cb2c timed out (hint:intel_atomic_commit_ready+0x0/0x54)
[38649.347905] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[38649.347906] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[38649.347906] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[38649.347906] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[38649.347907] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[38649.347907] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[38649.347907] GPU crash dump saved to /sys/class/drm/card0/error
[38649.347911] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[38656.881176] GpuWatchdog[445268]: segfault at 0 ip 000055986c49f477 sp 00007f29f2148790 error 6 in signal-desktop[5598692cc000+53cf000]
[38656.881189] Code: 7d b7 00 79 09 48 8b 7d a0 e8 e5 5d d3 fe 8b 83 00 01 00 00 85 c0 0f 84 91 00 00 00 48 8b 03 48 89 df be 01 00 00 00 ff 50 68 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 07 23 70 02 01 80 7d 87 00
[38664.282089] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[38664.282296] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[38674.750794] GpuWatchdog[446027]: segfault at 0 ip 0000561b705a1477 sp 00007faf60ce5790 error 6 in signal-desktop[561b6d3ce000+53cf000]
[38674.750806] Code: 7d b7 00 79 09 48 8b 7d a0 e8 e5 5d d3 fe 8b 83 00 01 00 00 85 c0 0f 84 91 00 00 00 48 8b 03 48 89 df be 01 00 00 00 ff 50 68 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 07 23 70 02 01 80 7d 87 00
[38679.214076] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[38679.214082] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[38685.081272] GpuWatchdog[446046]: segfault at 0 ip 000055e69c7d6477 sp 00007ffa9a408790 error 6 in signal-desktop[55e699603000+53cf000]
[38685.081286] Code: 7d b7 00 79 09 48 8b 7d a0 e8 e5 5d d3 fe 8b 83 00 01 00 00 85 c0 0f 84 91 00 00 00 48 8b 03 48 89 df be 01 00 00 00 ff 50 68 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 07 23 70 02 01 80 7d 87 00
[38685.960057] signal-desktop[446075]: segfault at 140 ip 00007fddb85cfeb9 sp 00007ffee9a9b630 error 4 in libGLESv2.so[7fddb831d000+2b7000]
[38685.960061] Code: 00 05 00 00 5b 41 5c 41 5e 41 5f 5d e9 88 00 00 00 41 89 0e 5b 41 5c 41 5e 41 5f 5d c3 53 48 8b 3d 5c 3e 01 00 e8 ed 00 00 00 <ff> 90 40 01 00 00 48 85 c0 74 1f 48 89 c3 48 8b 00 48 89 df ff 50
GPU HANG: ecode 9:0:00000000
Kernel: 5.6.0-rc6+ x86_64 Driver: 20200114 Time: 1584600941 s 451321 us Boottime: 44652 s 962929 us Uptime: 7805 s 805635 us Capture: 4306472128 jiffies; 48620 ms ago Reset count: 0 Suspend count: 5
Platform: COFFEELAKE Subplatform: 0x0 PCI ID: 0x3e9b PCI Revision: 0x02 PCI Subsystem: 17aa:229f IOMMU enabled?: 1 DMC loaded: yes DMC fw version: 1.4 RPM wakelock: yes PM suspended: no GT awake: yes
EIR: 0x00000000 IER: 0x08080000 GTIER[0]: 0x01010101 GTIER[1]: 0x01010101 GTIER[2]: 0x80000070 GTIER[3]: 0x00000101 PGTBL_ER: 0x00000000 FORCEWAKE: 0x00010001 DERRMR: 0x2077efef
fence[0] = 00000000 fence[1] = 44d301f040d4003 fence[2] = 00000000 fence[3] = 4063077020c0001 fence[4] = 8463077064c0001 fence[5] = 00000000 fence[6] = 64a307704500001 fence[7] = 00000000
fence[8] = 00000000 fence[9] = 00000000 fence[10] = 00000000 fence[11] = 00000000 fence[12] = 00000000 fence[13] = 00000000 fence[14] = 00000000 fence[15] = 00000000 fence[16] = 00000000 fence[17] = 00000000 fence[18] = 00000000 fence[19] = 00000000 fence[20] = 00000000 fence[21] = 00000000 fence[22] = 00000000 fence[23] = 00000000 fence[24] = 00000000 fence[25] = 00000000 fence[26] = 00000000 fence[27] = 00000000 fence[28] = 00000000 fence[29] = 00000000 fence[30] = 00000000 fence[31] = 00000000
ERROR: 0x00000000 DONE_REG: 0xebfff1ff FAULT_TLB_DATA: 0x00000018 0x4421a0b2 GTT_CACHE_EN: 0xf0007fff
GuC firmware: i915/kbl_guc_33.0.0.bin status: RUNNING version: wanted 33.0, found 33.0 uCode: 182528 bytes RSA: 256 bytes
HuC firmware: i915/kbl_huc_4.0.0.bin status: RUNNING version: wanted 4.0, found 4.0 uCode: 225664 bytes RSA: 256 bytes
global --- GuC log buffer = 0x00000000 000c8000 :cL%-H5f+!S2UDU=!XonTY-tJ64jR+H`;s5J!Bh!"&E!d"6:+[@]*TUK<H(3bba5ul1E$2;\o@Z9#q"q"-J@pd[]!m^G&G%7]DcO%!!!!(G$2;/T*m[DeQ8=,j[YU7!!-/_zzzzzzz'n="!Kpl50nh2(KlR#J3HJmhF6WfhZ!!!8]zzzzzzzzz!Bbc1'I,/\
Num Pipes: 3
Pipe [0]: Power: on SRC: 0eff086f STAT: 00000000 Plane [0]: CNTR: c4042400 STRIDE: 0000001e SURF: 04500000 TILEOFF: 00000000 Cursor [0]: CNTR: 04000027 POS: 05d10569 BASE: 00040000
Pipe [1]: Power: off SRC: 00000000 STAT: 00000000 Plane [1]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [1]: CNTR: 00000000 POS: 00000000 BASE: 00000000
Pipe [2]: Power: off SRC: 00000000 STAT: 00000000 Plane [2]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [2]: CNTR: 00000000 POS: 00000000 BASE: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: EDP Power: on CONF: c0000000 HTOTAL: 0f9f0eff HBLANK: 0f9f0eff HSYNC: 0f4f0f2f VTOTAL: 08ad086f VBLANK: 08ad086f VSYNC: 08770872
engines: 47 gen: 9 gt: 2 iommu: enabled memory-regions: 5 page-sizes: 11000 platform: COFFEELAKE ppgtt-size: 48 ppgtt-type: 2
is_mobile: no is_lp: no require_force_probe: no is_dgfx: no has_64bit_reloc: yes gpu_reset_clobbers_display: no has_reset_engine: yes has_fpga_dbg: yes has_global_mocs: no has_gt_uc: yes has_l3_dpf: no has_llc: yes has_logical_ring_contexts: yes has_logical_ring_elsq: no has_logical_ring_preemption: yes has_pooled_eu: no has_rc6: yes has_rc6p: no has_rps: yes has_runtime_pm: yes has_snoop: no has_coherent_ggtt: yes unfenced_needs_alignment: no hws_needs_physical: no cursor_needs_physical: no has_csr: yes has_ddi: yes has_dp_mst: yes has_dsb: no has_dsc: no has_fbc: yes has_gmch: no has_hdcp: yes has_hotplug: yes has_ipc: yes has_modular_fia: no has_overlay: no has_psr: yes overlay_needs_physical: no supports_tv: no
slice total: 1, mask=0001 subslice total: 3 slice0: 3 subslices, mask=00000007 slice1: 0 subslices, mask=00000000 slice2: 0 subslices, mask=00000000
EU total: 24 EU per subslice: 8 has slice power gating: no has subslice power gating: no has EU power gating: yes CS timestamp frequency: 12000 kHz
slice0: 3 subslice(s) (0x00000007): subslice0: 8 EUs (0xff) subslice1: 8 EUs (0xff) subslice2: 8 EUs (0xff) subslice3: 0 EUs (0x0)
slice1: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
slice2: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
Has logical contexts? yes scheduler: 1f
i915.vbt_firmware=(null) i915.modeset=-1 i915.lvds_channel_mode=0 i915.panel_use_ssc=-1 i915.vbt_sdvo_panel_type=-1 i915.enable_dc=-1 i915.enable_fbc=1 i915.enable_psr=-1 i915.disable_power_well=1 i915.enable_ips=1 i915.invert_brightness=0 i915.enable_guc=2 i915.guc_log_level=-1 i915.guc_firmware_path=(null) i915.huc_firmware_path=(null) i915.dmc_firmware_path=(null) i915.mmio_debug=1 i915.edp_vswing=0 i915.reset=3 i915.inject_probe_failure=0 i915.fastboot=-1 i915.enable_dpcd_backlight=0 i915.force_probe= i915.fake_lmem_start=0 i915.alpha_support=no i915.enable_hangcheck=yes i915.prefault_disable=no i915.load_detect_test=no i915.force_reset_modeset_test=no i915.error_capture=yes i915.disable_display=no i915.verbose_state_checks=yes i915.nuclear_pageflip=no i915.enable_dp_mst=yes i915.enable_gvt=no
- Author
Happened again on 5.6-rc7. @ickle - how many more of these do you want before you reply? At first I understood that this wasn't worth your time without the error log, but I've since been able to capture quite a few. Here's yet another:
GPU HANG: ecode 9:0:00000000
Kernel: 5.6.0-rc7+ x86_64 Driver: 20200114 Time: 1585345279 s 234746 us Boottime: 93752 s 558254 us Uptime: 6147 s 192508 us Capture: 4311936000 jiffies; 46114 ms ago Reset count: 0 Suspend count: 6
Platform: COFFEELAKE Subplatform: 0x0 PCI ID: 0x3e9b PCI Revision: 0x02 PCI Subsystem: 17aa:229f IOMMU enabled?: 1 DMC loaded: yes DMC fw version: 1.4 RPM wakelock: yes PM suspended: no GT awake: yes
EIR: 0x00000000 IER: 0x08080000 GTIER[0]: 0x01010101 GTIER[1]: 0x01010101 GTIER[2]: 0x80000070 GTIER[3]: 0x00000101 PGTBL_ER: 0x00000000 FORCEWAKE: 0x00010001 DERRMR: 0x2077efef
fence[0] = 65e307704640001 fence[1] = 00000000 fence[2] = 00000000 fence[3] = 44e307702540001 fence[4] = 85a307706600001 fence[5] = 251f01f02120003 fence[6] = 00000000 fence[7] = 460300704504003
fence[8] = 00000000 fence[9] = 00000000 fence[10] = 00000000 fence[11] = 00000000 fence[12] = 00000000 fence[13] = 00000000 fence[14] = 00000000 fence[15] = 00000000 fence[16] = 00000000 fence[17] = 00000000 fence[18] = 00000000 fence[19] = 00000000 fence[20] = 00000000 fence[21] = 00000000 fence[22] = 00000000 fence[23] = 00000000 fence[24] = 00000000 fence[25] = 00000000 fence[26] = 00000000 fence[27] = 00000000 fence[28] = 00000000 fence[29] = 00000000 fence[30] = 00000000 fence[31] = 00000000
ERROR: 0x00000000 DONE_REG: 0xebfff1ff FAULT_TLB_DATA: 0x00000018 0x4421a0b2 GTT_CACHE_EN: 0xf0007fff
GuC firmware: i915/kbl_guc_33.0.0.bin status: RUNNING version: wanted 33.0, found 33.0 uCode: 182528 bytes RSA: 256 bytes
HuC firmware: i915/kbl_huc_4.0.0.bin status: RUNNING version: wanted 4.0, found 4.0 uCode: 225664 bytes RSA: 256 bytes
global --- GuC log buffer = 0x00000000 000c8000 :cL%-H5f+!SDU8Ou":3?ml\eXX&3*2a%rjji+=Em/6F^I?OOZd`[La\1<H(3bba5ul1E$2;\Tm)X-((\sB%cU9do@P5h8N`2meg74J,fQMF`+>?Zg$V^-M-DL*2rGB!!"M>zzzzzzzrNGtrl?Nl6R(VsqVC]nC;q_]9qAQs\!!!!_zzzzzzzzz8dXPA!:hC@
Num Pipes: 3
Pipe [0]: Power: on SRC: 0eff086f STAT: 00000000 Plane [0]: CNTR: c4042400 STRIDE: 0000001e SURF: 04640000 TILEOFF: 00000000 Cursor [0]: CNTR: 04000027 POS: 068b0cd1 BASE: 00080000
Pipe [1]: Power: off SRC: 00000000 STAT: 00000000 Plane [1]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [1]: CNTR: 00000000 POS: 00000000 BASE: 00000000
Pipe [2]: Power: off SRC: 00000000 STAT: 00000000 Plane [2]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [2]: CNTR: 00000000 POS: 00000000 BASE: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: EDP Power: on CONF: c0000000 HTOTAL: 0f9f0eff HBLANK: 0f9f0eff HSYNC: 0f4f0f2f VTOTAL: 08ad086f VBLANK: 08ad086f VSYNC: 08770872
engines: 47 gen: 9 gt: 2 iommu: enabled memory-regions: 5 page-sizes: 11000 platform: COFFEELAKE ppgtt-size: 48 ppgtt-type: 2
is_mobile: no is_lp: no require_force_probe: no is_dgfx: no has_64bit_reloc: yes gpu_reset_clobbers_display: no has_reset_engine: yes has_fpga_dbg: yes has_global_mocs: no has_gt_uc: yes has_l3_dpf: no has_llc: yes has_logical_ring_contexts: yes has_logical_ring_elsq: no has_logical_ring_preemption: yes has_pooled_eu: no has_rc6: yes has_rc6p: no has_rps: yes has_runtime_pm: yes has_snoop: no has_coherent_ggtt: yes unfenced_needs_alignment: no hws_needs_physical: no cursor_needs_physical: no has_csr: yes has_ddi: yes has_dp_mst: yes has_dsb: no has_dsc: no has_fbc: yes has_gmch: no has_hdcp: yes has_hotplug: yes has_ipc: yes has_modular_fia: no has_overlay: no has_psr: yes overlay_needs_physical: no supports_tv: no
slice total: 1, mask=0001 subslice total: 3 slice0: 3 subslices, mask=00000007 slice1: 0 subslices, mask=00000000 slice2: 0 subslices, mask=00000000
EU total: 24 EU per subslice: 8 has slice power gating: no has subslice power gating: no has EU power gating: yes CS timestamp frequency: 12000 kHz
slice0: 3 subslice(s) (0x00000007): subslice0: 8 EUs (0xff) subslice1: 8 EUs (0xff) subslice2: 8 EUs (0xff) subslice3: 0 EUs (0x0)
slice1: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
slice2: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
Has logical contexts? yes scheduler: 1f
i915.vbt_firmware=(null) i915.modeset=-1 i915.lvds_channel_mode=0 i915.panel_use_ssc=-1 i915.vbt_sdvo_panel_type=-1 i915.enable_dc=-1 i915.enable_fbc=1 i915.enable_psr=-1 i915.disable_power_well=1 i915.enable_ips=1 i915.invert_brightness=0 i915.enable_guc=2 i915.guc_log_level=-1 i915.guc_firmware_path=(null) i915.huc_firmware_path=(null) i915.dmc_firmware_path=(null) i915.mmio_debug=0 i915.edp_vswing=0 i915.reset=3 i915.inject_probe_failure=0 i915.fastboot=-1 i915.enable_dpcd_backlight=0 i915.force_probe= i915.fake_lmem_start=0 i915.alpha_support=no i915.enable_hangcheck=yes i915.prefault_disable=no i915.load_detect_test=no i915.force_reset_modeset_test=no i915.error_capture=yes i915.disable_display=no i915.verbose_state_checks=yes i915.nuclear_pageflip=no i915.enable_dp_mst=yes i915.enable_gvt=no
- Author
CC @bjsprakash - This should have a "severity::critical" tag on it.
- Jason A. Donenfeld mentioned in issue #1507 (closed)
- Jason A. Donenfeld changed title from GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5 to GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5,rc6,rc7
- Jason A. Donenfeld changed the description
- bprakash added priority::high severity::critical labels
- Jason A. Donenfeld changed title from GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5,rc6,rc7 to GPU HANG: ecode 9:0:00000000 - coffee lake, 5.6-rc5,rc6,rc7,5.6.2
- Author
@ickle It happened again. How many of these do you need in order to take it seriously?
[19280.824298] Asynchronous wait on fence 0000:00:02.0:kwin_x11[1168]:565c0 timed out (hint:intel_atomic_commit_ready+0x0/0x54)
[19285.732156] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19285.732158] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[19285.732159] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[19285.732159] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[19285.732160] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[19285.732160] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[19285.732161] GPU crash dump saved to /sys/class/drm/card0/error
[19285.732166] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19300.668824] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19300.668831] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19315.598487] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19315.598492] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19330.745365] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19330.745370] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19345.678496] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19345.678500] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19358.693374] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19358.693489] i915 0000:00:02.0: Resetting bcs0 for stopped heartbeat on bcs0
[19360.613352] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19360.613494] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[19375.760100] i915 0000:00:02.0: GPU HANG: ecode 9:0:00000000
[19375.760243] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
With the card error being:
GPU HANG: ecode 9:0:00000000
Kernel: 5.6.2+ x86_64 Driver: 20200114 Time: 1586054145 s 459624 us Boottime: 24584 s 856982 us Uptime: 3503 s 379677 us Capture: 4300662912 jiffies; 110070 ms ago Reset count: 0 Suspend count: 2
Platform: COFFEELAKE Subplatform: 0x0 PCI ID: 0x3e9b PCI Revision: 0x02 PCI Subsystem: 17aa:229f IOMMU enabled?: 1 DMC loaded: yes DMC fw version: 1.4 RPM wakelock: yes PM suspended: no GT awake: yes
EIR: 0x00000000 IER: 0x08080000 GTIER[0]: 0x01010101 GTIER[1]: 0x01010101 GTIER[2]: 0x80000070 GTIER[3]: 0x00000101 PGTBL_ER: 0x00000000 FORCEWAKE: 0x00010001 DERRMR: 0x2077efef
fence[0] = 7fa307706000001 fence[1] = 551b07703500001 fence[2] = 9fdb07707fc0001 fence[3] = e05b0770c040001 fence[4] = 00000000 fence[5] = 00000000 fence[6] = 30b100702fb2003 fence[7] = 34d901f030da003
fence[8] = 00000000 fence[9] = 00000000 fence[10] = 00000000 fence[11] = 00000000 fence[12] = 00000000 fence[13] = 00000000 fence[14] = 00000000 fence[15] = 00000000 fence[16] = 00000000 fence[17] = 00000000 fence[18] = 00000000 fence[19] = 00000000 fence[20] = 00000000 fence[21] = 00000000 fence[22] = 00000000 fence[23] = 00000000 fence[24] = 00000000 fence[25] = 00000000 fence[26] = 00000000 fence[27] = 00000000 fence[28] = 00000000 fence[29] = 00000000 fence[30] = 00000000 fence[31] = 00000000
ERROR: 0x00000000 DONE_REG: 0xebfff1ff FAULT_TLB_DATA: 0x00000018 0x4421a0b2 GTT_CACHE_EN: 0xf0007fff
GuC firmware: i915/kbl_guc_33.0.0.bin status: RUNNING version: wanted 33.0, found 33.0 uCode: 182528 bytes RSA: 256 bytes
HuC firmware: i915/kbl_huc_4.0.0.bin status: RUNNING version: wanted 4.0, found 4.0 uCode: 225664 bytes RSA: 256 bytes
global --- GuC log buffer = 0x00000000 000c8000 :cL%-H5f+!S2UDU=Jco`r_d$BVf*/Yj6l0i/0Lu5^;+*Al3(a<R&f4sVRYn6Rg"DP_=f,[n8!f9T.)@92,NR$3fQ>XQ46lK?!;Cpos$-Pa4Y?I>Il$%prr1U6ca&2G!!!#"zzzzzzzM`/?.T0gd\4C[mjaE*Os<S_V6n%PlG!!!!$zzzzzzzzJ,fQLF_@9u!!#e@
Num Pipes: 3
Pipe [0]: Power: on SRC: 0eff086f STAT: 00000000 Plane [0]: CNTR: c4042400 STRIDE: 0000001e SURF: 07fc0000 TILEOFF: 00000000 Cursor [0]: CNTR: 04000027 POS: 06420b9a BASE: 00040000
Pipe [1]: Power: off SRC: 00000000 STAT: 00000000 Plane [1]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [1]: CNTR: 00000000 POS: 00000000 BASE: 00000000
Pipe [2]: Power: off SRC: 00000000 STAT: 00000000 Plane [2]: CNTR: 00000000 STRIDE: 00000000 SURF: 00000000 TILEOFF: 00000000 Cursor [2]: CNTR: 00000000 POS: 00000000 BASE: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: A Power: off CONF: 00000000 HTOTAL: 00000000 HBLANK: 00000000 HSYNC: 00000000 VTOTAL: 00000000 VBLANK: 00000000 VSYNC: 00000000
CPU transcoder: EDP Power: on CONF: c0000000 HTOTAL: 0f9f0eff HBLANK: 0f9f0eff HSYNC: 0f4f0f2f VTOTAL: 08ad086f VBLANK: 08ad086f VSYNC: 08770872
engines: 47 gen: 9 gt: 2 iommu: enabled memory-regions: 5 page-sizes: 11000 platform: COFFEELAKE ppgtt-size: 48 ppgtt-type: 2
is_mobile: no is_lp: no require_force_probe: no is_dgfx: no has_64bit_reloc: yes gpu_reset_clobbers_display: no has_reset_engine: yes has_fpga_dbg: yes has_global_mocs: no has_gt_uc: yes has_l3_dpf: no has_llc: yes has_logical_ring_contexts: yes has_logical_ring_elsq: no has_logical_ring_preemption: yes has_pooled_eu: no has_rc6: yes has_rc6p: no has_rps: yes has_runtime_pm: yes has_snoop: no has_coherent_ggtt: yes unfenced_needs_alignment: no hws_needs_physical: no cursor_needs_physical: no has_csr: yes has_ddi: yes has_dp_mst: yes has_dsb: no has_dsc: no has_fbc: yes has_gmch: no has_hdcp: yes has_hotplug: yes has_ipc: yes has_modular_fia: no has_overlay: no has_psr: yes overlay_needs_physical: no supports_tv: no
slice total: 1, mask=0001 subslice total: 3 slice0: 3 subslices, mask=00000007 slice1: 0 subslices, mask=00000000 slice2: 0 subslices, mask=00000000
EU total: 24 EU per subslice: 8 has slice power gating: no has subslice power gating: no has EU power gating: yes CS timestamp frequency: 12000 kHz
slice0: 3 subslice(s) (0x00000007): subslice0: 8 EUs (0xff) subslice1: 8 EUs (0xff) subslice2: 8 EUs (0xff) subslice3: 0 EUs (0x0)
slice1: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
slice2: 0 subslice(s) (0x00000000): subslice0: 0 EUs (0x0) subslice1: 0 EUs (0x0) subslice2: 0 EUs (0x0) subslice3: 0 EUs (0x0)
Has logical contexts? yes scheduler: 1f
i915.vbt_firmware=(null) i915.modeset=-1 i915.lvds_channel_mode=0 i915.panel_use_ssc=-1 i915.vbt_sdvo_panel_type=-1 i915.enable_dc=-1 i915.enable_fbc=1 i915.enable_psr=-1 i915.disable_power_well=1 i915.enable_ips=1 i915.invert_brightness=0 i915.enable_guc=2 i915.guc_log_level=-1 i915.guc_firmware_path=(null) i915.huc_firmware_path=(null) i915.dmc_firmware_path=(null) i915.mmio_debug=0 i915.edp_vswing=0 i915.reset=3 i915.inject_probe_failure=0 i915.fastboot=-1 i915.enable_dpcd_backlight=0 i915.force_probe= i915.fake_lmem_start=0 i915.alpha_support=no i915.enable_hangcheck=yes i915.prefault_disable=no i915.load_detect_test=no i915.force_reset_modeset_test=no i915.error_capture=yes i915.disable_display=no i915.verbose_state_checks=yes i915.nuclear_pageflip=no i915.enable_dp_mst=yes i915.enable_gvt=no
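To check whether these crashes really "all go the same way", a small sketch like the following can pull a few `Key: value` fields out of several dumps and report which ones match. It assumes the raw, line-oriented error file rather than the flattened copies pasted in this thread. The field list and the function names are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical selection of fields worth comparing across dumps; adjust as needed.
FIELDS = ("GPU HANG", "Kernel", "PCI ID", "DONE_REG", "FAULT_TLB_DATA", "GTT_CACHE_EN")


def parse_fields(dump: str, fields: Iterable[str] = FIELDS) -> Dict[str, str]:
    """Return the first 'Key: value' occurrence of each wanted key.

    Assumes one field per line, as in the raw /sys/class/drm/card0/error file.
    """
    wanted = set(fields)
    found: Dict[str, str] = {}
    for line in dump.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key in wanted and key not in found:
            found[key] = value.strip()
    return found


def compare_dumps(dumps: List[str], fields: Iterable[str] = FIELDS) -> Dict[str, List[str]]:
    """Map each field to its value in every dump ('' when a dump lacks it)."""
    fields = list(fields)
    parsed = [parse_fields(d, fields) for d in dumps]
    return {f: [p.get(f, "") for p in parsed] for f in fields}


def identical_fields(dumps: List[str], fields: Iterable[str] = FIELDS) -> Dict[str, str]:
    """Keep only the fields that are present and equal in every dump."""
    table = compare_dumps(dumps, fields)
    return {f: vals[0] for f, vals in table.items() if vals and vals[0] and len(set(vals)) == 1}
```

In the three dumps above (5.6-rc6, 5.6-rc7 and 5.6.2), DONE_REG (0xebfff1ff) and FAULT_TLB_DATA (0x00000018 0x4421a0b2) have the same values while the kernel version differs. This is the kind of match the comparison is meant to surface.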
- Reporter
Hi Reporter, please confirm whether this issue is still seen on the latest kernel.
If it is, please provide the steps to reproduce it for further debugging.
- Mahesh Meena removed severity::critical label
- Mahesh Meena added severity::major label
- Reporter
Reduced the severity from severity::critical to severity::major, as there has been no recent reproduction update from the reporter.
- Author
This probably shouldn't be demoted to major. I've seen it on 5.10. Can you raise the severity again?
- Author
Jan 28 21:51:55 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Jan 28 21:51:55 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in plasmashell [969]
Jan 28 21:51:55 thinkpad kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 28 21:51:55 thinkpad kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Jan 28 21:51:55 thinkpad kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Jan 28 21:51:55 thinkpad kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 28 21:51:55 thinkpad kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Jan 28 21:51:55 thinkpad kernel: GPU crash dump saved to /sys/class/drm/card0/error
Jan 28 21:51:55 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error
Jan 28 21:51:55 thinkpad kernel: i915 0000:00:02.0: [drm] plasmashell[969] context reset due to GPU hang
Jan 28 21:51:55 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:d9d864dd, in plasmashell [969]
Jan 28 21:51:56 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Jan 28 21:51:56 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error
Jan 28 21:51:56 thinkpad kernel: i915 0000:00:02.0: [drm] plasmashell[969] context reset due to GPU hang
Jan 28 21:51:56 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in plasmashell [969]
Jan 28 21:51:56 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:d9d8e4dd, in plasmashell [969]
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] plasmashell[969] context reset due to GPU hang
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in plasmashell [969]
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:d9d8e4dd, in plasmashell [969]
Jan 28 21:51:57 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
Jan 28 21:51:58 thinkpad kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error
Jan 28 21:51:58 thinkpad kernel: i915 0000:00:02.0: [drm] plasmashell[969] context reset due to GPU hang
Jan 28 21:51:58 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in plasmashell [969]
Jan 28 21:51:58 thinkpad kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:d9d8e4dd, in plasmashell [969]
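The 5.10 log differs from the earlier ones in two ways. The ecodes are no longer 9:0:00000000, and rcs0 is reset for "preemption time out" and "CS error" rather than for a stopped heartbeat. The minimal sketch below tallies resets by engine and reason, and hangs by ecode, from dmesg or journal text. The regular expressions are assumptions fitted to the lines quoted in this issue, not a stable kernel log format, and `tally` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Tuple

# "Resetting rcs0 for stopped heartbeat on rcs0" -> ("rcs0", "stopped heartbeat")
RESET_RE = re.compile(r"Resetting (\w+) for (.+?)(?: on \w+)?$")
# "GPU HANG: ecode 9:1:85dffffb, in plasmashell [969]" -> "9:1:85dffffb"
ECODE_RE = re.compile(r"GPU HANG: ecode ([0-9a-f]+:[0-9a-f]+:[0-9a-f]+)")


def tally(log: str) -> Tuple[Counter, Counter]:
    """Count engine resets by (engine, reason) and GPU hangs by ecode."""
    resets: Counter = Counter()
    ecodes: Counter = Counter()
    for line in log.splitlines():
        line = line.rstrip()
        m = RESET_RE.search(line)
        if m:
            resets[(m.group(1), m.group(2))] += 1
        m = ECODE_RE.search(line)
        if m:
            ecodes[m.group(1)] += 1
    return resets, ecodes
```

The input can come from `journalctl -k` or `dmesg` output saved to a file. Both the bracketed-timestamp and the syslog-style prefixes shown in this thread work, because the patterns are searched anywhere in the line.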
- Mahesh Meena added severity::critical label
- Mahesh Meena removed severity::major label
- Reporter
Can you please share the steps to reproduce the issue? When do you observe the GPU hang?
- Author
It happens somewhat randomly, and I can't induce it, though it's frequent enough that I'll hit it eventually.
- Reporter
@zx2c4 Are you running anything when it happens, or is the system idle? Can you please describe in detail which tasks, applications, or tests you are running and when you observe the issue? Can you also share your BIOS details, OS version, and driver details?
- Reporter
Hi @zx2c4
Can you please provide the details I asked for in my previous comment? Have you checked the behavior with the latest driver?
- Reporter
Hi @zx2c4,
Can you please provide the details and confirm whether you still see this issue? I will close this case if you do not respond within two days. Please update this issue when you can.