GPU crashes multiple times per day when an external display is connected (HDMI or DP).
I haven't observed similar hangs when the external display is not connected.
OS:
Fedora 33
kernel 5.11.0-0.rc6.141.vanilla.1.fc33.x86_64
Reproducible also on Fedora 32, and F33 with the default kernel (v5.10).
See also #3042 (closed), as that looks identical (f33, v5.10.11).
Yes, it looks identical. There are also two other reports which look very similar (Lenovo ThinkPad P50 + external display): #2739, #2668
Do you have any older kernels to test?
This is not a new issue; my GPU hung for the first time probably after the upgrade to Fedora 31 (kernel v5.3). The same problem existed on Fedora 32 (kernel v5.6).
Yesterday I upgraded to 5.11.0-0.rc6 and this bug is still reproducible, but it seems to be less frequent than on v5.10.
If you want, I can downgrade the kernel and check whether this is reproducible with a given kernel version. Let me know what you need.
[ 2521.637602] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 2521.637610] i915 0000:00:02.0: [drm] Xorg[4350] context reset due to GPU hang
[ 2521.643954] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [4350]
[ 2521.643956] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 2521.643957] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[ 2521.643957] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[ 2521.643957] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 2521.643958] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 2521.643958] GPU crash dump saved to /sys/class/drm/card1/error
[ 2521.817814] fbcon: Taking over console
[ 2521.818474] Console: switching to colour frame buffer device 240x67
[ 2534.052433] Asynchronous wait on fence 0000:00:02.0:kwin_x11[5992]:128c4 timed out (hint:intel_atomic_commit_ready [i915])
[ 2536.613556] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 2536.613564] i915 0000:00:02.0: [drm] Xorg[4350] context reset due to GPU hang
[ 2536.620088] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dfbfff, in Xorg [4350]

[ 6626.579743] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6626.579755] i915 0000:00:02.0: [drm] Xorg[1584] context reset due to GPU hang
[ 6626.589290] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [1584]
[ 6626.589294] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 6626.589295] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[ 6626.589296] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[ 6626.589297] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 6626.589298] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[ 6626.589299] GPU crash dump saved to /sys/class/drm/card0/error
[ 6626.759667] fbcon: Taking over console
[ 6626.770171] Console: switching to colour frame buffer device 240x67
[ 6630.610731] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6630.610739] i915 0000:00:02.0: [drm] Xorg[1584] context reset due to GPU hang
[ 6630.615348] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dfbfff, in Xorg [1584]
[ 6690.450482] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6690.450491] i915 0000:00:02.0: [drm] Xorg[1584] context reset due to GPU hang
[ 6690.454791] i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in Xorg [1584]
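Since the kernel messages above ask for the GPU crash dump to be attached, and the dump landed under card1 in one log and card0 in the other, here is a rough sketch of how the dumps could be collected from every card in one go. The function name `save_crash_dumps` and the output file naming are made up for illustration; only the `/sys/class/drm/cardN/error` location comes from the logs. Reading these files usually needs root, and this is only a sketch, not an official collection tool.

```python
from pathlib import Path
from typing import List


def save_crash_dumps(drm_root: Path = Path("/sys/class/drm"),
                     out_dir: Path = Path(".")) -> List[Path]:
    """Copy every non-empty cardN/error file under drm_root into out_dir.

    Hypothetical helper: the name and the '<card>-error.txt' naming scheme
    are assumptions, not part of any i915 tooling.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # The dump may be under card0 or card1 depending on probe order,
    # so check every card directory rather than a fixed one.
    for error_file in sorted(drm_root.glob("card[0-9]*/error")):
        data = error_file.read_bytes()
        if not data.strip():
            continue  # nothing recorded for this card
        target = out_dir / f"{error_file.parent.name}-error.txt"
        target.write_bytes(data)
        saved.append(target)
    return saved


if __name__ == "__main__":
    for path in save_crash_dumps(out_dir=Path("gpu-dumps")):
        print("saved", path)
```

Copying the file right after a hang is the safer habit, since the saved error state does not survive a reboot.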
It's not really a "fix" but a workaround, but I'm wondering if you'll have any luck with it. So far I haven't had an issue, but I'm measuring only in hours, whereas before I was seeing the issue every 3 to 5 minutes. I wonder if it helps you too.
I switched to the NVIDIA GPU and so far it works fine. I hadn't tried this before because 2 years ago nouveau was broken with my GPU, but now it works surprisingly well, even with an external display. Thanks.
Cool. Yeah, I'm not doing anything too graphics-intensive on this machine, so I was hoping to use "Discrete Only" with wayland/nouveau. However, I did end up installing the Nvidia drivers to see if there is a noticeable difference, and it does feel a bit smoother, so I'm going to stick with xorg/nvidia on this P50 for now.
I'm on an i7-4770S with F34, using the IGD with kernel 5.11.0-156.fc34.x86_64 and Wayland, and it's been working. kernel-5.10.17-200.fc33 on F34 works as well. But with kernel-5.11.1-300.fc34 the graphics jitter everywhere.
I am experiencing the same. It happens when the HDMI is attached and used. Doing compute-intensive stuff such as compiling, gaming, or watching 4k videos seems to trigger it. It happens both with Wayland and Xorg.
# dmidecode -t 2
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
    Manufacturer: Notebook
    Product Name: N15_N17RF1
    Version: Not Applicable
    Serial Number: Not Applicable
    Asset Tag: Tag 12345
    Features:
        Board is a hosting board
        Board is replaceable
    Location In Chassis: Not Applicable
    Chassis Handle: 0x0003
    Type: Motherboard
    Contained Object Handles: 0
My BIOS
# dmidecode -t bios
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
    Vendor: American Megatrends Inc.
    Version: 1.05.04
    Release Date: 06/08/2016
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 5 MB
    Characteristics:
        PCI is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        BIOS ROM is socketed
        EDD is supported
        Print screen service is supported (int 5h)
        8042 keyboard services are supported (int 9h)
        Printer services are supported (int 17h)
        ACPI is supported
        USB legacy is supported
        BIOS boot specification is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 5.11
Handle 0x0029, DMI type 13, 22 bytes
BIOS Language Information
    Language Description Format: Long
    Installable Languages: 1
        en|US|iso8859-1
    Currently Installed Language: en|US|iso8859-1
My CPU
# dmidecode -t 4
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0017, DMI type 4, 48 bytes
Processor Information
    Socket Designation: U3E1
    Type: Central Processor
    Family: Core i7
    Manufacturer: Intel(R) Corporation
    ID: E3 06 05 00 FF FB EB BF
    Signature: Type 0, Family 6, Model 94, Stepping 3
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        DS (Debug store)
        ACPI (ACPI supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        SS (Self-snoop)
        HTT (Multi-threading)
        TM (Thermal monitor supported)
        PBE (Pending break enabled)
    Version: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
    Voltage: 1.1 V
    External Clock: 100 MHz
    Max Speed: 8300 MHz
    Current Speed: 3100 MHz
    Status: Populated, Enabled
    Upgrade: Other
    L1 Cache Handle: 0x0014
    L2 Cache Handle: 0x0015
    L3 Cache Handle: 0x0016
    Serial Number: To Be Filled By O.E.M.
    Asset Tag: To Be Filled By O.E.M.
    Part Number: To Be Filled By O.E.M.
    Core Count: 4
    Core Enabled: 4
    Thread Count: 8
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control
I thought my Nvidia card and/or drivers were causing it, so I tried both Nvidia and Nouveau, but neither fixes it. I also tried i915.enable_rc6=0 on the kernel cmdline, but this doesn't help either. Unfortunately, I think my BIOS does not have a discrete graphics switch.
If there is any other information you need, or anything I can assist with in debugging, let me know. I will happily help fix this annoying bug.
No discrete GPU; I am only using the onboard graphics that came with the i9-10900K, with the i915 driver available in kernel 5.11.0-7620-generic.
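When testing a cmdline option such as i915.enable_rc6=0, it can be worth confirming that the option actually reached the running kernel. A minimal sketch, assuming the standard `/proc/cmdline` file; the helper names `parse_cmdline` and `read_cmdline` are invented for this example. It only shows that the option was passed at boot, not that the driver on a given kernel version accepted or honoured it.

```python
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value or None for bare flags}.

    Simplification: quoted values containing spaces are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Parse the command line the running kernel was booted with."""
    with open(path) as f:
        return parse_cmdline(f.read())


if __name__ == "__main__":
    params = read_cmdline()
    if "i915.enable_rc6" in params:
        print("i915.enable_rc6 =", params["i915.enable_rc6"])
    else:
        print("i915.enable_rc6 was not passed on the kernel command line")
```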
The primary purpose of this workstation is cloud engineering and at the time of the crash, minikube, chrome and IntelliJ IDEA were running.
Here's the stack trace:
8/24/21 2:11 AM Asynchronous wait on fence 0000:00:02.0:kwin_x11[1788004]:1b128e0 timed out (hint:intel_atomic_commit_ready [i915])
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Resetting rcs0 for preemption time out
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Xorg[1787042] context reset due to GPU hang
8/24/21 2:11 AM i915 0000 0:02.0: [drm] GPU HANG: ecode 9:1:84dfbffc, in Xorg [1787042]
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Resetting rcs0 for preemption time out
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Xorg[1787042] context reset due to GPU hang
8/24/21 2:11 AM i915 0000 0:02.0: [drm] GPU HANG: ecode 9:1:85dfffff, in Xorg [1787042]
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Resetting rcs0 for preemption time out
8/24/21 2:11 AM i915 0000 0:02.0: [drm] Xorg[1787042] context reset due to GPU hang
8/24/21 2:11 AM i915 0000 0:02.0: [drm] GPU HANG: ecode 9:1:87cabff2, in Xorg [1787042]
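To compare reports like the ones in this thread, it may help to pull the "GPU HANG" lines out of a log and tally the ecodes. The following is a small sketch, not an official tool: the names `GpuHang`, `find_gpu_hangs` and `count_by_ecode` are invented here, and the pattern is based only on the message format seen in the logs in this issue. It ignores the timestamp prefix so that both dmesg output and journal-viewer output are matched.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Matches e.g. "GPU HANG: ecode 9:1:85dffffb, in Xorg [4350]".
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-F:]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)


class GpuHang(NamedTuple):
    ecode: str
    process: str
    pid: int


def find_gpu_hangs(log_text: str) -> List[GpuHang]:
    """Return every GPU hang reported in log_text, in order of appearance."""
    return [
        GpuHang(m["ecode"], m["proc"], int(m["pid"]))
        for m in HANG_RE.finditer(log_text)
    ]


def count_by_ecode(hangs: List[GpuHang]) -> Counter:
    """Tally how often each ecode occurs."""
    return Counter(h.ecode for h in hangs)


if __name__ == "__main__":
    import sys

    hangs = find_gpu_hangs(sys.stdin.read())
    for ecode, n in count_by_ecode(hangs).most_common():
        print(f"{n:4d}  {ecode}")
```

It could be fed with something like `dmesg | python3 hangs.py` (the script name is arbitrary).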