GPU HANG: ecode 9:1:0x8fa44a56 - Coffee Lake
Hello,
I observe these errors on my system with the 5.5.7-gentoo kernel:
[28534.717906] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[28534.718665] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28542.780879] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[28542.781615] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28545.019893] Asynchronous wait on fence i915:X[4824]:e3c6e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
[28545.788866] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[28545.789598] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28548.739454] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x8fa44a56, in qutebrowser [4839], stopped heartbeat on rcs0
[28548.739456] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[28548.739457] Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
[28548.739457] Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
[28548.739458] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[28548.739459] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
[28548.739460] GPU crash dump saved to /sys/class/drm/card0/error
[28548.842072] i915 0000:00:02.0: Resetting rcs0 for stopped heartbeat on rcs0
[28548.842814] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28548.843829] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[28548.844565] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28548.844623] i915 0000:00:02.0: Resetting chip for stopped heartbeat on rcs0
[28548.947035] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
[28548.947770] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
When this happens, the display freezes for a while (less than a minute). After that it seems to resume normal operation.
This happens intermittently, and I can't provide steps to reproduce it. It may be related to qutebrowser, though: the more I use it (opening new pages and so on), the more likely these errors are to occur.
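Since there are no reliable reproduction steps, one way to quantify how often the hangs occur and which process they are attributed to is to summarize the kernel log. The following is only a minimal sketch: the `summarize` helper and both regular expressions are hypothetical, written against the message formats quoted above, and may not match the wording used by other kernel versions.

```python
import re

# Hypothetical patterns, derived only from the log lines quoted in this report.
# Example: "[28548.739454] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x8fa44a56,
#           in qutebrowser [4839], stopped heartbeat on rcs0"
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915 \S+ GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<proc>\S+) \[(?P<pid>\d+)\]"
)
# Example: "[...] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: ..."
RESET_RE = re.compile(r"\*ERROR\* \w+ reset request timed out")


def summarize(lines):
    """Return (hangs, reset_timeouts) for an iterable of dmesg lines.

    hangs is a list of (timestamp, ecode, process, pid) tuples;
    reset_timeouts counts the "reset request timed out" errors.
    """
    hangs = []
    reset_timeouts = 0
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(
                (float(match["ts"]), match["ecode"], match["proc"], int(match["pid"]))
            )
        elif RESET_RE.search(line):
            reset_timeouts += 1
    return hangs, reset_timeouts
```

It could be run over a saved log, for example `summarize(open("dmesg-5.5.7-gentoo"))`, to see whether the reported process is always qutebrowser.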
I observe a similar (but not exactly the same) problem with the 5.6.0-gentoo kernel: #1689 (closed)
5.5.7 seems more stable. For example, the window manager remains usable after a graphics freeze on 5.5.7, whereas on 5.6.0 the graphics always misbehave after a GPU hang.
System information:
# inxi -Fzm
System: Host: isaak Kernel: 5.5.7-gentoo x86_64 bits: 64 Desktop: dwm 6.2 Distro: Gentoo Base System release 2.6
Machine: Type: Desktop Mobo: ASRock model: Z390 Taichi serial: <filter> UEFI: American Megatrends v: P1.30 date: 09/05/2018
Memory: RAM: total: 31.09 GiB used: 13.77 GiB (44.3%)
Array-1: capacity: 64 GiB slots: 4 EC: None
Device-1: ChannelA-DIMM0 size: No Module Installed
Device-2: ChannelA-DIMM1 size: 16 GiB speed: 2667 MT/s
Device-3: ChannelB-DIMM0 size: No Module Installed
Device-4: ChannelB-DIMM1 size: 16 GiB speed: 2667 MT/s
CPU: Topology: 8-Core model: Intel Core i7-9700K bits: 64 type: MCP L2 cache: 12.0 MiB
Speed: 800 MHz min/max: 800/4900 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 801
Graphics: Device-1: Intel UHD Graphics 630 driver: i915 v: kernel
Display: server: X.Org 1.20.7 driver: intel resolution: 1920x1200~60Hz
OpenGL: renderer: Mesa DRI Intel UHD Graphics 630 (CFL GT2) v: 4.6 Mesa 20.0.4
Audio: Device-1: Intel Cannon Lake PCH cAVS driver: snd_hda_intel
Device-2: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] driver: snd_emu10k1
Sound Server: ALSA v: k5.5.7-gentoo
Network: Device-1: Intel Ethernet I219-V driver: e1000e
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Intel Dual Band Wireless-AC 3168NGW [Stone Peak] driver: N/A
Device-3: Intel I211 Gigabit Network driver: vfio-pci
IF-ID-1: sit0 state: down mac: <filter>
IF-ID-2: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A
Drives: Local Storage: total: 1.21 TiB used: 255.61 GiB (20.6%)
ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 PRO 512GB size: 476.94 GiB
ID-2: /dev/sda vendor: Samsung model: HM320JI size: 298.09 GiB
ID-3: /dev/sdb vendor: Western Digital model: WD5000LPVX-08V0TT5 size: 465.76 GiB
Partition: ID-1: / size: 97.93 GiB used: 49.74 GiB (50.8%) fs: ext4 dev: /dev/nvme0n1p3
ID-2: /home size: 365.96 GiB used: 162.84 GiB (44.5%) fs: ext4 dev: /dev/nvme0n1p4
ID-3: swap-1 size: 4.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p2
Sensors: System Temperatures: cpu: 28.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info: Processes: 200 Uptime: 35m Shell: bash inxi: 3.0.36
Also, could you please tell me whether this could be a hardware issue (a faulty CPU, for instance), or whether it is more likely software-related?
Thanks.
Attachments: dmidecode.txt, dmesg-5.5.7-gentoo, sys_class_drm_card0_error-5.5.7-gentoo