i915 bug: PC freeze or GPU HANG. Kernel: 5.8-rc7.d0731.g7dc6fd0-1
Follow-up tickets: #2188 (closed), #2189 (closed)
This ticket has a 'rolling title': it is updated to the latest kernel version checked for the bug.
The first message in the topic starts from 5.8-rc5.
Kernel log: key lines from journalctl -r (reverse order, newest entry first):
...
Jul 16 01:54:09 pc kernel: ---[ end trace 19c24ff6f1d82e3e ]---
Jul 16 01:54:09 pc kernel: secondary_startup_64+0xb6/0xc0
Jul 16 01:54:09 pc kernel: start_secondary+0x178/0x1c0
Jul 16 01:54:09 pc kernel: cpu_startup_entry+0x19/0x20
Jul 16 01:54:09 pc kernel: do_idle+0x1fb/0x2c0
Jul 16 01:54:09 pc kernel: cpuidle_enter+0x29/0x40
Jul 16 01:54:09 pc kernel: ? cpuidle_enter_state+0xa4/0x420
Jul 16 01:54:09 pc kernel: R13: ffff961081f36800 R14: 0000000000000008 R15: 000001ef535e301a
Jul 16 01:54:09 pc kernel: R10: 0000000000000f8d R11: 0000000000006080 R12: 0000000000000008
Jul 16 01:54:09 pc kernel: RBP: ffffffffb78c9bc0 R08: 000001ef535e301a R09: 0000000000000020
Jul 16 01:54:09 pc kernel: RDX: 0000000000000000 RSI: ffffffffb756a7f2 RDI: ffffffffb754a72f
Jul 16 01:54:09 pc kernel: RAX: ffff961081f00000 RBX: ffff961081f36800 RCX: 000000000000001f
Jul 16 01:54:09 pc kernel: RSP: 0018:ffff9b94c00efe78 EFLAGS: 00000246
Jul 16 01:54:09 pc kernel: Code: 80 76 62 49 e8 1b 3e 8e ff 49 89 c7 0f 1f 44 00 00 31 ff e8 4c 4c 8e ff 80 7c 24 0f 00 0f 85 06 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e9 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
Jul 16 01:54:09 pc kernel: RIP: 0010:cpuidle_enter_state+0xb6/0x420
Jul 16 01:54:09 pc kernel: asm_common_interrupt+0x1e/0x40
Jul 16 01:54:09 pc kernel: common_interrupt+0xd1/0x200
Jul 16 01:54:09 pc kernel: irq_exit_rcu+0xcb/0x120
Jul 16 01:54:09 pc kernel: do_softirq_own_stack+0x5f/0x80
Jul 16 01:54:09 pc kernel: </IRQ>
Jul 16 01:54:09 pc kernel: asm_call_on_stack+0x12/0x20
Jul 16 01:54:09 pc kernel: ? handle_fasteoi_irq+0x210/0x210
Jul 16 01:54:09 pc kernel: ? handle_irq_event+0x78/0xb0
Jul 16 01:54:09 pc kernel: ? handle_fasteoi_irq+0x210/0x210
Jul 16 01:54:09 pc kernel: R13: ffffffffb630c920 R14: 0000000000000001 R15: ffff9b94c0179000
Jul 16 01:54:09 pc kernel: R10: ffff9b94c0178f80 R11: 0000000000000068 R12: ffff96107ae5aa00
Jul 16 01:54:09 pc kernel: RBP: ffff9b94c00efd60 R08: 000001ef535e455b R09: ffffffffb78de240
Jul 16 01:54:09 pc kernel: RDX: 0000000000000000 RSI: ffffffffb75da897 RDI: ffffffffb7572786
Jul 16 01:54:09 pc kernel: RAX: 0000000000000002 RBX: ffff96107e2b3e00 RCX: 000000000000001f
Jul 16 01:54:09 pc kernel: RSP: 0018:ffff9b94c0178f90 EFLAGS: 00000292
Jul 16 01:54:09 pc kernel: Code: c7 44 24 28 0a 00 00 00 44 89 74 24 04 48 c7 c7 97 a8 5d b7 e8 8e 19 bf ff 65 66 c7 05 b4 ba 02 49 00 00 fb 66 0f 1f 44 00 00 <48> c7 44 24 08 c0 50 80 b7 b8 ff ff ff ff 0f bc 44 24 04 83 c0 01
Jul 16 01:54:09 pc kernel: RIP: 0010:__do_softirq+0x93/0x352
Jul 16 01:54:09 pc kernel: asm_sysvec_irq_work+0x12/0x20
Jul 16 01:54:09 pc kernel: sysvec_irq_work+0x41/0xe0
Jul 16 01:54:09 pc kernel: __sysvec_irq_work+0x2d/0xf0
Jul 16 01:54:09 pc kernel: irq_work_run+0x26/0x40
Jul 16 01:54:09 pc kernel: irq_work_run_list+0x2d/0x40
Jul 16 01:54:09 pc kernel: irq_work_single+0x2c/0x40
Jul 16 01:54:09 pc kernel: signal_irq_work+0x23e/0x350 [i915]
Jul 16 01:54:09 pc kernel: dma_i915_sw_fence_wake_timer+0x2c/0x50 [i915]
Jul 16 01:54:09 pc kernel: __i915_sw_fence_complete+0x156/0x1b0 [i915]
Jul 16 01:54:09 pc kernel: autoremove_wake_function+0xe/0x30
Jul 16 01:54:09 pc kernel: ? try_to_wake_up+0x7a/0x680
Jul 16 01:54:09 pc kernel: try_to_wake_up+0x1e7/0x680
Jul 16 01:54:09 pc kernel: <IRQ>
Jul 16 01:54:09 pc kernel: Call Trace:
Jul 16 01:54:09 pc kernel: CR2: 00007f841e3d9000 CR3: 00000004eba0a003 CR4: 00000000003606e0
Jul 16 01:54:09 pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 01:54:09 pc kernel: FS: 0000000000000000(0000) GS:ffff961081f00000(0000) knlGS:0000000000000000
Jul 16 01:54:09 pc kernel: R13: 0000000000000002 R14: ffff96107dee07ac R15: 000000000002c340
Jul 16 01:54:09 pc kernel: R10: ffff961053e94300 R11: 0000000000002400 R12: 0000000000000002
Jul 16 01:54:09 pc kernel: RBP: 00000000fffffffb R08: 0000000000000000 R09: 0000000000000001
Jul 16 01:54:09 pc kernel: RDX: 000000000002c340 RSI: ffffffffb756a7f2 RDI: ffffffffb754a72f
Jul 16 01:54:09 pc kernel: RAX: 0000000000000002 RBX: ffff96107dee0000 RCX: ffff961081f00000
Jul 16 01:54:09 pc kernel: RSP: 0018:ffff9b94c0178cf8 EFLAGS: 00010046
Jul 16 01:54:09 pc kernel: Code: 02 83 e1 f7 83 e0 01 c1 e0 03 09 c8 88 83 c4 04 00 00 c7 42 68 01 00 00 00 e8 fa e5 07 00 b8 01 00 00 00 5b 5d 41 5c 41 5d c3 <0f> 0b 31 c0 eb 90 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 49 89
Jul 16 01:54:09 pc kernel: RIP: 0010:ttwu_queue_wakelist+0xc2/0xd0
Jul 16 01:54:09 pc kernel: Hardware name: Default string Default string/Default string, BIOS 5.12 11/10/2018
Jul 16 01:54:09 pc kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.8.0-1-MANJARO #1
Jul 16 01:54:09 pc kernel: int3400_thermal acpi_thermal_rel sg drm crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas crc32c_intel xhci_hcd
Jul 16 01:54:09 pc kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse hid_logitech_hidpp joydev input_leds mousedev hid_logitech_dj hid_generic usbhid intel_xhci_usb_role_switch roles rfkill snd_usb_audio snd_usbmidi_lib snd_hwdep snd_rawmidi snd_seq_device mc x86_pkg_temp_thermal intel_powerclamp coretemp snd_pcm kvm_intel squashfs snd_timer kvm i915 snd irqbypass ee1004 soundcore crct10dif_pclmul loop crc32_pclmul iTCO_wdt intel_pmc_bxt iTCO_vendor_support intel_rapl_msr intel_wmi_thunderbolt ghash_clmulni_intel aesni_intel nls_iso8859_1 crypto_simd nls_cp437 i2c_algo_bit vfat fat drm_kms_helper cryptd glue_helper rapl cec intel_cstate intel_uncore rc_core r8169 pcspkr intel_gtt processor_thermal_device i2c_i801 syscopyarea sysfillrect realtek sysimgblt i2c_smbus intel_rapl_common libphy fb_sys_fops intel_pch_thermal intel_soc_dts_iosf wmi int3403_thermal int340x_thermal_zone bmc150_accel_i2c i2c_hid bmc150_accel_core hid industrialio_triggered_buffer kfifo_buf evdev industrialio mac_hid
Jul 16 01:54:09 pc kernel: WARNING: CPU: 2 PID: 0 at kernel/sched/core.c:2388 ttwu_queue_wakelist+0xc2/0xd0
Jul 16 01:54:09 pc kernel: ------------[ cut here ]------------
Jul 16 01:54:09 pc kernel: GPU crash dump saved to /sys/class/drm/card0/error
Jul 16 01:54:09 pc kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Jul 16 01:54:09 pc kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jul 16 01:54:09 pc kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Jul 16 01:54:09 pc kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Jul 16 01:54:09 pc kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jul 16 01:54:09 pc kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:85dffffb, in opera [4320]
Jul 16 01:54:09 pc kernel: i915 0000:00:02.0: [drm] opera[4320] context reset due to GPU hang
Jul 16 01:54:09 pc kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
...
Platform: Manjaro Linux 20.0.3 64-bit KDE, unstable branch (it receives the latest updates in the Manjaro project).
Kernel: listed in Manjaro Settings Manager (MSM) as 5.8rc5.d0712.g11ba468-1; the same kernel is reported by uname -r as 5.8.0-1-MANJARO.
Hardware, briefly: a 1-year-old mini PC whose only GPU is the one integrated on the CPU die (Intel UHD Graphics 620). No other GPU has ever been present in this PC.
All current kernels of the Manjaro Linux OS appear to be affected. They are (at least):
4.19 LTS family, including the latest 4.19.132-1 LTS,
5.4 LTS family (probably the latest 5.4.51-1 as well),
5.6 family (probably the latest 5.6.19-2 as well),
5.6 real-time family (probably the latest 5.6.17_rt9-2 as well),
5.7 family, including the latest 5.7.8-8,
5.8rc family, including the latest 5.8rc5.
Usual behaviour per kernel:
4.19: recovers by resetting the graphics (see the Manjaro forum link below),
5.7: only the display hangs (the picture blinks, showing tens to hundreds of coloured rectangles on a black background); sound keeps playing, and the second terminal (Ctrl+Alt+F2) is available and working,
5.8rc (note: not 'rt'): usually the whole OS freezes: the picture stays still like a photo, sound stops, no terminal is available and the PC does not respond. In the case reported in this ticket, 5.8rc unusually stayed 'alive' like 5.7, so the info was gathered from a second terminal session.
Additional notes about kernels
Usually the 5.8rc5 kernel freezes completely, but this case was different: I was watching a YouTube video in the Opera web browser, paused it, opened a new tab with the 'cs-online.club/en/servers' URL and pressed the 'connect' button on an entry. The connection stage started and, before the game started (before the team selector), the picture froze and blinked: the centre of the picture stayed as it was (the connection-to-server stage), while a band of about 200 px on each of the 4 sides around that central rectangle blinked between black and an image of the connection screen + web browser tabs + taskbar. This may not always be reproducible.
The 5.8rc5 freeze is very hard to catch because the OS usually freezes completely with a static (not blinking) picture, and the second terminal session (Ctrl+Alt+F2) does not respond. The only way to reboot is to hold the power button on the PC case to force a hardware power-off, and then press it again to start the PC.
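Because the PC has to be power-cycled, the hang messages can only be read afterwards from the previous boot's log. A minimal sketch for filtering them out, assuming the journal is persistent and the messages reached disk before the freeze (not guaranteed on a hard lock-up). The function names i915_hang_lines and i915_hang_count are hypothetical helpers; the patterns are taken from the i915 lines in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers: filter i915 hang/reset markers from kernel log text.
# Both read the files given as arguments, or standard input when none are given.

# Patterns copied from the i915 messages in this ticket's journal excerpt.
I915_HANG_PATTERN='GPU HANG|context reset due to GPU hang|Resetting rcs0|GPU crash dump saved'

# Print every line that matches one of the hang/reset markers.
i915_hang_lines() {
  grep -E "$I915_HANG_PATTERN" "$@"
}

# Count "GPU HANG" lines (the driver prints one such line per detected hang).
i915_hang_count() {
  grep -c 'GPU HANG' "$@"
}
```

For example, after rebooting from a freeze: journalctl -k -b -1 --no-pager | i915_hang_lines (-k limits output to kernel messages, -b -1 selects the previous boot).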
(More information from me is also available at: https://forum.manjaro.org/t/random-freezing-with-resetting-rcs0-for-hang-on-rcs0/119313/34 )
Moreover, the Manjaro Linux LiveCD v2003 (on a USB flash stick, downloaded about 2 weeks ago) also hangs while running as a live system. It has the 5.6.15-1-MANJARO kernel built in.
Features involved
Looks like a software-independent bug; the software involved is mentioned under the steps to reproduce below.
General steps to reproduce the issue
Random in time and actions. I can't single out an exact user behaviour; the error is spontaneous: it can happen on the logon screen before any user input or after 12 hours of work. Usually it happens after 30-120 minutes of very typical user actions: working in a text editor, media player or web browser, selecting desktop icons, selecting an item in a system tray icon's context menu, etc.
How often do the steps listed above trigger the issue
The issue occurred about 50-60 times over roughly the last 2.5 weeks. The OS was installed 3-4 months ago and worked well until those last 2.5 weeks. Maybe a software update caused the problem, but I can't trace it now. I have never seen such a hard-to-catch bug, and recently it has become very frequent: several times per day at the very least.
Display connector: (such as HDMI, DP, eDP, ...)
Monitor port: DVI-D.
PC port: HDMI (probably 1.4a).
Single factory-made cable without any adapters: HDMI connector on one end (PC), DVI-D connector on the other end (monitor).
Screenshot or photo (a picture is worth a thousand words)
See some of them at the Manjaro forum link above.
Attached files
sudo cp /sys/class/drm/card0/error Desktop/error
inxi -Fxxxz > inxi.txt
sudo dmidecode > dmidecode.txt
sudo dmesg > dmesg.txt
(then rebooted with shutdown now -r,
and later:)
xrandr --verbose > xrandr.txt
journalctl -b-5 -r > journalctl.txt
mhwd -li > mhwd.txt
mhwd -la >> mhwd.txt
KDE System Settings -> System Information:
Operating System: Manjaro Linux
KDE Plasma Version: 5.19.3
KDE Frameworks Version: 5.72.0
Qt Version: 5.15.0
Kernel Version: 5.8.0-1-MANJARO
OS Type: 64-bit
Processors: 4 × Intel Core i5-8250U CPU @ 1.60GHz
Memory: 31.1 GiB of RAM
Graphics Processor: Mesa Intel UHD Graphics 620