[i915] GPU HANG: ecode 7:1:86ffffd, in obs
Summary:
For several months, my i915 graphics device has been hanging and resetting in Xorg, Kdenlive, plasmashell, OBS, and other applications. This issue covers a hang in OBS that happened recently in the middle of a critical video recording session. Although the problem has been going on for months, only now have I been able to document it sufficiently and found the time to file an issue.
How to reproduce:
Open OBS Studio, record from a V4L2 capture device, and wait an unspecified amount of time. Bog-standard settings: 720p, 30fps, 3.5kb/s, ultrafast, output to mp4 with H264.
Frequency:
Very spread out and unpredictable, but usually happens after one hour of recording. From previous testing, it happened after 82 minutes of recording.
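Since the time to failure varies so much, a small watcher could record exactly how long a session ran before the hang and save the error state at that moment. The following is only a sketch: it assumes the error state is exposed at /sys/class/drm/card0/error (the card0/error file attached here), that the file reads "No error state collected" while no hang has been captured, and that the script runs with enough privileges to read it. The function names and the output path are hypothetical.

```python
import time
from pathlib import Path

# Assumption: i915 reports this text while no hang has been captured.
NO_ERROR_PREFIX = "No error state collected"


def has_error_state(text: str) -> bool:
    """Return True if the sysfs text looks like a captured GPU error state."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(NO_ERROR_PREFIX)


def wait_for_hang(error_path, out_path, poll_seconds=5.0, timeout=None):
    """Poll error_path until an error state appears, then copy it to out_path.

    Returns the elapsed seconds until the hang was seen, or None on timeout.
    """
    start = time.monotonic()
    while timeout is None or time.monotonic() - start < timeout:
        try:
            text = Path(error_path).read_text(errors="replace")
        except OSError:
            # File missing or unreadable (for example, not running as root).
            text = ""
        if has_error_state(text):
            Path(out_path).write_text(text)
            return time.monotonic() - start
        time.sleep(poll_seconds)
    return None
```

Started just before pressing record, for example as `wait_for_hang("/sys/class/drm/card0/error", "card0-error.txt")`, the return value divided by 60 gives the minutes of recording before the hang.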
Which forms and features are affected?
I have only observed this on kernels later than 4.19 (the latest provided by Debian Buster). I have another machine (which I will call Machine B) with identical hardware, except for a different SSD capacity and a different RAM brand and frequency.
What is even more frustrating and weird is that Machine B does not exhibit the same problems despite having the same version of Debian, the same kernel, and the same BIOS settings. This holds even after swapping the RAM sticks. I guess I should be thankful, though.
I will perform further testing on both machines when I have the time, since I would like to be very sure whether this is a kernel problem or an OEM/product defect problem.
System information/specs:
- Architecture: x86_64
- Kernel Version: 5.10.0-8-amd64
- Linux Distribution: Debian GNU/Linux 11 (bullseye)
- Machine Model: Dell Inspiron 20 3048
- Motherboard Model: Dell 0HD5K4 A00 (full dmidecode is attached)
- Processor: Intel Pentium G3240T
- Memory: 2x 2GB 1333 MT/s Hynix
- GPU: Intel HD Graphics (lspci says: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06))
- Display Connector: eDP (built-in display)
- dmesg output: (attached)
- card0/error: (attached)
- Photo: (photo of plasmashell, taken with my phone since screenshotting is impossible on a glitched plasmashell, blurred to hide some personal things)
Notes and remarks:
I personally think this is a regression. As mentioned earlier, after a lot of testing, this only occurs on kernels later than 4.19. A few months ago, I compiled 5.x kernels with custom settings, such as turning mitigations off, and manually applied the DRM patch detailed in https://patchwork.kernel.org/project/intel-gfx/patch/20210111225220.3483-1-chris@chris-wilson.co.uk/, to no avail. This problem also occurred on Liquorix kernels, although I will have to test their latest kernels again.
I have also manually installed the nonfree firmware files in past tests (such as tgl_guc), but that did not change anything. I will try that again for this newer kernel.
Another very frequent ecode is 0x85dffffc, which has been highlighted in another issue. I don't have enough time or data to write another issue about that at the moment, though.
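To see which ecodes recur, and in which processes, across a saved log such as the attached dmesg.txt, a short tally could help. This is a minimal sketch, assuming the hang lines follow the format in this issue's title ("GPU HANG: ecode 7:1:86ffffd, in obs") and that older logs may prefix the last field with "0x"; the function names are hypothetical, and no meaning for the three ecode fields is assumed.

```python
import re
from collections import Counter

# Matches e.g. "GPU HANG: ecode 7:1:86ffffd, in obs"; the "0x" prefix on the
# last field and the ", in <process>" suffix are both treated as optional.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<first>\d+):(?P<second>[0-9a-fA-F]+):"
    r"(?:0x)?(?P<third>[0-9a-fA-F]+)"
    r"(?:, in (?P<proc>\S+))?"
)


def parse_hangs(lines):
    """Return (first, second, third, process) tuples for each hang line."""
    hangs = []
    for line in lines:
        m = HANG_RE.search(line)
        if m:
            hangs.append(
                (m["first"], m["second"], m["third"].lower(), m["proc"])
            )
    return hangs


def tally_ecodes(lines):
    """Count how often each ecode appears in the given log lines."""
    return Counter(
        f"{first}:{second}:{third}"
        for first, second, third, _ in parse_hangs(lines)
    )


def tally_processes(lines):
    """Count how often each process name is blamed for a hang."""
    return Counter(proc for *_, proc in parse_hangs(lines) if proc)
```

For example, `tally_ecodes(open("dmesg.txt"))` would return a Counter keyed by ecode, which makes it easy to compare logs from different kernels or from the two machines.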
This problem has been haunting me since February of this year, and has been hindering my brother's first experiences with Linux. Below are similar issues and threads that I have read in my attempts to solve this problem:
- https://bugzilla.redhat.com/show_bug.cgi?id=1937436
- #2024 (closed)
- #3123 (closed)
  - I commented on this with details on the ecode 0x85dffffc.
- https://olegon.ru/showthread.php?t=35847
  - A Russian friend helped me understand this comment, but the solution did not work.
And finally, thank you, kernel devs, for your time and great efforts in fixing bugs and improving the Linux kernel and its DRM component.
Attachments: dmidecode.txt, dmesg.txt, card0-error.txt