GPU hang when running Blender
Over the past week or so, the GPU has hung three times while I was running Blender. I had not seen this behavior before, and it started right after I ran some system updates, so I suspect one of them is responsible. One of those updates took Blender from 2.82 to 2.83, so after the second crash I downgraded Blender back to 2.82, but that did not resolve the problem. After two of the crashes I was able to switch to a virtual terminal and reboot the computer cleanly, but after the other one the entire system was locked up to the point that I could do nothing except hold the power button until the computer rebooted.
Looking through my logs, the first time that I encountered this error was June 13 at 22:50. If I look at my pacman logs, I can see that I updated some software on June 12 at 21:30 and June 13 at 21:50. Some of the updates that look like they might be related:
[2020-06-12T10:32:03-0600] [ALPM] upgraded linux-firmware (20200519.r1641.8ba6fa6-1 -> 20200519.8ba6fa6-1)
[2020-06-12T10:32:04-0600] [ALPM] upgraded xorg-server (1.20.8-2 -> 1.20.8-3)
[2020-06-12T10:32:04-0600] [ALPM] upgraded xorg-server-common (1.20.8-2 -> 1.20.8-3)
[2020-06-13T21:50:59-0600] [ALPM] upgraded linux56 (5.6.15-1 -> 5.6.16-1)
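For anyone who wants to run the same check, this is roughly how such entries can be pulled out of the pacman log. It is only a sketch: the function name `related_upgrades` is made up for illustration, the package pattern is my guess at what counts as graphics-related, and it assumes the log is at pacman's default location, /var/log/pacman.log.

```shell
# Hypothetical helper: list pacman upgrades of packages that might affect
# the graphics stack. The package name pattern is an assumption; adjust it
# to whatever is installed on the system.
related_upgrades() {
    # Default to pacman's usual log path unless another file is given.
    log="${1:-/var/log/pacman.log}"
    grep -E '\[ALPM\] upgraded (linux|xorg-server|mesa|blender)' "$log"
}

# Usage (not run here):
#   related_upgrades
#   related_upgrades /path/to/a/copied/pacman.log
```

The `linux` prefix is deliberately loose so that it also catches packages such as linux-firmware and linux56.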
When I run journalctl, I see the following lines:
Jun 19 22:07:06 athena kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[786]:72c0a timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Jun 19 22:07:06 athena kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[786]:72c0a timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
Jun 19 22:07:08 athena kernel: i915 0000:00:02.0: Resetting rcs0 for preemption time out
Jun 19 22:07:08 athena kernel: i915 0000:00:02.0: blender[63621] context reset due to GPU hang
Jun 19 22:07:08 athena kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:87f99eb9, in blender [63621]
Jun 19 22:07:08 athena kernel: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 19 22:07:08 athena kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/intel/issues/new.
Jun 19 22:07:08 athena kernel: Please see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details.
Jun 19 22:07:08 athena kernel: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jun 19 22:07:08 athena kernel: The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Jun 19 22:07:08 athena kernel: GPU crash dump saved to /sys/class/drm/card0/error
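The lines above can be picked out of the kernel log with a filter along these lines. This is a sketch rather than the exact command I ran: `gpu_hang_lines` is a made-up name, the pattern is just my guess at what is relevant, and `-b -1` (previous boot) only works if the journal is stored persistently.

```shell
# Hypothetical filter: keep only kernel log lines that mention the i915
# driver or a GPU hang. Reads from standard input.
gpu_hang_lines() {
    grep -E 'i915|GPU HANG|GPU hang|GPU crash dump'
}

# Usage (not run here): kernel messages from the previous boot.
#   journalctl -k -b -1 --no-pager | gpu_hang_lines
```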
Unfortunately, I don't have any specific steps to recreate the crash. So far, the only things that have been consistent are that each time it has happened, I was using Blender and had Firefox running to the side, usually with a YouTube video playing. Today, when the crash occurred and I moved to a virtual terminal, I was able to check my CPU and memory usage. CPU usage was around 5% and about half of my memory was completely free, so I don't believe that either of those was the source of the problem. I have been doing different things inside Blender each time a crash occurred, so it doesn't seem to be tied to any specific operation within Blender.
I have attached a bzip2 of the error file from /sys/class/drm/card0/error. It seems somewhat small, but I grabbed it the way described in the help wiki page, and did grab it before I rebooted after the crash today, so hopefully it contains the information that will help.
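For reference, capturing the dump amounts to something like the following. Treat it as a sketch: `save_gpu_dump` and the output file name are made up for illustration, and reading the file under /sys usually requires root. It has to be done before rebooting, since the error state is not kept across a reboot.

```shell
# Hypothetical helper: compress the i915 error state into a bzip2 file.
# The source path is the one reported in the kernel log; the destination
# name is an arbitrary choice.
save_gpu_dump() {
    src="${1:-/sys/class/drm/card0/error}"
    dest="${2:-gpu-error-$(date +%Y%m%d-%H%M%S).bz2}"
    # Stream the file through bzip2 and print the output path on success.
    bzip2 -c < "$src" > "$dest" && printf '%s\n' "$dest"
}

# Usage (not run here; needs a root shell to read the sysfs file):
#   save_gpu_dump
```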
$ uname -m
x86_64
$ uname -r
5.6.16-1-MANJARO
I haven't switched to drm-tip, mostly because I'm not quite sure how to do that on this system. It looks like there may be something related to that in the AUR, so I can try it if you think it is necessary. I just don't want to risk messing things up!
Please let me know if there's any important information that I missed! And, thank you for your help!