GPU-accelerated applications freeze in dri3_wait_for_event_locked (xcb_wait_for_special_event)
System information
System: Host: neworld-vinted Kernel: 5.4.30-1-lts x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: LXDE dm: LightDM
Distro: Arch Linux
CPU: Topology: 6-Core model: Intel Core i9-8950HK bits: 64 type: MT MCP arch: Kaby Lake rev: A L2 cache: 12.0 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 69597
Speed: 4317 MHz min/max: 800/4800 MHz Core speeds (MHz): 1: 4378 2: 4369 3: 4359 4: 4400 5: 4369 6: 4334 7: 4386
8: 4399 9: 4396 10: 4387 11: 4395 12: 4376
Graphics: Device-1: Intel UHD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:3e9b
Device-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] driver: N/A bus ID: 01:00.0 chip ID: 10de:1c8c
Display: x11 server: X.Org 1.20.8 driver: intel unloaded: nvidia resolution: 3840x2160~60Hz
OpenGL: renderer: Mesa Intel UHD Graphics 630 (CFL GT2) v: 4.6 Mesa 20.0.4 direct render: Yes
GLX info:
name of display: :0
display: :0 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Intel (0x8086)
Device: Mesa Intel(R) UHD Graphics 630 (CFL GT2) (0x3e9b)
Version: 20.0.4
Accelerated: yes
Video memory: 3072MB
Unified memory: yes
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) UHD Graphics 630 (CFL GT2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 20.0.4
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.0.4
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 20.0.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Describe the issue
Some GPU-accelerated applications freeze from time to time. The interval between freezes is random: sometimes it happens every 5 minutes, sometimes applications survive a few hours. The affected applications are Alacritty, Spotify, and Code. Google Chrome somehow unfreezes itself after a few seconds. Android Virtual Device has never frozen. However, I observed that most freezes happen while the Android emulator is running. Also, freezes are more frequent after a fresh boot than after a long uptime.
EDIT: My first debugging attempt was somehow broken and suggested the freeze was in glPrimitiveBoundingBox. Debugging with debug symbols shows a quite different stack trace. The freeze always happens here:
static bool
dri3_wait_for_event_locked(struct loader_dri3_drawable *draw)
{
   xcb_generic_event_t *ev;
   xcb_present_generic_event_t *ge;

   xcb_flush(draw->conn);

   /* Only have one thread waiting for events at a time */
   if (draw->has_event_waiter) {
      cnd_wait(&draw->event_cnd, &draw->mtx);
      /* Another thread has updated the protected info, so retest. */
      return true;
   } else {
      draw->has_event_waiter = true;
      /* Allow other threads access to the drawable while we're waiting. */
      mtx_unlock(&draw->mtx);
      ev = xcb_wait_for_special_event(draw->conn, draw->special_event); /* <---- THIS LINE FREEZES */
      mtx_lock(&draw->mtx);
      draw->has_event_waiter = false;
      cnd_broadcast(&draw->event_cnd);
   }
   if (!ev)
      return false;
   ge = (void *) ev;
   dri3_handle_present_event(draw, ge);
   return true;
}
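To illustrate why a single stuck call can stall a whole application, here is a minimal, self-contained sketch of the same "one waiter at a time" control flow. It is only a simulation under stated assumptions: it uses pthreads instead of Mesa's `mtx_*`/`cnd_*` wrappers, and `struct fake_drawable`, `fake_wait_for_event`, `worker`, and `run_simulation` are hypothetical names that do not exist in Mesa or xcb. The blocking X call is replaced by a counter, so the sketch always terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct loader_dri3_drawable. */
struct fake_drawable {
    pthread_mutex_t mtx;
    pthread_cond_t event_cnd;
    bool has_event_waiter;
    int remaining; /* events the simulated server will still deliver */
    int handled;   /* events processed so far, protected by mtx */
    int target;    /* workers stop once this many events are handled */
};

/* Stand-in for xcb_wait_for_special_event(). It returns false when the
 * simulated server has nothing left to send; the real call would keep
 * blocking at that point instead of returning. */
bool fake_wait_for_event(struct fake_drawable *d)
{
    if (d->remaining == 0)
        return false;
    d->remaining--;
    return true;
}

/* Same control flow as the function above; the caller holds d->mtx. */
bool wait_for_event_locked(struct fake_drawable *d)
{
    if (d->has_event_waiter) {
        /* Someone else is already waiting: sleep until it broadcasts. */
        pthread_cond_wait(&d->event_cnd, &d->mtx);
        return true;
    }
    d->has_event_waiter = true;
    pthread_mutex_unlock(&d->mtx);
    bool got = fake_wait_for_event(d); /* the real code blocks here */
    pthread_mutex_lock(&d->mtx);
    d->has_event_waiter = false;
    pthread_cond_broadcast(&d->event_cnd);
    if (!got)
        return false;
    d->handled++;
    return true;
}

void *worker(void *arg)
{
    struct fake_drawable *d = arg;
    pthread_mutex_lock(&d->mtx);
    while (d->handled < d->target) {
        if (!wait_for_event_locked(d))
            break;
    }
    pthread_mutex_unlock(&d->mtx);
    return NULL;
}

/* Runs nthreads workers until nevents events are handled; returns the count. */
int run_simulation(int nthreads, int nevents)
{
    assert(nthreads > 0 && nthreads <= 16);
    struct fake_drawable d = { .remaining = nevents, .target = nevents };
    pthread_t threads[16];
    pthread_mutex_init(&d.mtx, NULL);
    pthread_cond_init(&d.event_cnd, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, &d);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&d.event_cnd);
    pthread_mutex_destroy(&d.mtx);
    return d.handled;
}
```

In this pattern, only the thread that set `has_event_waiter` can clear it and broadcast. If its blocking call never returned, every other thread entering the function would stay parked in the condition wait, which matches the observed behaviour of the whole application freezing rather than a single thread.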
Old report without debug symbols, which showed the freeze in `glPrimitiveBoundingBox`
Unfortunately, I could not manage to run gdb on Alacritty with debugging symbols. But at least I can see that Alacritty is stuck in `glPrimitiveBoundingBox` calls:
#0 0x00007f43f327cabf in poll () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007f43f33ad63b in ?? () from /usr/lib/libxcb.so.1
No symbol table info available.
#2 0x00007f43f33af45b in xcb_wait_for_special_event () from /usr/lib/libxcb.so.1
No symbol table info available.
#3 0x00007f43efe659f1 in glPrimitiveBoundingBox () from /usr/lib/libGLX_mesa.so.0
No symbol table info available.
#4 0x00007f43efe65b58 in glPrimitiveBoundingBox () from /usr/lib/libGLX_mesa.so.0
No symbol table info available.
#5 0x00007f43efe66d8e in glPrimitiveBoundingBox () from /usr/lib/libGLX_mesa.so.0
No symbol table info available.
#6 0x00007f43efe67cfc in glPrimitiveBoundingBox () from /usr/lib/libGLX_mesa.so.0
No symbol table info available.
#7 0x00007f43ee7dc2bb in ?? () from /usr/lib/dri/iris_dri.so
No symbol table info available.
#8 0x00007f43ee7de3f4 in ?? () from /usr/lib/dri/iris_dri.so
No symbol table info available.
#9 0x00007f43ee7ea56c in ?? () from /usr/lib/dri/iris_dri.so
No symbol table info available.
#10 0x00007f43ee7eabc9 in ?? () from /usr/lib/dri/iris_dri.so
No symbol table info available.
#11 0x00007f43ee80eb26 in ?? () from /usr/lib/dri/iris_dri.so
No symbol table info available.
#12 0x00007f43ee8136a4 in ?? () from /usr/lib/dri/iris_dri.so
People are reporting the problem only on Intel GPUs. The range of affected Mesa versions is huge: 18.3.0 to 20.0.4. I would like to say it is a problem with the driver, but I found the same problem with both `iris` and `i965`.
Log files as attachment
In most cases, I found this line in dmesg during a freeze:
[ 6600.003165] GpuWatchdog[1834]: segfault at 0 ip 0000559e7104dee2 sp 00007f12ee468600 error 6 in chrome[559e6d107000+7287000]
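For reference, the fields of that dmesg line can be decoded mechanically. The following is a minimal C sketch, assuming the x86-64 kernel message layout seen in the line above (`segfault at ADDR ip IP sp SP error CODE in NAME[BASE+SIZE]`, all numbers in hex). The struct and function names are hypothetical helpers written for this report, not part of any library.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "segfault at ..." line (hypothetical helper type). */
struct segfault_info {
    uint64_t fault_addr; /* address that was accessed */
    uint64_t ip;         /* instruction pointer at the fault */
    uint64_t map_base;   /* start of the mapping named after "in" */
    uint64_t map_size;   /* size of that mapping */
    unsigned error;      /* x86 page-fault error code */
};

/* Parses one dmesg line; returns false if it does not match the layout. */
bool parse_segfault_line(const char *line, struct segfault_info *out)
{
    uint64_t sp; /* parsed to advance past the field, not kept */
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return false;
    if (sscanf(p, "segfault at %" SCNx64 " ip %" SCNx64 " sp %" SCNx64
                  " error %x",
               &out->fault_addr, &out->ip, &sp, &out->error) != 4)
        return false;
    /* The last '[' on the line opens the "[base+size]" part. */
    const char *bracket = strrchr(p, '[');
    if (bracket == NULL)
        return false;
    return sscanf(bracket, "[%" SCNx64 "+%" SCNx64 "]",
                  &out->map_base, &out->map_size) == 2;
}

/* Offset of the faulting instruction inside the mapped binary. */
uint64_t offset_in_mapping(const struct segfault_info *s)
{
    assert(s->ip >= s->map_base && s->ip - s->map_base < s->map_size);
    return s->ip - s->map_base;
}

/* x86 page-fault error code bits. */
bool fault_page_present(unsigned error)  { return (error & 0x1) != 0; }
bool fault_was_write(unsigned error)     { return (error & 0x2) != 0; }
bool fault_was_user_mode(unsigned error) { return (error & 0x4) != 0; }
```

For the line above this gives an offset of 0x3f46ee2 into the chrome mapping, and error 6 decodes as a user-mode write to a non-present page at address 0. Whether that offset can be resolved to a function name depends on having a matching binary with symbols, which I do not have.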
Xorg logs are empty during the freeze.
Any extra information would be greatly appreciated
I have blacklisted the Nvidia and nouveau kernel modules. I am using qtile. Other people with the same problem are using similar window managers, such as Xmonad.
I suppose this is far too little information. Please tell me what else I could check, debug, or try.