Godot 4.X (engine, editor, as well as released apps) and other applications crash with libX11-1.8.7 (or later). Downgrading to libX11-1.8.4 seems to work around the problem. Tested on Fedora 38 and 39; also reported on Ubuntu 23.04.
Reported to impact RustDesk as well, as mentioned here:
[xcb] Unknown sequence number while processing queue
[xcb] You called XInitThreads, this is not your fault
[xcb] Aborting, sorry about that.
godot.linuxbsd.editor.x86_64: xcb_io.c:278: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
This is quite a serious problem: it prevents all (new and old) Godot-based apps from running on systems with libX11-1.8.7 or later. Because Fedora 39 and other similarly recent distros no longer support earlier versions of libX11, this keeps a lot of apps from running on many Linux systems.
Hello, Godot maintainer here. We still get regular bug reports about this, and it doesn't seem to be fixed in the latest libX11, 1.8.9.
Every version since 1.8.3 seems to have exposed new, seemingly random threading-related crashes.
Here are three Fedora automated crash reports for Godot, all crashing in libX11 code. I would really appreciate some input from upstream maintainers to understand whether this is something we need to work around in Godot (and how), or whether this will eventually be fixed upstream.
Hi, Godot user here on Arch Linux. I'm able to reproduce this fairly reliably (within a few minutes), but only with the official Arch Linux libx11 binary package (version 1.8.9). If I build that package myself, using the exact same PKGBUILD recipe, the problem disappears, so I have to rely on gdb and can't add any kind of logging.
The root cause seems to be a lack of locking somewhere.
The queue head pointer dpy->xcb->pending_requests is read into a local variable req, and there is no code that modifies the req pointer in the meantime. Then, if there is actually a pending request and some other conditions hold, the pending request is dequeued:
dequeue_pending_request(dpy, req);
And the first thing that function does is check the assertion that fails here:
if (req != dpy->xcb->pending_requests)
    throw_thread_fail_assert("Unknown request in queue while "
                             "dequeuing", xcb_xlib_unknown_req_in_deq);
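For context, here is roughly what the rest of that function does (a simplified sketch based on my reading, not the verbatim libX11 source): after the assertion it unlinks the head of the queue and frees it, which matters for the observation below.

static void dequeue_pending_request(Display *dpy, PendingRequest *req)
{
    if (req != dpy->xcb->pending_requests)
        throw_thread_fail_assert("Unknown request in queue while "
                                 "dequeuing", xcb_xlib_unknown_req_in_deq);
    /* unlink the head of the pending-request queue */
    dpy->xcb->pending_requests = req->next;
    if (!req->next)
        dpy->xcb->pending_requests_tail = &dpy->xcb->pending_requests;
    /* the node is released here, so a stale req ends up pointing at freed memory */
    free(req);
}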
Since req is a local variable and hasn't been changed, this must mean that dpy->xcb->pending_requests has been changed in the meantime. The culprit must have been either some invalid memory access on the same thread, or a race condition from a different thread. My money is on the latter. (It could theoretically also have been some callback that performed a reentrant libx11 call, but I don't see any place where callbacks are invoked here; also, it would imply a lack of locking somewhere, same as a threading issue.)
Indeed, the debugger shows values in *req that are clearly bogus, suggesting that it's been freed and the memory has been overwritten:
It should be noted that we are in an XFlush() call, which is a critical section: it calls LockDisplay() at the start and UnlockDisplay() at the end. So if this is a threading issue, we'd want to look for places that modify pending_requests without holding that lock.
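For reference, that critical section looks roughly like this (simplified, not the verbatim libX11 source):

int XFlush(Display *dpy)
{
    LockDisplay(dpy);      /* take the Display mutex */
    _XFlush(dpy);          /* internal flush; the dequeue above happens somewhere below this */
    UnlockDisplay(dpy);    /* release it */
    return 1;
}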
There are only two such places that matter: append_pending_request and dequeue_pending_request. So I set a conditional breakpoint on both, with the condition dpy->lock->mutex->__data->__owner == 0 (relying on some pthreads internals to detect when the mutex is not held; the exact gdb commands are shown after the trace below). After a few minutes, the breakpoint is hit, yielding the following stack trace:
#0  dequeue_pending_request (dpy=dpy@entry=0x55555cd1fde0, req=req@entry=0x55556a1df6f0) at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:174
#1  0x00007ffff7103343 in _XReply (dpy=0x55555cd1fde0, rep=0x7fffffffdb00, extra=0, discard=0) at /usr/src/debug/libx11/libX11-1.8.9/src/xcb_io.c:736
#2  0x00007ffff70e40f4 in XGetWindowProperty (dpy=0x55555cd1fde0, w=25165826, property=372, offset=0, length=32, delete=<optimized out>, req_type=4, actual_type=0x7fffffffdbb8, actual_format=0x7fffffffdbb4, nitems=0x7fffffffdbc0, bytesafter=0x7fffffffdbc8, prop=0x7fffffffdbd0) at /usr/src/debug/libx11/libX11-1.8.9/src/GetProp.c:69
#3  0x0000555555af1360 in DisplayServerX11::_window_minimize_check (this=this@entry=0x55555ccfc9f0, p_window=p_window@entry=0) at platform/linuxbsd/x11/display_server_x11.cpp:2375
#4  0x0000555555af167f in DisplayServerX11::window_get_mode (this=0x55555ccfc9f0, p_window=0) at platform/linuxbsd/x11/display_server_x11.cpp:2705
#5  0x0000555555aeba48 in DisplayServerX11::can_any_window_draw (this=0x55555ccfc9f0) at platform/linuxbsd/x11/display_server_x11.cpp:2912
#6  0x0000555555b45426 in Main::iteration () at main/main.cpp:3685
#7  0x0000555555ad7311 in OS_LinuxBSD::run (this=this@entry=0x7fffffffddb0) at platform/linuxbsd/os_linuxbsd.cpp:958
#8  0x0000555555ac5176 in main (argc=<optimized out>, argv=0x7fffffffe398) at platform/linuxbsd/godot_linuxbsd.cpp:74
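For anyone trying to reproduce this, the breakpoints mentioned above were set roughly like this (the __data/__owner path relies on glibc internals and may differ on other systems):

(gdb) break append_pending_request if dpy->lock->mutex->__data->__owner == 0
(gdb) break dequeue_pending_request if dpy->lock->mutex->__data->__owner == 0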
When continuing the program after the breakpoint is hit, it sometimes immediately aborts with the aforementioned message Unknown request in queue while dequeuing, but sometimes the breakpoint is triggered a second time before the abort actually happens.
The API function XGetWindowProperty called from Godot does lock the mutex, but _XReply transiently unlocks it for a while. And apparently, by the time dequeue_pending_request is called here, the mutex is somehow not locked.
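My mental model of that pattern, heavily simplified and not the verbatim libX11 source, is something like this:

Status _XReply(Display *dpy, xReply *rep, int extra, Bool discard)
{
    /* the caller (here XGetWindowProperty) has already taken the Display lock */
    PendingRequest *req = dpy->xcb->pending_requests;
    /* ... */
    UnlockDisplay(dpy);   /* lock transiently dropped while waiting for the reply */
    /* ... block in xcb until the server's reply arrives ... */
    LockDisplay(dpy);     /* re-acquired (through an internal variant in the real code) */
    /* ... */
    dequeue_pending_request(dpy, req);   /* frames #1 -> #0 in the trace above */
    /* ... */
}

If that picture is right, the question is how we can reach the dequeue with the mutex not held: either the re-lock is skipped on some path, or something unlocks the Display again in between.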
This is as far as I got. I tried setting more breakpoints and dprintfs in _XReply to find out where exactly the lock is lost, but these seem to interfere with my ability to trigger the crash.
I hope some hero from the libx11 team can figure out what's going on here!