Assertion `!xcb_xlib_threads_sequence_lost' failed when glXCreateContextAttribsARB fails
The attached program crashes with libX11:
got error 2 req 152
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
test: ../../src/xcb_io.c:269: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Annullato
It is the same assertion that fails in #141 (closed), but I am not sure if the underlying problem is the same, so I am filing another issue.
I can reproduce the crash with the libX11 package shipped with Debian unstable on my laptop, with the libX11 package shipped with Arch on a Steam Deck (that libX11 is unmodified from upstream Arch), and with a libX11 build from git master on the Steam Deck. I discovered the bug while debugging a Steam game (The Last Campfire) running under Proton (basically a patched version of Wine) on the Steam Deck.
What's happening in the test program
The test program creates an X11 display and a window and then tries to create a GL context with glXCreateContextAttribsARB. The attributes passed to that call are invalid, however, so an error has to be raised (I don't know why the game tries to use an invalid GLX_CONTEXT_FLAGS_ARB, but I think that anything that makes glXCreateContextAttribsARB return an error would cause the same problem). Just before calling glXCreateContextAttribsARB, the program maps the window and configures events so that an expose event is delivered during a short sleep.
So, by the time glXCreateContextAttribsARB is called, the X client has already received an event, and it receives an error immediately afterwards. The glXCreateContextAttribsARB implementation calls __glXSendErrorForXcb, which in turn calls _XError, which calls _XSetLastRequestRead, which sets last_request_read for the connection to the sequence number of the error. The libX11 error callback defined by the test program is then called, and the error is ignored.
Later, within the XSync call, the pending events are finally processed. When the expose event is processed, widen is called, using last_request_read as the reference request number to reconstruct the upper dword. But last_request_read has already taken the sequence number of the error, so it is greater than the event's request number. As a result, the event's widened sequence number is incorrectly increased by 2^32, which triggers the failing assertion.
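To make the arithmetic concrete, here is a minimal sketch of the widening step as described above. It is a simplified model, not code copied from libX11: the names widen_model and widen_from are hypothetical, and it assumes that widening takes the upper 32 bits from the reference and adds 2^32 whenever the result is below the reference.

```c
#include <stdint.h>

/* Simplified model of the widening step (an assumption, not libX11 source):
 * combine the upper 32 bits of the reference (*wide) with the 32-bit
 * sequence number from the wire (narrow). If the result is smaller than
 * the reference, assume the lower dword wrapped and add 2^32. */
static void widen_model(uint64_t *wide, uint32_t narrow)
{
    uint64_t candidate = (*wide & ~(uint64_t)0xFFFFFFFFu) | narrow;
    *wide = candidate + ((uint64_t)(candidate < *wide) << 32);
}

/* Hypothetical convenience wrapper: widen `narrow` against `reference`
 * and return the 64-bit result. */
static uint64_t widen_from(uint64_t reference, uint32_t narrow)
{
    uint64_t wide = reference;
    widen_model(&wide, narrow);
    return wide;
}
```

With illustrative values, if the error has already moved last_request_read to 7 and the queued expose event carries sequence number 5, widen_from(7, 5) returns 5 + 2^32. That is far beyond any request the client has actually sent, which is the inconsistency the assertion detects.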
I am not sure which link in the chain is at fault here: libX11 clearly assumes that last_request_read is always smaller than the request number of the event it is processing, but calling _XError from Mesa violates this assumption. So there are three options: a different algorithm is used for widening that doesn't assume that last_request_read is smaller than the event's request number (e.g., it could add 2^32 only if the widened sequence number is at least 2^31 bigger than last_request_read); or a different reference number is used for widening; or Mesa avoids calling _XError (but I don't know what it should do instead).
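One possible reading of the first option is sketched below. This is an illustration under stated assumptions, not a proposed patch: widen_tolerant is a hypothetical name, and the rule it implements (carry only when the uncarried value falls more than 2^31 below the reference) is my interpretation of the 2^31 threshold mentioned above.

```c
#include <stdint.h>

/* Hypothetical tolerant widening: take the upper 32 bits from the reference
 * and the lower 32 bits from the wire value. Add 2^32 only if the result is
 * more than 2^31 below the reference, so a sequence number that is slightly
 * older than the reference is left alone instead of being treated as a wrap. */
static uint64_t widen_tolerant(uint64_t reference, uint32_t narrow)
{
    uint64_t candidate = (reference & ~(uint64_t)0xFFFFFFFFu) | narrow;
    uint64_t wrapped = (candidate + (UINT64_C(1) << 31)) < reference;
    return candidate + (wrapped << 32);
}
```

With a reference of 7 and an event sequence number of 5, this returns 5, while a genuine wrap (reference 0xFFFFFFF0, wire value 0x10) still yields 0x100000010. The trade-off is that it can no longer distinguish a wrap from an event that is more than 2^31 requests old.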
The first option seems to be the easiest to implement, but I don't know if calling
_XError from Mesa breaks other assumptions on