Draft: Atomic request sequences
Sending individual requests is thread-safe, but if you've got multiple threads sending requests, those requests can end up getting interleaved. In most cases that's fine, but with wl_surface requests it can change the semantics of the requests: if two threads are each trying to send a set of wl_surface requests, even if they're otherwise orthogonal, a wl_surface::commit on one thread will split up the other thread's commit batch.
That problem is most apparent with the linux-drm-syncobj protocol, where if you split the set_acquire_point or set_release_point request from the wl_surface::attach, then you get a protocol error and your client crashes. This has been observed in several applications already, generally something like this:
Thread A is rendering using explicit sync (probably via EGL, Vulkan, or similar):
1. wl_surface::attach
2. wp_linux_drm_syncobj_surface_v1::set_acquire_point
3. wp_linux_drm_syncobj_surface_v1::set_release_point
4. wl_surface::commit
Then, thread B, in the application or some other library, does something like this:
5. wl_surface::frame
6. wl_surface::commit
The protocol error happens if thread B's wl_surface::commit gets sent between thread A's wl_surface::attach and set_release_point, because the server then sees the attach request without the sync points.
Even without explicit sync, you could still have problems, just quieter ones. If one thread's wl_surface::commit splits another thread's wl_surface::damage and wl_surface::attach, then the damage rectangle would get applied to the wrong buffer.
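As a condensed sketch using the core wl_surface API (buffer and surface setup omitted):

```c
/* Thread A intends this to be one atomic commit: */
wl_surface_attach(surface, new_buffer, 0, 0);
/* ...but if another thread's wl_surface_commit() lands here, the
 * attach is latched without the damage below, and the damage ends
 * up in the next commit, applied against the wrong buffer. */
wl_surface_damage(surface, 0, 0, width, height);
wl_surface_commit(surface);
```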
Currently, the only way to avoid that is to never send protocol from more than one thread. Within an application or a library, you could use a mutex to avoid interleaved requests, but you can't coordinate between libraries like that.
In this MR, I've written up two possible solutions to that problem.
The first, and by far the simplest, is to expose the wl_display's mutex to applications. An application or a library can then call the new wl_display_lock and wl_display_unlock functions to lock the display while it sends a set of requests.
That has the advantage of being very simple, and requiring minimal changes in applications or libraries to use it. The only real pitfall (other than forgetting to unlock the display) is that you have to be careful not to dispatch events while the display is locked, because dispatching can internally release the display's mutex while it waits on a condition variable, letting other threads' requests slip into your batch. As long as you only call functions that send requests while holding the lock, you're fine.
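With that API, guarding the explicit-sync sequence from above might look like this (wl_display_lock/wl_display_unlock are the functions proposed in this MR; I'm assuming they take the wl_display pointer, and the surrounding variables are placeholders):

```c
wl_display_lock(display);

/* Everything between lock and unlock reaches the server contiguously. */
wl_surface_attach(surface, buffer, 0, 0);
wp_linux_drm_syncobj_surface_v1_set_acquire_point(syncobj_surface, timeline,
                                                  acquire_hi, acquire_lo);
wp_linux_drm_syncobj_surface_v1_set_release_point(syncobj_surface, timeline,
                                                  release_hi, release_lo);
wl_surface_commit(surface);

wl_display_unlock(display);
```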
The second solution I've got is to add a new wl_request_queue struct, which lets you queue up multiple requests and then send them all atomically. You can assign a wl_request_queue to a proxy wrapper, just like you would a wl_event_queue, and then any requests sent with that wrapper get added to the queue instead of being sent immediately. Then, once you've got all of the requests queued up, you can flush the queue to send them as a single atomic batch.
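As a sketch of how that could look: only the wl_request_queue struct itself is named here, so the queue-related function names below are my guesses at the analogous API, modeled on the existing wl_event_queue and wl_proxy_create_wrapper functions:

```c
/* wl_request_queue_create, _set_request_queue, and _flush are
 * hypothetical names, by analogy with the wl_event_queue API. */
struct wl_request_queue *queue = wl_request_queue_create(display);
struct wl_surface *wrapper = wl_proxy_create_wrapper(surface);
wl_proxy_set_request_queue((struct wl_proxy *)wrapper, queue);

/* These are queued locally, not sent yet: */
wl_surface_attach(wrapper, buffer, 0, 0);
wl_surface_damage(wrapper, 0, 0, width, height);
wl_surface_commit(wrapper);

/* Sent to the compositor as one atomic batch: */
wl_request_queue_flush(queue);

wl_proxy_wrapper_destroy(wrapper);
```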
The main advantage of a request queue is that the caller doesn't need to sit on the mutex while it constructs a whole set of requests, plus the fact that you can't forget to unlock the display. And, the rough symmetry with wl_event_queue might be a bit more idiomatic.
The big disadvantage, of course, is that a request queue is much more complicated to use and implement, and adopting it in an application is a much more invasive change. It would also be much more difficult to #ifdef around with Meson/autotools/etc.
In addition, although the locking is nicely contained, a request queue has a different potential pitfall, because the order in which you call the request functions is different from the order in which the requests actually get sent. With a queue, it would be possible to queue up a request from some Wayland object, then directly send a request to destroy that object, and then flush the queue. The server would then get a request for an already-destroyed object.
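A condensed example of that ordering hazard (the queue function name here is a hypothetical illustration, not API from this MR):

```c
wl_surface_damage(wrapper, 0, 0, width, height);  /* queued, not sent */
wl_surface_destroy(surface);                      /* sent immediately */

/* Hypothetical flush: this now sends the damage request for an
 * object the server has already destroyed. */
wl_request_queue_flush(queue);
```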