Clarifying linux-drm-syncobj-v1 behavior with multiple buffer types
This is a general question to clarify explicit sync behavior. The main goal is to better communicate expected behavior to application developers and to follow up on whether any spec changes would be helpful.
Recently we've seen some reports from users containing:
wp_linux_drm_syncobj_surface_v1@49: error 2: Explicit Sync only supported on dmabuf buffers
...
The Wayland connection experienced a fatal error: Protocol error
These come from various Qt applications such as Kdenlive.
I think what's happening here is that the surface is being rendered to by EGL and therefore has a wp_linux_drm_syncobj_surface_v1 object associated with it. It looks like at some point the application directly attaches a wl_shm buffer to the surface without specifying any explicit sync points. Because the compositor sees that the surface has a wp_linux_drm_syncobj_surface_v1 object, it raises an error, since it only accepts supported buffer types (i.e. dmabuf) on such a surface.
Based on the spec, a protocol error does sound valid in this scenario: if the surface has a wp_linux_drm_syncobj_surface_v1 object, then every commit that attaches a buffer must include sync points, and that buffer must be of a supported type. This would indicate that the user reports are application bugs. The relevant spec text reads:
Explicit synchronization is guaranteed to be supported for buffers created
with any version of the linux-dmabuf protocol. Compositors are free to support
explicit synchronization for additional buffer types. If at surface commit time
the attached buffer does not support explicit synchronization, an
unsupported_buffer error is raised.
As long as the wp_linux_drm_syncobj_surface_v1 object is alive, the compositor
may ignore implicit synchronization for buffers attached and committed to the
wl_surface. The delivery of wl_buffer.release events for buffers attached to the
surface becomes undefined.
Clients must set both acquire and release points if and only if a non-null buffer
is attached in the same surface commit. See the no_buffer, no_acquire_point and
no_release_point protocol errors.
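To make the quoted rules concrete, here is a minimal Python model of the commit-time checks. It is only a sketch, not any compositor's actual implementation: the `Commit` class, `check_commit`, and the string buffer-type names are hypothetical, and the order in which the checks run is an assumption (chosen so that an shm buffer without sync points yields unsupported_buffer, as in the reports above).

```python
from dataclasses import dataclass
from typing import Optional

# Buffer types this hypothetical compositor accepts for explicit sync.
# Only dmabuf is guaranteed by the spec; anything more is compositor-specific.
EXPLICIT_SYNC_BUFFER_TYPES = {"dmabuf"}


@dataclass
class Commit:
    # "dmabuf", "shm", or None when no buffer is attached in this commit.
    buffer_type: Optional[str]
    has_acquire_point: bool = False
    has_release_point: bool = False


def check_commit(commit: Commit, has_syncobj_surface: bool) -> Optional[str]:
    """Return the protocol error name for this commit, or None if it is valid."""
    if not has_syncobj_surface:
        # No wp_linux_drm_syncobj_surface_v1 object: explicit sync rules do not apply.
        return None
    if commit.buffer_type is None:
        # Sync points without a buffer violate the "if and only if" rule.
        if commit.has_acquire_point or commit.has_release_point:
            return "no_buffer"
        return None
    # Assumed ordering: buffer type is checked before the sync points.
    if commit.buffer_type not in EXPLICIT_SYNC_BUFFER_TYPES:
        return "unsupported_buffer"
    if not commit.has_acquire_point:
        return "no_acquire_point"
    if not commit.has_release_point:
        return "no_release_point"
    return None
```

In this model the reported scenario is `check_commit(Commit("shm"), has_syncobj_surface=True)`, which returns "unsupported_buffer", while the same commit on a surface without the syncobj object is accepted.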
I see a couple open questions regarding recommended application behavior right now:
- Is there any interest in supporting mixed "native" rendering (i.e. app-initiated buffer attach) and EGL rendering (driver-initiated buffer attach) on an explicit sync surface?
- Do apps even want to do this?
- I'm not even sure that this is possible in the current version: How would the app know the latest sync point to signal? How would the driver know what sync point the app used when it did "native" rendering?
- Based on the bug reports it seems that Qt apps sometimes want to do this. I'm not sure whether this is intentional.
- Vulkan does not allow "native" rendering on Vulkan surfaces, so this doesn't apply there.
- EGL kind-of-sort-of allows mixed rendering like this, synchronizing it with eglWaitClient()/eglWaitGL(). I don't think this is widely used?
- Is attaching an unsupported buffer type to an explicit-sync-enabled surface something we want to support?
- In the current version the answer appears to be no.
- There is no way to check compositor support:
- Only dmabuf buffers are guaranteed to be supported right now. If Compositor A supports shm buffers, there is no way for the app to discover this without risking a protocol error, so apps can effectively only use dmabufs.
- Is this something we would like to try to add to linux-drm-syncobj-v1?
I don't think this was covered in the initial MR but maybe I missed it. I'm especially curious if anyone has a case for why apps would need to do this.
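The capability problem raised in the list above can also be sketched from the client side. The helper below is hypothetical and only encodes what the current spec text guarantees: because a client cannot query which additional buffer types a compositor accepts for explicit sync, the only portable choice on a surface with a wp_linux_drm_syncobj_surface_v1 object is dmabuf.

```python
# Buffer types the spec guarantees for explicit sync (per the quoted text).
GUARANTEED_EXPLICIT_SYNC_TYPES = frozenset({"dmabuf"})


def can_attach_safely(buffer_type: str, surface_has_syncobj: bool) -> bool:
    """Hypothetical client-side check: is this attach free of unsupported_buffer risk?"""
    if not surface_has_syncobj:
        # Without the syncobj object the explicit sync buffer rules do not apply.
        return True
    # With it, anything beyond the guaranteed set may or may not be accepted,
    # and the client has no way to find out short of risking a fatal error.
    return buffer_type in GUARANTEED_EXPLICIT_SYNC_TYPES
```

Under this model `can_attach_safely("shm", True)` is False even on a compositor that happens to support shm, which is exactly the gap the question is about.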