App unable to achieve 60 FPS with DRM_OUTPUT_PROPOSE_STATE_PLANES_ONLY/MIXED

Hi,

I am working on adding this support: https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg06482.html. If I force the DRM backend to use DRM_OUTPUT_PROPOSE_STATE_RENDERER_ONLY (by doing export WESTON_FORCE_RENDERER=1), everything works as expected and I get 60 FPS. However, when the DRM backend flips my app's (Qemu UI) buffer onto a hardware plane, I only get 30 FPS.

I noticed that when the backend uses DRM_OUTPUT_PROPOSE_STATE_PLANES_ONLY/MIXED, three references are taken on the client's buffer: one each in weston_surface_attach, gl_renderer_attach, and drm_fb_set_buffer. The wl_buffer.release event is triggered from drm_fb_destroy, which is called from atomic_flip_handler and comes in roughly 6-7 ms after the backend submits the repaint. With RENDERER_ONLY, however, only two references are taken (one each in weston_surface_attach and gl_renderer_attach), and the release event is sent after gl_renderer_attach. It also appears that the wl_buffer is submitted in the frame callback in the RENDERER_ONLY case, whereas this is not the case for PLANES_ONLY/MIXED.

As for my use-case, I am trying to share the Guest VM compositor's (Weston with DRM backend) scanout FB with the Host compositor (Weston with DRM backend) in a zero-copy way. All the rendering in the Guest VM is done via a passthrough GPU, and the Qemu UI gets access to the dmabuf associated with the Guest scanout FB. This scanout FB/dmabuf is wrapped in a wl_buffer via the linux-dmabuf protocol in the Qemu UI module and is sent to the Host compositor. The Guest is blocked (it waits on a dma fence) until the Host compositor sends the wl_buffer.release event for the relevant wl_buffer/dmabuf, which also signals the dma fence the Guest is waiting on. The Guest compositor's repaint cycle is therefore directly tied to the release event (which essentially acts as a vblank), and after a wl_buffer.release it takes the Guest compositor roughly 10-12 ms to submit a new scanout FB.

Unlike https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c, which uses 3 buffers, I can only work with 2 given my use-case. To mimic this, I modified simple-dmabuf-egl.c to use only 2 buffers and to resubmit the previous wl_buffer in frame_callback/redraw if no free buffer is available, and ran it as ./weston-simple-dmabuf-egl -e 0 -i 1 -s 1080 -g. With that change, I also only get 30 FPS with PLANES_ONLY/MIXED.

Initially, I was under the impression that I was not submitting the wl_buffer in the right window (the frame callback), but now I am wondering whether it is even feasible to achieve 60 FPS with PLANES_ONLY/MIXED with only 2 buffers. Any comments on feasibility, or other suggestions?
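
To make the question more concrete, here is a rough timing model of what I think is happening. It is only a sketch under assumptions that are mine and not taken from the Weston code: a 60 Hz output, a repaint that starts a fixed 7 ms before each vblank (REPAINT_WINDOW_MS is an assumed value), a release that arrives at flip completion in the plane case and around repaint time in the renderer case, and a Guest that needs about 11 ms after a release to commit its next buffer (the middle of the 10-12 ms I measured). The function and parameter names are hypothetical.

```python
# Toy model of the Host repaint loop versus a Guest that can only
# produce a new frame after it receives wl_buffer.release.
# All constants are assumptions for illustration, not Weston internals.

REFRESH_MS = 1000.0 / 60.0   # assumed 60 Hz output
REPAINT_WINDOW_MS = 7.0      # assumed: repaint starts this long before vblank
GUEST_LATENCY_MS = 11.0      # measured: 10-12 ms from release to next commit


def simulate(release_at_flip, guest_latency_ms=GUEST_LATENCY_MS, n_vblanks=600):
    """Count how many new frames reach the screen in n_vblanks refresh cycles.

    release_at_flip=True models the plane case: the release is only sent
    once the flip has completed (at vblank).
    release_at_flip=False models the renderer case: the release is sent
    around repaint time, before the vblank.
    """
    ready_at = 0.0  # time at which the client's next buffer is committed
    frames = 0
    for k in range(1, n_vblanks + 1):
        vblank = k * REFRESH_MS
        repaint = vblank - REPAINT_WINDOW_MS
        if ready_at <= repaint:
            # The buffer made this repaint, so it is shown at this vblank.
            frames += 1
            release = vblank if release_at_flip else repaint
            ready_at = release + guest_latency_ms
        # Otherwise the buffer missed the repaint and waits one more cycle.
    return frames


if __name__ == "__main__":
    seconds = 600 / 60.0  # 600 vblanks at 60 Hz
    print("planes  :", simulate(True) / seconds, "FPS")
    print("renderer:", simulate(False) / seconds, "FPS")
```

In this model, the plane case shows a new frame only on every other vblank, because a release at vblank plus 11 ms of Guest latency lands after the next repaint has already started (which begins about 9.67 ms after the vblank). The renderer case hits every vblank. If the model is roughly right, 60 FPS with 2 buffers and planes would need the Guest latency to drop below the refresh period minus the repaint window, or the release to arrive earlier.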

Edited Jul 01, 2021 by Vivek Kasireddy
