App unable to achieve 60 FPS with DRM_OUTPUT_PROPOSE_STATE_PLANES_ONLY/MIXED
Hi,
I am working on adding this support: https://lists.nongnu.org/archive/html/qemu-devel/2021-06/msg06482.html. If I force the DRM backend to use DRM_OUTPUT_PROPOSE_STATE_RENDERER_ONLY (by running export WESTON_FORCE_RENDERER=1), everything works as expected and I get 60 FPS. However, when the DRM backend puts my app's (the Qemu UI's) buffer directly on a hardware plane, I only get 30 FPS.
I noticed that when the backend uses DRM_OUTPUT_PROPOSE_STATE_PLANES_ONLY/MIXED, 3 references are taken on the client's buffer: one each in weston_surface_attach, gl_renderer_attach and drm_fb_set_buffer. The wl_buffer.release event is then triggered from drm_fb_destroy, which is called from atomic_flip_handler and runs roughly 6-7 ms after the backend submits the repaint. With RENDERER_ONLY, however, only 2 references are taken (one each in weston_surface_attach and gl_renderer_attach), and the release event is sent after gl_renderer_attach. It also appears that in the RENDERER_ONLY case the wl_buffer is submitted within the frame callback, whereas in the PLANES_ONLY/MIXED case it is not.
As for my use case: I am trying to share the Guest VM compositor's (Weston with the DRM backend) scanout FB with the Host compositor (also Weston with the DRM backend) in a zero-copy way. All rendering in the Guest VM is done via a passthrough GPU, and the Qemu UI gets access to the dmabuf associated with the Guest scanout FB. The Qemu UI module wraps this scanout FB/dmabuf in a wl_buffer via the linux-dmabuf protocol and sends it to the Host compositor. The Guest is blocked (it waits on a dma fence) until the Host compositor sends the wl_buffer.release event for the relevant wl_buffer/dmabuf, which also signals the dma fence the Guest is waiting on. The Guest compositor's repaint cycle is therefore tied directly to the release event (which essentially acts as its vblank), and after a wl_buffer.release it takes the Guest compositor roughly 10-12 ms to submit a new scanout FB.
Unlike https://cgit.freedesktop.org/wayland/weston/tree/clients/simple-dmabuf-egl.c, which uses 3 buffers, my use case only lets me work with 2. When I modified simple-dmabuf-egl.c to mimic my use case (using only 2 buffers and re-submitting the previous wl_buffer in frame_callback/redraw when no free buffer is available) and ran it as ./weston-simple-dmabuf-egl -e 0 -i 1 -s 1080 -g, I again only get 30 FPS with PLANES_ONLY/MIXED.
Initially, I was under the impression that I was not submitting the wl_buffer in the right window (the frame callback), but now I am wondering whether it is even feasible to achieve 60 FPS with PLANES_ONLY/MIXED using only 2 buffers. Any comments about feasibility, or other suggestions?