v3dv WSI allocations
This is a continuation of the discussion started in !7303 (merged).
Context: v3dv allocates WSI buffers as dumb buffers on the primary display device (vc4) and then imports them into the v3d render node: it exports the buffer to an FD via drmPrimeHandleToFD on the display device and imports that FD on the render node via drmPrimeFDToHandle (plus DRM_IOCTL_V3D_GET_BO_OFFSET). When Mesa's Vulkan WSI implementation asks the driver for an FD to share with the display server (via vkGetMemoryFdKHR), we export the buffer from the v3d render node to an FD using drmPrimeHandleToFD.
Issue: @daniels claims that we should be doing this the other way around, that is, allocating on the v3d render node directly. The idea is that the display server should then be able to import the buffer through the FD provided via vkGetMemoryFdKHR. Doing this doesn't work, though: presentation from such buffers fails with an error like this one:
```
X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  149 ()
  Minor opcode of failed request:  4
  Resource id in failed request:  0x1200006
  Serial number of failed request:  54
  Current serial number in output stream:  64
```
As far as I can tell, the way v3dv operates is identical to how the v3d OpenGL driver works. Here is what happens in a debug session of glxgears on a Raspberry Pi 4:
- dri3_alloc_render_buffer ends up calling the resource create callback in the driver, where we detect that we are in a renderonly setup. There, Eric left this comment, which I think is relevant to this discussion:
```c
/* If we're in a renderonly setup, use the other device to perform our
 * allocation and just import it to v3d.  The other device may be
 * using CMA, and V3D can import from CMA but doesn't do CMA
 * allocations on its own.
 *
 * We always allocate this way for SHARED, because get_handle will
 * need a resource on the display fd.
 */
```
Then it calls renderonly_scanout_for_resource, which allocates a dumb buffer on the display node and exports it to an FD, which is then imported into the render node. Later, the DRI code asks for an FD for the buffer; this calls into v3d_bo_get_dmabuf, which exports the buffer we just imported into the render node to an FD and returns it. This is exactly the same process used in v3dv.
I guess the relevant point here is the comment about the render node not being able to do CMA allocations, which I understand to mean that allocating on the display node and importing into the render node is a requirement on our platform.
One caveat that Daniel mentioned is that this setup will not work with a nested Wayland compositor, because such a compositor won't have access to a primary node. However, if that is the case, I understand the problem already exists for the GL stack as well. I am not aware of this issue having been reported.
@anholt: could you confirm that this is indeed how things are expected to work for v3d?