Commit 8a5cd28a authored by Simon Ser

unstable/linux-dmabuf: add wp_linux_dmabuf_feedback



On multi-GPU setups, multiple devices can be used for rendering. Clients
need feedback about the device in use by the compositor. For instance,
if they render on another GPU, then they need to make sure the memory is
accessible between devices and that their buffers are not placed in
hidden memory.

This commit introduces a new wp_linux_dmabuf_feedback object. This
object advertises a preferred main device, a set of preferred
formats/modifiers and target devices.

Each object is bound to a wl_surface and can dynamically update its
feedback parameters. This enables fine-grained per-surface
optimizations. For instance, when a surface is scanned out on a GPU the
compositor isn't compositing with, the target device can be set to this
GPU to avoid unnecessary roundtrips.

A feedback object can also be standalone for clients that don't support
per-surface feedback.

Signed-off-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Closes: wayland#59
parent e5d63e9a
.. Copyright 2021 Simon Ser

.. contents::

linux-dmabuf feedback introduction
==================================

linux-dmabuf feedback allows compositors and clients to negotiate optimal buffer
allocation parameters. This document will assume that the compositor is using a
rendering API such as OpenGL or Vulkan and KMS as the presentation API: even if
linux-dmabuf feedback isn't restricted to this use-case, it's the most common.

linux-dmabuf feedback introduces the following concepts:

1. A main device. This is the render device that the compositor is using to
   perform composition. Compositors should always be able to display a buffer
   submitted by a client, so this device can be used as a fallback in case none
   of the more optimized code-paths work. Clients should allocate buffers such
   that they can be imported and textured from the main device.
2. One or more tranches. Each tranche consists of a target device, allocation
   flags and a set of format/modifier pairs. A tranche can be seen as a set of
   format/modifier pairs that are compatible with the target device.

   A tranche can have the ``scanout`` flag. It means that the target device is
   a KMS device, and that buffers allocated with one of the format/modifier
   pairs in the tranche are eligible for direct scanout.

Clients should use the tranches in order to allocate buffers with the most
appropriate format/modifier and also to avoid allocating in private device
memory when cross-device operations are going to happen.

linux-dmabuf feedback implementation notes
==========================================

This section contains recommendations for client and compositor implementations.

For clients
-----------

Clients are expected to either pick a fixed DRM format beforehand, or
perform the following steps repeatedly until they find a suitable format.

Basic clients may only support static buffer allocation on startup. These
clients should do the following:

1. Send a ``get_default_feedback`` request to get global feedback.
2. Select the device indicated by ``main_device`` for allocation.
3. For each tranche:

   1. If ``tranche_target_device`` doesn't match the allocation device, ignore
      the tranche.
   2. Accumulate allocation flags from ``tranche_flags``.
   3. Accumulate format/modifier pairs received via ``tranche_formats`` in a
      list.
   4. When the ``tranche_done`` event is received, try to allocate the buffer
      with the accumulated list of modifiers and allocation flags. If that
      fails, proceed with the next tranche. If that succeeds, stop the loop.

4. Destroy the feedback object.

Tranches are ordered by preference: the more optimized tranches come first. As
such, clients should use the first tranche that happens to work.
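
The tranche loop for basic clients can be sketched roughly as follows. This is
only an illustration under stated assumptions: ``struct tranche``,
``try_alloc_fn``, ``pick_tranche`` and ``alloc_only_linear`` are hypothetical
names that are not part of the protocol or of any library, the tranches are
assumed to have been recorded in the order they were received, and the device
comparison is assumed to have been done already (for example with
``drmDevicesEqual``). Destroying the feedback object afterwards is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side record of one tranche, filled in while handling
 * tranche_target_device, tranche_flags and tranche_formats, and completed
 * when tranche_done arrives. */
struct tranche {
    bool matches_alloc_device; /* e.g. the result of drmDevicesEqual() */
    uint32_t flags;            /* accumulated tranche_flags */
    const uint64_t *modifiers; /* modifiers collected for the client's format */
    size_t n_modifiers;
};

/* Hypothetical allocator hook: returns true if a buffer could be allocated. */
typedef bool (*try_alloc_fn)(const uint64_t *modifiers, size_t n_modifiers,
                             uint32_t flags, void *user_data);

/* Returns the index of the first tranche that yields a buffer, or -1. */
static int pick_tranche(const struct tranche *tranches, size_t n_tranches,
                        try_alloc_fn try_alloc, void *user_data)
{
    assert(try_alloc != NULL);
    for (size_t i = 0; i < n_tranches; i++) {
        const struct tranche *t = &tranches[i];
        /* Skip tranches for another device or without our format. */
        if (!t->matches_alloc_device || t->n_modifiers == 0)
            continue;
        /* Tranches are ordered by preference, so the first success wins. */
        if (try_alloc(t->modifiers, t->n_modifiers, t->flags, user_data))
            return (int)i;
    }
    return -1;
}

/* Stand-in allocator for illustration only: it succeeds when the linear
 * modifier (DRM_FORMAT_MOD_LINEAR, value 0) is among the candidates. */
static bool alloc_only_linear(const uint64_t *modifiers, size_t n_modifiers,
                              uint32_t flags, void *user_data)
{
    (void)flags;
    (void)user_data;
    for (size_t i = 0; i < n_modifiers; i++) {
        if (modifiers[i] == 0)
            return true;
    }
    return false;
}
```

A real client would replace ``alloc_only_linear`` with a call into its buffer
allocator and pass the accumulated flags through to it.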

Some clients may have already selected the device they want to use beforehand.
These clients can ignore the ``main_device`` event, and ignore tranches whose
``tranche_target_device`` doesn't match the selected device. Such clients need
to be prepared for the ``wp_linux_buffer_params.create`` request to potentially
fail.

If the client allocates a buffer without specifying explicit modifiers on a
device different from the one indicated by ``main_device``, then the client
must force a linear layout.

Some clients might support re-negotiating the buffer format/modifier on the
fly. These clients should send a ``get_surface_feedback`` request and keep the
feedback object alive after the initial allocation. Each time a new set of
feedback parameters is received (ended by the ``done`` event), they should
perform the same steps as basic clients described above. They should detect
when the optimal allocation parameters didn't change (same
format/modifier/flags) to avoid needlessly re-allocating their buffers.

Some clients might additionally support switching the device used for
allocations on the fly. Such clients should send a ``get_surface_feedback``
request. For each tranche, select the device indicated by
``tranche_target_device`` for allocation. Accumulate allocation flags (received
via ``tranche_flags``) and format/modifier pairs (received via
``tranche_formats``) as usual. When the ``tranche_done`` event is received, try
to allocate the buffer with the accumulated list of modifiers and the
allocation flags. Try to import the resulting buffer by sending a
``wp_linux_buffer_params.create`` request (this might fail). Repeat with each
tranche until an allocation and import succeeds. Each time a new set of
feedback parameters is received, they should perform these steps again. They
should detect when the optimal allocation parameters didn't change (same
device/format/modifier/flags) to avoid needlessly re-allocating their buffers.
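
One possible way to detect unchanged allocation parameters is sketched below.
It assumes the client keeps the parameters used for its last successful
allocation in a small structure; ``struct alloc_params`` and the two helper
functions are hypothetical names introduced for illustration. Modifier lists
are sorted before comparison so that a mere reordering by the compositor does
not count as a change. Clients that also switch devices would additionally
compare the devices, for instance with ``drmDevicesEqual``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of the inputs used for one buffer allocation. */
struct alloc_params {
    uint32_t format;     /* DRM_FORMAT code */
    uint32_t flags;      /* accumulated tranche_flags */
    uint64_t *modifiers; /* candidate modifiers, owned by the caller */
    size_t n_modifiers;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the modifier list so that two snapshots can be compared directly. */
static void alloc_params_normalize(struct alloc_params *p)
{
    assert(p->modifiers != NULL || p->n_modifiers == 0);
    if (p->n_modifiers > 1)
        qsort(p->modifiers, p->n_modifiers, sizeof(p->modifiers[0]), cmp_u64);
}

/* Compares two normalized snapshots; true means no re-allocation is needed. */
static bool alloc_params_equal(const struct alloc_params *a,
                               const struct alloc_params *b)
{
    if (a->format != b->format || a->flags != b->flags ||
        a->n_modifiers != b->n_modifiers)
        return false;
    return a->n_modifiers == 0 ||
           memcmp(a->modifiers, b->modifiers,
                  a->n_modifiers * sizeof(a->modifiers[0])) == 0;
}
```

A client could call ``alloc_params_normalize`` on the freshly accumulated
parameters when ``done`` arrives and skip re-allocation whenever
``alloc_params_equal`` reports a match with the stored snapshot.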

For compositors
---------------

Basic compositors may only support texturing the DMA-BUFs via a rendering API
such as OpenGL or Vulkan. Such compositors can send a single tranche as a reply
to both ``get_default_feedback`` and ``get_surface_feedback``. Set the
``main_device`` to the rendering device. Send the tranche with
``tranche_target_device`` set to the rendering device and all of the DRM
format/modifier pairs supported by the rendering API. Do not set the
``scanout`` flag in the ``tranche_flags`` event.

Some compositors may support direct scan-out for full-screen surfaces. These
compositors can re-send the feedback parameters when a surface becomes
full-screen or leaves full-screen mode if the client has used the
``get_surface_feedback`` request. The non-full-screen feedback parameters are
the same as for the basic compositors described above. The full-screen feedback
parameters have two tranches: one with the format/modifier pairs supported by
the KMS plane, with the ``scanout`` flag set in the ``tranche_flags`` event and
with ``tranche_target_device`` set to the KMS scan-out device; the other with
the rest of the format/modifier pairs (supported for texturing, but not for
scan-out), without the ``scanout`` flag set in the ``tranche_flags`` event, and
with the ``tranche_target_device`` set to the rendering device.

Some compositors may support direct scan-out for all surfaces. These
compositors can send two tranches for surfaces that become candidates for
direct scan-out, similarly to compositors supporting direct scan-out for
full-screen surfaces. When a surface stops being a candidate for direct
scan-out, compositors should re-send the feedback parameters optimized for
texturing only. The way candidates for direct scan-out are selected is
compositor policy; a possible implementation is to select as many surfaces as
there are available hardware planes, starting from surfaces closer to the eye.
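
The candidate-selection policy mentioned above could look roughly like this.
It is a sketch of one possible policy, not a recommendation from the protocol:
``struct surface_state`` and ``update_scanout_candidates`` are hypothetical
names, and the surface array is assumed to be sorted front to back already.
Surfaces whose candidate status changes are flagged so that the compositor
knows which feedback objects need their parameters re-sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-surface bookkeeping kept by the compositor. */
struct surface_state {
    bool scanout_candidate; /* currently advertised with a scanout tranche */
    bool feedback_dirty;    /* feedback parameters must be re-sent */
};

/* Marks the n_planes front-most surfaces as scan-out candidates.
 * surfaces[0] is assumed to be the surface closest to the eye.
 * Returns the number of surfaces whose status changed. */
static size_t update_scanout_candidates(struct surface_state *surfaces,
                                        size_t n_surfaces, size_t n_planes)
{
    size_t n_changed = 0;
    assert(surfaces != NULL || n_surfaces == 0);
    for (size_t i = 0; i < n_surfaces; i++) {
        bool candidate = i < n_planes;
        if (surfaces[i].scanout_candidate != candidate) {
            surfaces[i].scanout_candidate = candidate;
            surfaces[i].feedback_dirty = true;
            n_changed++;
        }
    }
    return n_changed;
}
```

Only re-sending feedback for surfaces flagged as dirty keeps the compositor
from sending identical parameters several times in a row.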

Some compositors may support multiple devices at the same time. If the
compositor supports rendering with a fixed device and direct scan-out on a
secondary device, it may send a separate tranche for surfaces displayed on
the secondary device that are candidates for direct scan-out. The
``tranche_target_device`` for this tranche will be the secondary device and
will not match the ``main_device``.

Some compositors may support switching their rendering device at runtime or
changing their rendering device depending on the surface. When the rendering
device changes for a surface, such compositors may re-send the feedback
parameters with a different ``main_device``. However, there is a risk that
clients don't support switching their device at runtime and continue using the
previous device. For this reason, compositors should always have a fallback
rendering device that they initially send as ``main_device``, such that these
clients use said fallback device.

Compositors should not change the ``main_device`` on-the-fly when explicit
modifiers are not supported, because there's a risk of importing buffers
with an implicit non-linear modifier as a linear buffer, resulting in
misinterpreted buffer contents.

Compositors should not send feedback parameters if they don't have a fallback
path. For instance, compositors shouldn't send a format/modifier supported for
direct scan-out but not supported by the rendering API for texturing.

Compositors can decide to use multiple tranches to describe the allocation
parameters optimized for texturing. For example, if there are formats which
have a fast texturing path and formats which have a slower texturing path, the
compositor can decide to expose two separate tranches.

Compositors can decide to use intermediate tranches to describe code-paths
slower than direct scan-out but faster than texturing. For instance, a
compositor could insert an intermediate tranche if it's possible to use a
mem2mem device to convert buffers to be able to use scan-out.

``dev_t`` encoding
==================

The protocol carries ``dev_t`` values on the wire using arrays. A compositor
written in C can encode the values as follows:

.. code-block:: c

    struct stat drm_node_stat;
    struct wl_array dev_array = {
        .size = sizeof(drm_node_stat.st_rdev),
        .data = &drm_node_stat.st_rdev,
    };

A client can decode the values as follows:

.. code-block:: c

    dev_t dev;
    assert(dev_array->size == sizeof(dev));
    memcpy(&dev, dev_array->data, sizeof(dev));

Because two DRM nodes can refer to the same DRM device while having different
``dev_t`` values, clients should use ``drmDevicesEqual`` to compare two
devices.

``format_table`` encoding
=========================

The ``format_table`` event carries a file descriptor containing a list of
format + modifier pairs. The list is an array of pairs which can be accessed
with this C structure definition:

.. code-block:: c

    struct dmabuf_format_modifier {
        uint32_t format;
        uint32_t pad; /* unused */
        uint64_t modifier;
    };
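
As an illustration of how a client might consume the table, the sketch below
maps the file descriptor read-only and private, then resolves one 16-bit index
from a ``tranche_formats`` array into a table entry. The helper names
``format_table_map`` and ``format_table_entry`` are hypothetical, and error
handling is reduced to returning ``NULL``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

struct dmabuf_format_modifier {
    uint32_t format;
    uint32_t pad; /* unused */
    uint64_t modifier;
};

/* Maps the table received via format_table (fd, size in bytes).
 * Returns NULL on failure; on success *n_entries holds the entry count. */
static const struct dmabuf_format_modifier *
format_table_map(int fd, uint32_t size, size_t *n_entries)
{
    assert(n_entries != NULL);
    /* The protocol requires a read-only, private mapping. */
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    *n_entries = size / sizeof(struct dmabuf_format_modifier);
    return map;
}

/* Resolves the i-th 16-bit index of a tranche_formats array.
 * Returns NULL if i or the index it holds is out of range. */
static const struct dmabuf_format_modifier *
format_table_entry(const struct dmabuf_format_modifier *table, size_t n_entries,
                   const void *indices_data, size_t indices_size, size_t i)
{
    uint16_t index;
    if (i >= indices_size / sizeof(index))
        return NULL;
    /* memcpy avoids assumptions about the alignment of the array data. */
    memcpy(&index, (const char *)indices_data + i * sizeof(index), sizeof(index));
    if (index >= n_entries)
        return NULL;
    return &table[index];
}
```

A client would typically call ``munmap`` on the old mapping once a new
``format_table`` event replaces it.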

Integration with other APIs
===========================

- libdrm: ``drmGetDeviceFromDevId`` returns a ``drmDevice`` from a device ID.
- EGL: the `EGL_EXT_device_drm_render_node`_ extension may be used to query the
  DRM device render node used by a given EGL display. When unavailable, the
  older `EGL_EXT_device_drm`_ extension may be used as a fallback.
- Vulkan: the `VK_EXT_physical_device_drm`_ extension may be used to query the
  DRM device used by a given ``VkPhysicalDevice``.

.. _EGL_EXT_device_drm: https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_device_drm.txt
.. _EGL_EXT_device_drm_render_node: https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_device_drm_render_node.txt
.. _VK_EXT_physical_device_drm: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_physical_device_drm.html

@@ -24,17 +24,18 @@
DEALINGS IN THE SOFTWARE.
</copyright>
-<interface name="zwp_linux_dmabuf_v1" version="3">
+<interface name="zwp_linux_dmabuf_v1" version="4">
<description summary="factory for creating dmabuf-based wl_buffers">
Following the interfaces from:
https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_image_dma_buf_import_modifiers.txt
and the Linux DRM sub-system's AddFb2 ioctl.
-This interface offers ways to create generic dmabuf-based
-wl_buffers. Immediately after a client binds to this interface,
-the set of supported formats and format modifiers is sent with
-'format' and 'modifier' events.
+This interface offers ways to create generic dmabuf-based wl_buffers.
+Clients can use the get_surface_feedback request to get dmabuf feedback
+for a particular surface. If the client wants to retrieve feedback not
+tied to a surface, they can use the get_default_feedback request.
The following are required from clients:
@@ -123,10 +124,9 @@
For the definition of the format codes, see the
zwp_linux_buffer_params_v1::create request.
-Warning: the 'format' event is likely to be deprecated and replaced
-with the 'modifier' event introduced in zwp_linux_dmabuf_v1
-version 3, described below. Please refrain from using the information
-received from this event.
+Starting version 4, the format event is deprecated and must not be
+sent by compositors. Instead, use get_default_feedback or
+get_surface_feedback.
</description>
<arg name="format" type="uint" summary="DRM_FORMAT code"/>
</event>
@@ -152,6 +152,10 @@
For the definition of the format and modifier codes, see the
zwp_linux_buffer_params_v1::create and zwp_linux_buffer_params_v1::add
requests.
Starting version 4, the modifier event is deprecated and must not be
sent by compositors. Instead, use get_default_feedback or
get_surface_feedback.
</description>
<arg name="format" type="uint" summary="DRM_FORMAT code"/>
<arg name="modifier_hi" type="uint"
@@ -159,9 +163,34 @@
<arg name="modifier_lo" type="uint"
summary="low 32 bits of layout modifier"/>
</event>
<!-- Version 4 additions -->
<request name="get_default_feedback" since="4">
<description summary="get default feedback">
This request creates a new wp_linux_dmabuf_feedback object not bound
to a particular surface. This object will deliver feedback about dmabuf
parameters to use if the client doesn't support per-surface feedback
(see get_surface_feedback).
</description>
<arg name="id" type="new_id" interface="zwp_linux_dmabuf_feedback_v1"/>
</request>
<request name="get_surface_feedback" since="4">
<description summary="get feedback for a surface">
This request creates a new wp_linux_dmabuf_feedback object for the
specified wl_surface. This object will deliver feedback about dmabuf
parameters to use for buffers attached to this surface.
If the surface is destroyed before the wp_linux_dmabuf_feedback object,
the feedback object becomes inert.
</description>
<arg name="id" type="new_id" interface="zwp_linux_dmabuf_feedback_v1"/>
<arg name="surface" type="object" interface="wl_surface"/>
</request>
</interface>
-<interface name="zwp_linux_buffer_params_v1" version="3">
+<interface name="zwp_linux_buffer_params_v1" version="4">
<description summary="parameters for creating a dmabuf-based wl_buffer">
This temporary object is a collection of dmabufs and other
parameters that together form a single logical buffer. The temporary
@@ -219,8 +248,8 @@
defined by the DRM fourcc code.
Warning: It should be an error if the format/modifier pair was not
-advertised with the modifier event. This is not enforced yet because
-some implementations always accept DRM_FORMAT_MOD_INVALID. Also
+advertised by zwp_linux_dmabuf_feedback_v1. This is not enforced yet
+because some implementations always accept DRM_FORMAT_MOD_INVALID. Also
version 2 of this protocol does not have the modifier event.
This request raises the PLANE_IDX error if plane_idx is too large.
@@ -368,7 +397,192 @@
<arg name="format" type="uint" summary="DRM_FORMAT code"/>
<arg name="flags" type="uint" enum="flags" summary="see enum flags"/>
</request>
</interface>
<interface name="zwp_linux_dmabuf_feedback_v1" version="4">
<description summary="dmabuf feedback">
This object advertises dmabuf parameters feedback. This includes the
preferred devices and the supported formats/modifiers.
The parameters are sent once when this object is created and whenever they
change. The done event is always sent once after all parameters have been
sent. When a single parameter changes, all parameters are re-sent by the
compositor.
Compositors can re-send the parameters when the current client buffer
allocations are sub-optimal. Compositors should not re-send the
parameters if re-allocating the buffers would not result in a more optimal
configuration. In particular, compositors should avoid sending the exact
same parameters multiple times in a row.
The tranche_target_device and tranche_formats events are grouped by
tranches of preference. For each tranche, a tranche_target_device, one
tranche_flags and one or more tranche_formats events are sent, followed
by a tranche_done event finishing the list. The tranches are sent in
descending order of preference. All formats and modifiers in the same
tranche have the same preference.
To send parameters, the compositor sends one main_device event, tranches
(each consisting of one tranche_target_device event, one tranche_flags
event, tranche_formats events and then a tranche_done event), then one
done event.
</description>
<request name="destroy" type="destructor">
<description summary="destroy the feedback object">
Using this request a client can tell the server that it is not going to
use the wp_linux_dmabuf_feedback object anymore.
</description>
</request>
<event name="done">
<description summary="all feedback has been sent">
This event is sent after all parameters of a wp_linux_dmabuf_feedback
object have been sent.
This allows changes to the wp_linux_dmabuf_feedback parameters to be
seen as atomic, even if they happen via multiple events.
</description>
</event>
<event name="format_table">
<description summary="format and modifier table">
This event provides a file descriptor which can be memory-mapped to
access the format and modifier table.
The table contains a tightly packed array of consecutive format +
modifier pairs. Each pair is 16 bytes wide. It contains a format as a
32-bit unsigned integer, followed by 4 bytes of unused padding, and a
modifier as a 64-bit unsigned integer. The native endianness is used.
The client must map the file descriptor in read-only private mode.
Compositors are not allowed to mutate the table file contents once this
event has been sent. Instead, compositors must create a new, separate
table file and re-send feedback parameters. Compositors are allowed to
store duplicate format + modifier pairs in the table.
</description>
<arg name="fd" type="fd" summary="table file descriptor"/>
<arg name="size" type="uint" summary="table size, in bytes"/>
</event>
<event name="main_device">
<description summary="preferred main device">
This event advertises the main device that the server prefers to use
when direct scan-out to the target device isn't possible. The
advertised main device may be different for each
wp_linux_dmabuf_feedback object, and may change over time.
There is exactly one main device. The compositor must send at least
one preference tranche with tranche_target_device equal to main_device.
Clients need to create buffers that the main device can import and
read from, otherwise creating the dmabuf wl_buffer will fail (see the
wp_linux_buffer_params.create and create_immed requests for details).
The main device will also likely be kept active by the compositor,
so clients can use it instead of waking up another device for power
savings.
In general the device is a DRM node. The DRM node type (primary vs.
render) is unspecified. Clients must not rely on the compositor sending
a particular node type. Clients cannot check two devices for equality
by comparing the dev_t value.
If explicit modifiers are not supported and the client performs buffer
allocations on a different device than the main device, then the client
must force the buffer to have a linear layout.
</description>
<arg name="device" type="array" summary="device dev_t value"/>
</event>
<event name="tranche_done">
<description summary="a preference tranche has been sent">
This event splits tranche_target_device and tranche_formats events in
preference tranches. It is sent after a set of tranche_target_device
and tranche_formats events; it represents the end of a tranche. The
next tranche will have a lower preference.
</description>
</event>
<event name="tranche_target_device">
<description summary="target device">
This event advertises the target device that the server prefers to use
for a buffer created given this tranche. The advertised target device
may be different for each preference tranche, and may change over time.
There is exactly one target device per tranche.
The target device may be a scan-out device, for example if the
compositor prefers to directly scan-out a buffer created given this
tranche. The target device may be a rendering device, for example if
the compositor prefers to texture from said buffer.
The client can use this hint to allocate the buffer in a way that makes
it accessible from the target device, ideally directly. The buffer must
still be accessible from the main device, either through direct import
or through a potentially more expensive fallback path. If the buffer
can't be directly imported from the main device then clients must be
prepared for the compositor changing the tranche priority or making
wl_buffer creation fail (see the wp_linux_buffer_params.create and
create_immed requests for details).
If the device is a DRM node, the DRM node type (primary vs. render) is
unspecified. Clients must not rely on the compositor sending a
particular node type. Clients cannot check two devices for equality by
comparing the dev_t value.
This event is tied to a preference tranche, see the tranche_done event.
</description>
<arg name="device" type="array" summary="device dev_t value"/>
</event>
<event name="tranche_formats">
<description summary="supported buffer format modifier">
This event advertises the format + modifier combinations that the
compositor supports.
It carries an array of indices, each referring to a format + modifier
pair in the last received format table (see the format_table event).
Each index is a 16-bit unsigned integer in native endianness.
For legacy support, DRM_FORMAT_MOD_INVALID is an allowed modifier.
It indicates that the server can support the format with an implicit
modifier. When a buffer has DRM_FORMAT_MOD_INVALID as its modifier, it
is as if no explicit modifier is specified. The effective modifier
will be derived from the dmabuf.
A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for
a given format supports both explicit modifiers and implicit modifiers.
Compositors must not send duplicate format + modifier pairs within the
same tranche or across two different tranches with the same target
device and flags.
This event is tied to a preference tranche, see the tranche_done event.
For the definition of the format and modifier codes, see the
wp_linux_buffer_params.create request.
</description>
<arg name="indices" type="array" summary="array of 16-bit indexes"/>
</event>
<enum name="tranche_flags" bitfield="true">
<entry name="scanout" value="1" summary="direct scan-out tranche"/>
</enum>
<event name="tranche_flags">
<description summary="tranche flags">
This event sets tranche-specific flags.
The scanout flag is a hint that direct scan-out may be attempted by the
compositor on the target device if the client appropriately allocates a
buffer. How to allocate a buffer that can be scanned out on the target
device is implementation-defined.
This event is tied to a preference tranche, see the tranche_done event.
</description>
<arg name="flags" type="uint" enum="tranche_flags" summary="tranche flags"/>
</event>
</interface>
</protocol>