[RFC] Introduce Underlay allocation strategy and API
Intention
- To introduce and discuss a new API for libliftoff underlay support
- To explain the what and why of the underlay strategy
- To discuss the allocation algorithm for underlay
Abstract: A Quieter GPU
What problem is libliftoff trying to solve? One of its goals is to "quiet" power-hungry GFX cores by offloading composition to the display controller where possible. This can save power and reduce latency by allowing the display controller to scan out frame buffers directly. Doing so optimally is the name of the game: the layers that incur the most GFX traffic should be offloaded. However, libliftoff cannot always do this today, due to limitations in its overlay allocation strategy. This proposal introduces the underlay strategy as a solution.
Background
But first, some background: How can display controllers offload GFX anyways?
Many display controllers today have hardware planes. These HW planes can do cropping, transformations, color processing, and blending (though support varies across HW vendors), which covers much of what GFX is commonly used for in composition. So rather than running shaders, hardware planes can be used to directly scan out buffers for composition. This offloads GFX work onto the display hardware.
How do we know that offloading to HW planes will quiet GFX? Typically, the output layers that update most frequently incur the most GFX traffic. Therefore, they become "high priority" candidates for libliftoff to offload, since they have the highest chance of quieting GFX traffic.
GFX is still a friend
However, not all layers can be offloaded, due to various constraints. When a
layer cannot be offloaded, we cannot simply discard it; the option to fall back
to GFX composition is needed. Libliftoff handles this by designating a layer as
the composition layer. Layers that cannot be allocated to a HW plane are GFX
composited onto it. The composition layer itself is allocated to
DRM_PLANE_TYPE_PRIMARY, which all HW vendors support. This ensures that layers
that cannot be offloaded can still be rendered, with the worst case being that
all layers are GFX composited onto the composition layer.
The overlay strategy
Naturally, the sensible approach to offload layers is to allocate them to HW planes over the primary plane. This is indeed what libliftoff does today, and it is referred to as the overlay allocation strategy. However, it is not without challenges.
In the overlay strategy, the composition layer is allocated to the DRM_PRIMARY plane. Offload-able (or direct-scanout-able) layers are allocated to DRM_OVERLAY/CURSOR planes, subject to constraints such as zpos and intersection. Since DRM_OVERLAY/CURSOR planes have a higher zpos than DRM_PRIMARY, direct-scanout-able layers are allocated over the composition layer.
Now if a high priority layer l_a is obscured by another layer l_b, then there
is a dependency: in order to offload l_a, l_b needs to be offloaded first. If
it happens that l_b is direct-scanout-able, and there are enough compatible HW
planes, then we're in luck, and offloading l_a is possible.
However, if l_b is not direct-scanout-able, is itself obscured by another
layer, or there aren't enough compatible HW planes, then we are out of luck. We
cannot offload l_a despite it being a high priority layer.
Situations where this occurs are quite frequent. Surfaces overlap whenever UI elements are rendered over each other. Some of them are even shm buffers, which are not direct-scanout-able. A concrete example is MPV playing a video with the OSD enabled (UI and subtitles, rendered into a shm buffer).
However, this does not have to be the case. The underlay strategy, although counterintuitive, allows any fully opaque layer to be offloaded, regardless of its zpos and intersection with other layers.
The Underlay strategy & Punch-Through support
Underlay refers to a strategy where direct-scanout planes are placed under the composition plane rather than over.
In the underlay strategy, the composition layer is allocated to a DRM_OVERLAY plane. Direct-scanout layers are allocated to DRM planes of lower zpos, which can be another DRM_OVERLAY, or DRM_PRIMARY. In other words, direct-scanout layers are allocated under the composition layer.
In order for the underlay to show through, compositors need to "punch" a transparent hole through the composition layer with the same bounding-box as the underlay.
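As a rough illustration of the punch-through step, the sketch below clears a rectangle of a CPU-mapped composition buffer to zero alpha. It assumes an ARGB8888 buffer and a row stride expressed in pixels; `struct rect` and `punch_hole()` are hypothetical names that are not part of libliftoff, and a real compositor would more likely do this on the GPU (for example with a scissored clear) as part of its render pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rectangle type: the underlay's bounding box in buffer pixels. */
struct rect {
	int x, y, width, height;
};

/*
 * Clear the given box to fully transparent (0x00000000) in an ARGB8888
 * buffer. The box is clipped to the buffer, so a box that extends past the
 * edges is safe. stride is the number of pixels per row (an assumption of
 * this sketch; DRM strides are normally given in bytes).
 */
static void
punch_hole(uint32_t *pixels, int buf_width, int buf_height, size_t stride,
	   struct rect box)
{
	assert(pixels != NULL);

	int x0 = box.x < 0 ? 0 : box.x;
	int y0 = box.y < 0 ? 0 : box.y;
	int x1 = box.x + box.width;
	int y1 = box.y + box.height;

	if (x1 > buf_width)
		x1 = buf_width;
	if (y1 > buf_height)
		y1 = buf_height;

	for (int y = y0; y < y1; y++) {
		for (int x = x0; x < x1; x++)
			pixels[(size_t)y * stride + (size_t)x] = 0;
	}
}
```

The hole is punched first; any layers above the underlay that need composition are then rendered over it as usual, so they still appear on top of the underlay.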
Advantages
- Direct-scanout of layers can occur regardless of their zpos and intersection with other layers.
- Consequently, precious hw planes can be used for high priority layers directly, rather than for the layers obscuring them (as is the case with overlay).
- Results in better quieting of gfx, and hence better power/performance, by:
  - Guaranteeing the offload of a high-priority layer, given that it can be directly scanned out by hardware.
  - Offload remains active regardless of updates in higher-zpos layers
Disadvantages
- Underlay layers must be fully opaque
  - Unless there exists another underlay layer with lower zpos, with identical bounding box (i)
- Punch-through composition introduces additional complexity to compositors
Example
Consider a video playback scenario: Both the video and UI/OSD can be directly scanned out, but there's a miscellaneous surface with a higher zpos that can't. A compositor may construct layers as follows:
With the overlay strategy, libliftoff can provide the following allocation. Since misc. cannot be directly scanned out, it is composited with the background:
However, if misc. intersects with other layers, then an overlay allocation is not possible. We need to fall back to full GFX composition:
With the underlay strategy, the composition plane is positioned above the video and ui/osd planes. To allow them to show through, the compositor first "punches" a 0-alpha (fully transparent) hole where the video should be. Then, composition with the misc. layer happens as usual, while still permitting direct-scanout of the video/ui/osd:
By extension, this strategy allows direct-scanout even if the ui/osd layer requires composition for whatever reason:
API changes for Underlay Support
struct liftoff_device *
liftoff_device_create(int drm_fd, bool punchthru_supported);
The new punchthru_supported flag allows a compositor to report to libliftoff
that it can "punch through" the composition layer to support underlay planes.
If true, libliftoff will allocate layers to planes using the underlay
allocation algorithm. Otherwise, the overlay-only algorithm will be used.
bool
liftoff_layer_is_underlay(struct liftoff_layer *layer);
New function for compositors to check if a layer has been assigned to an
underlay plane. Will always return false if the layer needs composition (i.e.
liftoff_layer_needs_composition() == true).
If not all layers are allocated (i.e. liftoff_output_needs_composition() == true),
compositors can use this to identify layers that require a punch-through
hole on the composition layer.
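To make the intended use concrete, here is a minimal sketch of the per-layer decision a compositor might make after allocation. So that it compiles on its own, it does not call libliftoff: `struct layer_status` is a hypothetical snapshot holding the results of liftoff_layer_needs_composition() and the proposed liftoff_layer_is_underlay() for one layer, and `enum layer_action` and `layer_action_for()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the compositor has to do for one layer this frame. */
enum layer_action {
	LAYER_ACTION_COMPOSITE,  /* render the layer onto the composition layer */
	LAYER_ACTION_PUNCH_HOLE, /* scanned out underneath: clear its box */
	LAYER_ACTION_NONE,       /* scanned out with nothing to render */
};

/* Hypothetical snapshot of the query results for one layer. */
struct layer_status {
	bool needs_composition; /* liftoff_layer_needs_composition(layer) */
	bool is_underlay;       /* proposed liftoff_layer_is_underlay(layer) */
};

/*
 * output_needs_composition stands in for
 * liftoff_output_needs_composition(output). A hole is only needed when a
 * composition layer is actually being rendered above the underlay.
 */
static enum layer_action
layer_action_for(const struct layer_status *status,
		 bool output_needs_composition)
{
	assert(status != NULL);

	if (status->needs_composition)
		return LAYER_ACTION_COMPOSITE;
	if (status->is_underlay && output_needs_composition)
		return LAYER_ACTION_PUNCH_HOLE;
	return LAYER_ACTION_NONE;
}
```

In a real compositor the two booleans would be filled in from the libliftoff calls once the allocation has been applied for the frame, before the composition layer is rendered.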
Underlay allocation algorithm
For the initial implementation, a simple underlay-only algorithm is proposed. Ideally, though, both underlay and overlay allocations should be considered, so that the highest-priority layers can be offloaded regardless of their opacity and zpos.
The workings are essentially the same as the overlay algorithm. The differences are:
- Allocation of the composition layer
  - Overlay: To DRM_PRIMARY at all times
  - Underlay: Initially to DRM_PRIMARY, but moved to the highest-zpos DRM_OVERLAY when the first underlay layer is allocated.
- Allocation of planes
  - Overlay: Depth-first-search on each plane by descending zpos (highest-zpos plane first)
  - Underlay: Depth-first-search on each plane by ascending zpos (DRM_PRIMARY first)
- Allocation zpos criteria:
  - Overlay: a layer can only be allocated if all intersecting and higher-zpos layers have been allocated
  - Underlay: a layer can only be allocated if no intersecting and higher-zpos layers have been allocated
- Allocation opacity criteria:
  - Overlay: a layer's opacity does not matter
  - Underlay: a layer can only be allocated if it is fully opaque
- Cursor allocation:
  - Overlay: treat cursor the same as any other layer
  - Underlay: allocate specifically to DRM_CURSOR
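The zpos and opacity criteria above can be sketched as two predicates. This is a simplified transcription of the criteria as worded in the list, not libliftoff's actual implementation: `struct sketch_layer` and all function names are hypothetical, and other checks (plane compatibility, the atomic test commit, cursor handling) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layer description (not libliftoff's struct). */
struct sketch_layer {
	int x, y, width, height; /* bounding box on the output */
	int zpos;
	bool opaque;    /* fully opaque */
	bool allocated; /* already assigned to a HW plane */
};

static bool
boxes_intersect(const struct sketch_layer *a, const struct sketch_layer *b)
{
	return a->x < b->x + b->width && b->x < a->x + a->width &&
	       a->y < b->y + b->height && b->y < a->y + a->height;
}

/* Overlay: every intersecting, higher-zpos layer must already be allocated. */
static bool
overlay_zpos_ok(const struct sketch_layer *layers, size_t n, size_t idx)
{
	assert(idx < n);
	for (size_t i = 0; i < n; i++) {
		if (i == idx || layers[i].zpos <= layers[idx].zpos)
			continue;
		if (boxes_intersect(&layers[i], &layers[idx]) &&
		    !layers[i].allocated)
			return false;
	}
	return true;
}

/*
 * Underlay: the layer must be fully opaque, and no intersecting,
 * higher-zpos layer may already be allocated.
 */
static bool
underlay_ok(const struct sketch_layer *layers, size_t n, size_t idx)
{
	assert(idx < n);
	if (!layers[idx].opaque)
		return false;
	for (size_t i = 0; i < n; i++) {
		if (i == idx || layers[i].zpos <= layers[idx].zpos)
			continue;
		if (boxes_intersect(&layers[i], &layers[idx]) &&
		    layers[i].allocated)
			return false;
	}
	return true;
}
```

For the MPV case described earlier (an opaque video with an unallocated OSD layer on top of it), overlay_zpos_ok() rejects the video while underlay_ok() accepts it, which is the situation the underlay strategy is meant to address.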