Brainstorming: Optimizing for hardware planes
Hey everyone.
I've been doing some brainstorming about hardware planes. As you probably know, a lot of GPUs can do simple image blending with fixed-function hardware. They expose several planes, which are blended together before being sent to the display.
The number of planes available varies by hardware, but there can be quite a lot to take advantage of. For example, my RPL-P Gen12 Intel iGPU has 7 planes + 1 cursor plane.
Since hardware planes are generally less of a power drain, it's probably a good idea to try to use as many as possible, while attempting to prioritize the content which benefits the most from them.
Here's a simple concept of what something like this could look like. There are a few programs open which provide some buffers, and we want to composite something for our monitor from their buffers.
There are 6 usable planes in total, so we begin by assigning each single buffer to its own hardware plane. Eventually, we come across a program which has more buffers than overlay planes available. A quick solution to this problem is blending this program's buffers and scanning the result out to a single overlay.
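To make that naive pass concrete, here's a minimal Python sketch. It assumes programs are visited front to back. The name `assign_naive`, the example programs, and the fallback of folding leftover programs into the last plane once everything is used up are placeholders I made up for illustration, not something an existing compositor does.

```python
def assign_naive(programs, num_planes):
    """Give each buffer its own plane until a program no longer fits.

    programs: list of (program_name, [buffer_names]), assumed front to back.
    Returns a list of planes; each plane is the list of buffers it shows.
    A plane holding more than one buffer needs a blend pass on the 3D engine.
    """
    if num_planes < 1:
        raise ValueError("need at least one plane")
    planes = []
    for _name, buffers in programs:
        free = num_planes - len(planes)
        if free == 0:
            # Assumption: no planes left, so fold into the last plane used.
            planes[-1].extend(buffers)
        elif len(buffers) <= free:
            # Every buffer of this program gets direct scanout.
            planes.extend([buf] for buf in buffers)
        else:
            # More buffers than free planes: blend them into one overlay.
            planes.append(list(buffers))
    return planes


# Hypothetical desktop with 6 usable planes.
programs = [
    ("terminal", ["t0"]),
    ("browser", ["b0", "b1"]),
    ("editor", ["e0", "e1", "e2", "e3", "e4"]),
]
layout = assign_naive(programs, num_planes=6)
```

Here the terminal and browser buffers each get their own plane, while the editor's five buffers no longer fit in the three planes left and end up blended into a single overlay.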
This isn't perfect though, mainly because it always gives "priority" to the foreground buffers. This isn't necessarily the best approach as background buffers could actually be content types which are updated more often, such as a video stream or a videogame output. A solution to this would be to prioritize the assignment of certain kinds of contents to their own planes:
In this example, there are only 3 planes to work with. We know the video stream will update more often, so we give it priority and assign it one of the planes. The remaining non-priority buffers are grouped into background and foreground, and each group is blended and scanned out to one of the two remaining planes. This ensures that the video can continue to play without requiring further 3D engine usage.
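A rough Python sketch of that prioritized split, under a few assumptions of mine: the buffer stack is listed back to front, each buffer carries a hypothetical `updates_per_sec` hint, and buffers are promoted to their own plane greedily from the highest update rate down as long as the layout still fits. The names `Buffer`, `split_layers` and `assign_by_priority` are made up for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Buffer:
    name: str
    updates_per_sec: int  # hypothetical hint, e.g. from past commit rate


def split_layers(stack, promoted):
    """Promoted buffers get a plane each; runs in between share one plane."""
    planes = []
    for is_promoted, run in groupby(stack, key=lambda b: b.name in promoted):
        names = [b.name for b in run]
        if is_promoted:
            planes.extend([n] for n in names)
        else:
            planes.append(names)  # blended group between promoted buffers
    return planes


def assign_by_priority(stack, num_planes):
    """stack is back to front; returns planes back to front."""
    if num_planes < 1:
        raise ValueError("need at least one plane")
    promoted = set()
    layout = split_layers(stack, promoted)
    # Try the most frequently updated buffers first.
    for buf in sorted(stack, key=lambda b: b.updates_per_sec, reverse=True):
        trial = split_layers(stack, promoted | {buf.name})
        if len(trial) <= num_planes:
            promoted.add(buf.name)
            layout = trial
    return layout


stack = [
    Buffer("wallpaper", 0),
    Buffer("browser", 1),
    Buffer("video", 60),
    Buffer("terminal", 2),
    Buffer("panel", 1),
]
layout = assign_by_priority(stack, num_planes=3)
```

With 3 planes, only the video is promoted: the wallpaper and browser are blended into the background plane, and the terminal and panel into the foreground plane. Promoting anything else would need a fourth plane.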
There are a lot of possibilities here, so I'd love to get some feedback on the concept. I'd like to understand whether maximizing hardware plane usage is a good idea, as well as what kind of choices the compositor should be making to ensure optimal power savings.
I'm thinking about writing a simple algorithm to prototype this idea. Given N buffers (of different types) and M planes, it'll try to maximize the use of hardware planes and minimize the number of required redraws.
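As one possible starting point for that prototype, here's a small exhaustive search in Python. It rests on a cost model that is purely my assumption: a plane showing a single buffer costs nothing, and a blended plane has to be redrawn every time any of its buffers updates, so its cost is the sum of their update rates. Buffers sharing a plane must be adjacent in the stacking order. The function name `min_redraw_partition` is hypothetical.

```python
from functools import lru_cache


def min_redraw_partition(rates, num_planes):
    """Split a back-to-front stack into at most num_planes contiguous groups.

    rates[i] is the assumed update rate of buffer i.
    Returns (redraws_per_sec, groups), where each group is a half-open
    (start, end) index range sharing one plane.
    """
    n = len(rates)

    def group_cost(start, end):
        # Direct scanout is free; a blended group redraws on every update.
        return 0 if end - start == 1 else sum(rates[start:end])

    @lru_cache(maxsize=None)
    def best(start, planes_left):
        if start == n:
            return (0, ())
        if planes_left == 0:
            return (float("inf"), ())  # buffers left but no plane for them
        options = []
        for end in range(start + 1, n + 1):
            cost, rest = best(end, planes_left - 1)
            options.append((group_cost(start, end) + cost, ((start, end),) + rest))
        return min(options)

    return best(0, num_planes)


# wallpaper, browser, video, terminal, panel (hypothetical rates)
cost, groups = min_redraw_partition([0, 1, 60, 2, 1], num_planes=3)
```

For these rates and 3 planes the search settles on the same background / video / foreground split as in the example above. A real prototype would probably also want to weigh buffer size, format and per-plane hardware constraints, which this sketch ignores.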