Skip to content

Draft: kms: New element kmscompositor

Benjamin Desef requested to merge projekter/gstreamer:kmscompositor into main

This merge request implements a compositor that uses the KMS layer (writeback connectors) to build its output. It it therefore extremely quick, although a bit more limited than software compositors. On the Raspberry Pi (at least 4, don't know about 5), this is the only way that I know of to compose multiple sources including alpha.

Note that this is the first time that I'm writing internal GStreamer code or dealing with KMS at all (or DMA, for that matter; or even anything related to Linux, or GObject). So any kind of suggestions/improvements are more than welcome, there's probably a lot that I've done wrong or suboptimally. Great acknowledgement and inspiration comes from the kmssink and compositor plugins obviously, and Daniel Egnor for pivid for all the atomic transaction part.

This is marked as a draft as there are a couple of things that should be discussed before considering a merge:

  • I have put the new element to the former kmssink plugin, which I have now renamed to kms, I hope this was the correct place to put things (perhaps not also logically but license-wise)?
  • There are some functions that this element shares with kmssink. I have moved a couple of them to gstkmsutils.c (kms_open, log_drm_version, get_drm_caps), though there could be some more - ..._create_pool, ..._validate_and_set_external_fd, ..._invalidate_external_fd, ..._copy_to_dumb_buffer, ...import_dmabuf, ..._get_input_buffer. However, they all access properties of the sink/compositor (though for the compositor, some are pad-related), so these properties all have to be given as references, which is annoying - or we define a common superclass for KMS components, which I think would be the most clean approach.
  • It would be great if the output of the compositor could also be displayed on the screen. Naturally, one would put a kmssink after the kmscompositor (potentially with a tee). However, this is problematic.
    • For one, opening the driver: due to the exclusive nature, kmssink and kmscompositor cannot both open it. Since they both have the fd property, it would be possible for the user to get the file descriptor and set it for both of them, though this is a bit inconvenient. Can there be a global state where the KMS components try to get their fd? But what if there are multiple cards and the user deliberately want the components to be on different ones - then, there would have to be a global dictionary that assigns fds to card ids.
    • Passing data between kmscompositor and kmssink. If the compositor were to enable a KMSMemory feature, this would be optimal. However, if the source and sink operate on different cards, then a buffer copy would have to be necessary - so not all KMSMemorys are created equally.
    • kmscompositor uses atomic transactions. kmssink uses the legacy interface (perhaps it is also time to migrate kmssink?). Mixing them might not be a good idea, if it even works.
    • While reading on all this stuff, I found a thread that reveals problems when using multiple CRTCs, which here would be necessary - one is the writeback, the other one the output. The solution/workaround there was to do everything in the same atomic commit. Therefore, kmscompositor would also have to do the displaying part - which in principle is not a problem and not hard to add. But this breaks gstreamer's logic that the kmscompositor is a VideoAggregator and not a sink.
    • To optimize things further, what if the user wants to compose and display, but has no interest in processing the data further? Then, writing everything to a writeback buffer seems wasteful.
    • So currently, kmscompositor is a VideoAggregator with one ALWAYS source pad. It would be possible to add a property display (or better, something that indicates the CRTC where to display) to enable the output. Should then the source pad be turned in a REQUEST pad (of which only one is allowed), and only if the user requests output will the writeback mechanism actually be used? And if not, the compositor is basically a fakesink or an actual sink? This would be efficient, but not very gstreamer-like.
  • I was interested in streaming the output, and for this purpose, I linked my kmscompositor to a v4l2h264enc. Of course, this should work with zero copy, as by using KMS Prime, kmscompositor can export a DMABuf, and v4l2 can import this. However, how can I actually implement this? For GL and va, the components reveal that they support the DMABuf format, so automatic negotiation is easy. However, the v4l2 components don't have this feature in their caps, and in fact, if they are fed with data that has it, the link won't even work. So my current workaround is to add a property force_dma that forcibly sends a DMABuf output without declaring this in the caps feature. Is there any nicer way to do this through some kind of negotiation mechanism? If so, please provide a commit.
  • The most critical design decision. KMSCompositor can compose a certain number of inputs which is dictated by the number of planes that are assigned to the writeback connector. This is hardware-dependent. And as I understand it, the properties that these planes may have can even vary from plane to plane (e.g., the supported pixel format has to be queried giving an ID of the plane; planes may or may not have a writable zpos property - on the Pi, the primary plane has read-only zpos, while the overlay planes are writable). How to make sense of this strong hardware-dependence on the GStreamer side?
    • The current implementation is a weird experiment. The kmscompositor only has SOMETIMES sink pads, which are created when the element is started. This perfectly reflects the hardware-dependence, but is not nice regarding gst-launch, as SOMETIMES sink pads are not supported. Therefore, the code currently cannot be tested on the command line, only programmatically.
    • While gst-launch could be improved - and it would be easy to do so unless a SOMETIMES source is linked with a SOMETIMES sink -, perhaps this level of hardware-dependence is not very desirable from the gstreamer perspective. In particular, the Raspberry Pi has a lot of planes, so the compositor would end up with more than ten sink pads (currently, in the code I simply expose the first two planes to do some experiments, as I didn't want to invest more time in a proper mechanism before hearing your opinion on what would be the best way). Due to the different properties of the planes, this cannot be limited beforehand by a property, so, similar to decodebin3, an event in which the user can decide whether to expose the given pad would be the best way to deal with this.
    • But then, kmscompositor is designed for its flexibility - you can easily change all properties of the pads (positions, dimensions, alpha, zpos) while the element is running. What if the user wants to add one more pad? The event is already long gone, so there's no way to do this.
    • This suggests that perhaps REQUEST sink pads would be a better option. Currently, a pad is identified with a plane, and this association stays until the element stops. With REQUEST pads, this is no longer an option. The kmscompositor would internally have to keep a list of all planes and what they can do and then assign the pads to planes in each aggregation (which is a bit costly, always matching caps - and how to even do this in an optimal way? What if there is one source pad whose caps will only match one particular plane, but all the others are compatible with a lot more planes - if one of the other planes is assigned first, the finnicky one cannot be assigned at all. Dealing with this requires a lot of effort). Or they are assigned a plane id in an aggregation once, which is then stored and kept until the pad is discarded (or probably until a renegotiation on this pad happens; or until new pads are requested or released).
  • There are a couple of things that I simplified compared to kmssink.
    • For one HDR. No idea about this, and I just wanted to get the compositor running. Maybe this can be integrated again.
    • Then, the polling for VBLANK. It was my impression that in an aggregator, the frame aggregation method should return when the aggregation has finished. For writeback, this would just mean that the callback fd signals that it has data. However, it still appears to be necessary to do a page flip event and wait for the VBLANK. If I don't do this, the rendering becomes very slow. But then, is there any reason to favor the gpoll approach that kmssink uses over actively reading from the file descriptor (which is copied from pivid)?
    • Display/pixel ratios. Given that the writeback connector has no physical size, I don't probe it. The question is what to do when kmscompositor should act as a screen renderer also. I would just say to drop querying the physical dimensions (I was very puzzled when I figured out what kmssink does - although it makes sense, I would not have expected this from a renderer. If the pixels are nonsquare, well, choose your resolution appropriately or face the consequences...)
  • What to do about the reported framerate? I copied the compositor approach, but just giving 25fps as a default seems a bit...helpless to me. See the source code comment.
  • The properties of the kmscompositor itself (not the pads) should not be changed while running. However, I did not implement such a check - is it common in gstreamer elements? kmssink doesn't check for it.
  • I currently overwrite the zorder property (which is the only one that is not called as the kms property, which is zpos - due to the fact that the VideoAggregatorPad already provides such a property), but duplicate it. Is it possible/recommendable to use the backing of the parent property (which is private)? I guess that for efficiency reasons, I should still keep an own copy around, but the setter should call the parent's setter?

Test code:

import gi, time
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# use two of the sources and one of the sinks (first sink requires a server such as Janus)
pipe = Gst.parse_launch(
    " ".join([
        " ! ".join([
            "v4l2src device=/dev/video0",
            "capsfilter caps=video/x-raw,format=RGB16 name=capsfilter"
        # " ! ".join([
        #     "videotestsrc",
        #     "capsfilter caps=video/x-raw,width=1280,height=720,format=RGB16 name=capsfilter"
        # ]),
        " ! ".join([
            "capsfilter caps=video/x-raw,width=1280,height=720,format=RGB16 name=capsfilter2"
        " ! ".join([
            "kmscompositor name=kmscompositor force-dma=true",
            "queue min-threshold-buffers=1",
            'v4l2h264enc output-io-mode=5 extra-controls="controls,h264_profile=3,video_bitrate=1500000,h264_i_frame_period=100,video_bitrate_mode=0,repeat_sequence_header=1;"',
            "udpsink host= port=8004"
        # " ! ".join([
        #    "kmscompositor name=kmscompositor force-dma=true",
        #    "video/x-raw,format=RGB,width=1280,height=720",
        #    "v4l2convert",
        #    "fbdevsink"
        # ])
kmscompositor = pipe.get_by_name("kmscompositor")
capsfilter = pipe.get_by_name("capsfilter")
capsfilter2 = pipe.get_by_name("capsfilter2")
islinked = 0
def pad_added(element: Gst.Element, pad: Gst.Pad):
    global islinked
    if pad.direction is Gst.PadDirection.SINK:
        print("Got a new pad", flush=True)
        if islinked == 0:
            global capsfilter
            islinked = 1
        elif islinked == 1:
            global capsfilter2
            islinked = 2
            print("But no couterpart")
kmscompositor.connect("pad-added", pad_added)


time.sleep(2) # just to show that you can indeed dynamically change the properties at any time, though here you should at least wait until the pipeline has completed the state transition

# you are free to experiment with the properties
kmscompositor.sinkpads[1].set_property("alpha", .75)
kmscompositor.sinkpads[1].set_property("xpos", 50)
kmscompositor.sinkpads[1].set_property("ypos", 50)
kmscompositor.sinkpads[1].set_property("width", 320)
kmscompositor.sinkpads[1].set_property("height", 180)

Caveat: This test code does not appear to be working any more. I did all my developments on the 1.22 branch, where this all worked fine. Then, in preparation for this merge request, I ported it to the main branch. But there currently seems to be an issue with videotestsrc: gst-launch-1.0 videotestsrc ! kmssink gives Illegal instruction. So I'm not too bothered that the code above crashes the kernel as soon as I use the videotestsrc; when I reduce the number of input pads to 1 again and just use my external camera, everything is fine again.

Unrelated closing remark: In preparation for this merge request, I tried to look for a coding styleguide and ended up at this documentation, where I was very annoyed to have to change everything to C89. Right now, I saw that on a different documentation page, the target now seems to be C99, making all of the changes superfluous. So that's the reason why the merge request is now in C89 (apart from the one variable-sized array that I didn't want to change - the compiler warning should probably be disabled if C99 actually is the target now).

Note that my development time currently is extremely limited, mostly to the weekends. So feel free to add changes by yourselves if I'm not very responsive during the week.

Merge request reports