
va: need to move common logic to create a va lib.

Merged: He Junyan requested to merge He_Junyan/gst-plugins-bad:va_lib_support into master

As the va plugins mature, we want to use them more. Would it be possible to extract the common logic (display/memory/surface) from the plugins into a lib, just as gst-plugins-bad/gst-libs/gst/d3d11 does?

The current common way of using vaapi is:

vaapixxxdec ! vaapixxxenc

or

msdkxxxdec ! msdkxxxenc

The linkages between the vaapi and MSDK plugins are few, because the DMA buffer sharing support is imperfect (for example, due to differing video formats and tilings).

When it comes to the vaxxxdec plugins, we must connect vaxxxdec with msdkxxxenc, because the new va plugins do not support encoding and it seems the encoder part will not be ready soon. So the common usage will be

vaxxxdec ! msdkxxxenc

in the future.

And on Intel's new GPUs (Gen12+), the "modifier" is a headache: DMA buffer sharing between the va plugins and MSDK needs an explicit "modifier" negotiation. Also, Intel standalone GPU cards have multiple GPU groups, and we need to make sure vaxxxdec and msdkxxxenc run on the same device and GPU group, which is easy to get wrong. Sharing the VADisplay and VASurface avoids all of this.

Another important point: when we extend our GStreamer support onto Windows, we cannot find a good way to connect d3d11xxxdec with the msdk plugins, since there is no counterpart to DMA buffer sharing on Windows. Fortunately, a d3d11 lib already exists.

So, our ideal target is: using vaxxxdec ! msdkxxxenc on Linux, linking them with VAMemory caps and sharing the VADisplay/VASurface.

And using d3d11xxxdec ! msdkxxxenc on Windows, linking them with divx surface caps (not very accurate, need to ask @seungha.yang).

Then it is symmetric, and so we may want this va lib. The lib could also be used by other modules and customers, such as deep learning projects that are still in development and just want to use our VA surfaces quickly.

Edited by He Junyan

Merge request reports

Merge request pipeline #321350 passed for d09aae68

Merged by GStreamer Marge Bot (May 18, 2021 12:47pm UTC)


Pipeline #321431 waiting for manual action for d09aae68 on master

Activity

  • Author Developer

    We could not do that before, because gstreamer-vaapi was outside of gst-plugins-bad, so the MSDK plugin in -bad could not depend on that lib. But they are all in -bad now.

  • He Junyan changed the description

  • First of all, I have no objection to libgstva as long as it's a -bad scope library at the moment (like libgstd3d11), and I believe libgstva would give us a chance to improve the API design as well.

    Another important point: when we extend our GStreamer support onto Windows

    That sounds like a really good plan! AFAIK MSDK supports d3d very well on Windows, and other open source projects (OBS, for example) already use it. That's a feature I really wanted, and I was looking forward to contributions from Intel people. But

    I have no plan to integrate d3d11 with other APIs (MSDK for Intel, and CUDA/NVCODEC for NVIDIA) because:

    • d3d11 + MediaFoundation already works better than GStreamer MSDK (and GSTMFX) in my tests, in terms of stability and performance. I feel Intel supports d3d11 and MediaFoundation very well
    • d3d11 + MediaFoundation covers all well-known vendors, Intel, NVIDIA, AMD, Qualcomm
    • CUDA + d3d11 interop overhead is big; sometimes it's slower than d3d11/MediaFoundation. So I have no motivation to do that yet.
    • non-d3d11 Graphics APIs are not allowed for UWP

    In short, I prefer native Windows API (d3d11/MediaFoundation) over vendor specific APIs.

    There are some points that d3d11/MediaFoundation doesn't cover but the native APIs (MSDK/CUDA) do; still, I'm focusing on d3d11/MediaFoundation for the reasons I mentioned above.

    I hope Intel people take a look at Windows GStreamer things :blush:

    • Resolved by He Junyan

      @seungha.yang thanks for the feedback. MediaFoundation can only use a subset of MediaSDK. This is why we need a MediaSDK plugin.

      I have no plan to integrate d3d11 with other APIs

      We can do this for gstmsdk. As the first step, we can make gst-va work seamlessly with gstmsdk, then we will look at gst-dxva. We still have many gaps; for example, the gstmsdk d3d allocator is not implemented. Let us fix them step by step.

  • As far as I understand, this library will only expose the GstVaDisplay structure and the methods required for creating it and sharing it along the pipeline via GstContext.

    For VASurfaces, there's already a way to get them without needing any special API.

    Also, there's no need to expose bufferpools or allocators, if I understand correctly.
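
    For illustration, a minimal sketch of that sharing with plain GstContext API (the context type string is the one mentioned below in this thread; the "gst-display" field name is an assumption):

    ```c
    #include <gst/gst.h>

    /* Sketch only: publish a refcounted VA display object in a
     * GstContext so peer elements and the app can pick it up. */
    static void
    post_va_display_context (GstElement * element, GstObject * display)
    {
      GstContext *context = gst_context_new ("gst.va.display.handle", TRUE);
      GstStructure *s = gst_context_writable_structure (context);

      /* storing the display as a refcounted object (not a raw handle)
       * keeps it alive while any peer is still using it */
      gst_structure_set (s, "gst-display", GST_TYPE_OBJECT, display, NULL);

      gst_element_set_context (element, context);
      /* let the parent bin/app cache it; the message owns the context */
      gst_element_post_message (element,
          gst_message_new_have_context (GST_OBJECT (element), context));
    }
    ```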

  • We still need the gstmsdk encoder to provide an allocator and buffer pool, right? And in some cases, like "decode ! tee name=t ! encoder1 t. ! encoder2", tee will not ask for the encoder's allocator, so gstmsdk needs to accept upstream-allocated buffers too. Also, I am not sure whether, in the decode + tee + encoder case, the decoder can get the encoder's GstContext or not.

  • Author Developer

    @vjaquez, if we really do not want to create the lib, one way is to encapsulate the "display" handle inside some miniobject and pass this miniobject between modules in a GstContext. This can track the lifetime of the display handle. If we only export the raw display handle in the GstContext, e.g. as "gst.va.display.handle", the va plugins may close that display while others are still using it.

    I prefer to expose the display, the VA memory, and the VA memory pool, plus some utilities such as context querying and setting.

    GST_MAP_VA does work, but we still need to do some checks, such as "GST_VA_ALLOCATOR (mem->allocator)", before we map it.
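
    A minimal sketch of that check, using the type-check form of the macro; reading a VASurfaceID straight out of info.data is an assumption about the mapped layout:

    ```c
    #include <gst/gst.h>
    #include <va/va.h>

    static VASurfaceID
    peek_va_surface (GstBuffer * buffer)
    {
      GstMemory *mem = gst_buffer_peek_memory (buffer, 0);
      GstMapInfo info;
      VASurfaceID surface = VA_INVALID_ID;

      /* only memory from the VA allocator can answer a VA map request */
      if (!GST_IS_VA_ALLOCATOR (mem->allocator))
        return VA_INVALID_ID;

      if (gst_memory_map (mem, &info, GST_MAP_READ | GST_MAP_VA)) {
        surface = *(VASurfaceID *) info.data;
        gst_memory_unmap (mem, &info);
      }

      return surface;
    }
    ```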

  • I believe a lib to handle the context sharing is a good start. In general, I'd focus on keeping the API as minimal as possible. To answer some of @XuGuangxin's questions, for a pipeline like:

    graph LR
      subgraph vaXYZdec
        VD_sink[sink]
        VD_src[src]
        VD_sink -.- VD_src
      end
    
      subgraph tee
        T_sink[sink]
        T_src0[src0]
        T_src1[src1]
        T_sink -.- T_src0
        T_sink -.- T_src1
      end
    
      subgraph msdkXYZenc0
        ME0_sink[sink]
        ME0_src[src]
        ME0_sink -.- ME0_src
      end
    
      subgraph msdkXYZenc1
        ME1_sink[sink]
        ME1_src[src]
        ME1_sink -.- ME1_src
      end
    
      VD_src -- "video/x-raw(memory:VASurface)" ---> T_sink
      T_src0 -- "video/x-raw(memory:VASurface)" ---> ME0_sink
      T_src1 -- "video/x-raw(memory:VASurface)" ---> ME1_sink

    The context sharing, assuming it is done the GstContext way, will work like this: the app is asked first through a sync message (unless a context is cached in the parent bin), and otherwise the neighbours are queried for a context. If a known context is found, it's used or wrapped (depending on the stack, really) so that zero-copy buffer sharing is possible.
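
    In code, that lookup could look roughly like this (core GStreamer API only; a real element also tracks whether set_context() already delivered a context before falling back to the peer query):

    ```c
    #include <gst/gst.h>

    static void
    request_va_context (GstElement * element, GstPad * peer_pad)
    {
      GstQuery *query;

      /* ask the application (and any caching parent bin) first,
       * through a synchronous bus message */
      gst_element_post_message (element,
          gst_message_new_need_context (GST_OBJECT (element),
              "gst.va.display.handle"));

      /* otherwise, query the neighbours for an existing context */
      query = gst_query_new_context ("gst.va.display.handle");
      if (gst_pad_peer_query (peer_pad, query)) {
        GstContext *context = NULL;

        gst_query_parse_context (query, &context);
        gst_element_set_context (element, context);
      }
      gst_query_unref (query);
    }
    ```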

    Now, most of the work with memory:VASurface happens through caps, so by accepting these caps, you also accept that you can deal with a VASurface, with or without a usable shared context. For D3D11 it was noted that this means we need to be aware of the context the D3D11 textures were created with, so that in the worst case you can resort to a full download/upload roundtrip. The same seems needed here, I believe. This is very boilerplate stuff that should go into the lib.

    Now, let's say the context is compatible; you then have to deal with VASurface allocation. As usual, the decoder will use an allocation query to retrieve downstream information. The outcome could be a usable buffer pool (figuring out whether the pool produces usable memory:VASurface can be done through a pool feature implemented in the lib). But in the tee case you will not get any pool, so you will have to resort to the allocation APIs. As of the current implementation, tee will only keep the APIs that exist on all branches (or legs).
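
    Roughly, the decoder side of that query could look like this; "GstBufferPoolOptionVASurface" is a hypothetical option name standing in for the pool feature the lib would implement:

    ```c
    #include <gst/gst.h>

    /* Returns a usable downstream pool, or NULL (e.g. in the tee case),
     * in which case the caller falls back to the allocation APIs. */
    static GstBufferPool *
    find_va_pool (GstPad * srcpad, GstCaps * caps)
    {
      GstQuery *query = gst_query_new_allocation (caps, TRUE);
      GstBufferPool *pool = NULL;

      if (gst_pad_peer_query (srcpad, query) &&
          gst_query_get_n_allocation_pools (query) > 0) {
        guint size, min, max;

        gst_query_parse_nth_allocation_pool (query, 0, &pool,
            &size, &min, &max);
        /* check that the proposed pool really produces memory:VASurface */
        if (pool && !gst_buffer_pool_has_option (pool,
                "GstBufferPoolOptionVASurface"))
          gst_clear_object (&pool);
      }

      gst_query_unref (query);
      return pool;
    }
    ```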

    Now, for this case, just the API type would not be enough. I believe you will need some extra information (a modifiers list would be an option; it could also be some VA-specific hint, remembering this has to be VA specific). The only catch is that tee does not yet know how to merge allocation API parameters. I need to make this happen for memory:DMABuf modifiers anyway, so just bug me about it when you get there and I can make it happen.

    This is far from a complete plan, but it gives an idea of all the boilerplate needed and how a shared library can help.

    Edited by Nicolas Dufresne
  • And using d3d11xxxdec ! msdkxxxenc on Windows, linking them with divx surface caps

    Regarding this one, I'd say DXGI/D3D11 surface/texture sharing will work only within one physical device (I haven't tested other cases like AMD CrossFire or NVIDIA SLI).

    Another note: d3d11*dec doesn't use a downstream buffer pool (it always uses its internal pool for decoding) because of the DXVA API design. A downstream d3d11-compatible buffer pool will be used only for

    • reverse playback
    • or when the internal DPB pool is about to be full

    One more note: I implemented the MediaFoundation (MF) + d3d11 integration layer so that MF copies incoming d3d11 textures into MF's own texture pool, for performance reasons (intra-GPU copy overhead is sometimes smaller than synchronization overhead, especially in the case of NVIDIA): https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/blob/master/sys/mediafoundation/gstmfvideoenc.cpp#L1014-1086

  • @ndufresne thanks for the information. We will ping you when we have an issue.

    @seungha.yang, do you have a plan to use the downstream pool? Zero-copy is important for large resolutions.

  • I have no plan to use the downstream pool because of the special requirements for DPB textures.

    Btw, I think MSDK should be able to wrap GstD3D11Memory into MSDK form and accept it even if it's not allocated by MSDK?

    • Author Developer
      Resolved by He Junyan

      I think that may be more or less like MSDK using VAMemory: MSDK may provide no pool to the upstream element, but it can still recognize and use the GstD3D11Memory/VAMemory allocated by the upstream element. This way no copy is needed.

  • He Junyan added 26 commits

  • He Junyan marked this merge request as ready

  • He Junyan changed title from [RFC] WIP: va: we need to create a va lib. to va: need to move common logic to create a va lib.

  • Author Developer

    I have completed a first version that makes a pipeline such as:

    gst-launch-1.0 -vf filesrc location=1920x1080.h264 ! h264parse ! vah264dec ! video/x-raw\(memory:VAMemory\) ! msdkh264enc ! fakesink

    work well, and the encoded result is correct.

    I find that we may really need to move the va display logic into a lib to make the code clean.

    What I did here is:

    1. Moved the VA display logic into the va lib. We do not install its headers for now because the lib is only used by va and MSDK inside -bad.
    2. Implemented MSDK's context on top of the common VA display, so all the VA-related plugins can share the same VA display via GstContext (see the sketch after this list).
    3. Imported the VA surface as VAMemory so that MSDK can use it directly.
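
    A minimal sketch of the step-2 handover on the MSDK side, assuming the same context type and "gst-display" field as the earlier sketches (the usual GObject boilerplate is omitted):

    ```c
    static void
    gst_msdk_enc_set_context (GstElement * element, GstContext * context)
    {
      if (g_strcmp0 (gst_context_get_context_type (context),
              "gst.va.display.handle") == 0) {
        const GstStructure *s = gst_context_get_structure (context);
        GstObject *display = NULL;

        if (gst_structure_get (s, "gst-display", GST_TYPE_OBJECT,
                &display, NULL)) {
          /* hand the shared VA display over to the MSDK session here */
          gst_object_unref (display);
        }
      }

      GST_ELEMENT_CLASS (parent_class)->set_context (element, context);
    }
    ```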

    If the idea is correct, we may need to split this into two MRs later: one for the lib, and another one for MSDK.

  • You are so efficient. How about this pipeline? gst-launch-1.0 -vf filesrc location=1920x1080.h264 ! h264parse ! vah264dec ! msdkh264enc ! fakesink

  • Author Developer

    The exact same result. @vjaquez made VAMemory the first choice in the caps, so once the downstream element reports VAMemory caps, it will be used.
