va: move common logic out into a va lib.
As the va plugins mature, we want to use them more. Is it possible to extract the common logic (display/memory/surface) from the plugins into a lib, just as gst-plugins-bad/gst-libs/gst/d3d11 does?
The current common way of using vaapi is:
vaapixxxdec ! vaapixxxenc
or
msdkxxxdec ! msdkxxxenc
Linkages between the vaapi and MSDK plugins are rare because DMA buffer sharing support is imperfect (for example, mismatched video formats and tilings).
When it comes to the vaxxxdec plugins, we must connect vaxxxdec with msdkxxxenc, because the new va plugins do not support encoding yet, and it seems the encoder part will not be ready soon. So the common usage will be
vaxxxdec ! msdkxxxenc
in the future.
And on Intel's upcoming GPUs (Gen12+), the "modifier" is a headache: DMA sharing between the va plugins and MSDK needs an explicit "modifier" negotiation. Intel's standalone GPU cards also have multiple GPU groups, so we need to make sure vaxxxdec and msdkxxxenc run on the same device and GPU group, which is easy to get wrong. Sharing the VADisplay and VASurface avoids all of this.
Another important point is that when we want to extend our GStreamer support to Windows, we have not found a good way to connect d3d11xxxdec with the MSDK plugins. DMA buffer sharing has no counterpart on Windows. Fortunately, we find that a d3d11 lib already exists.
So, our ideal target is: use vaxxxdec ! msdkxxxenc on Linux, linking them with VAMemory caps and sharing the VADisplay/VASurface.
And use d3d11xxxdec ! msdkxxxenc on Windows, linking them with divx surface caps (not very accurate, need to ask @seungha.yang).
Then it is symmetric, and so we may want this va lib. The lib could also be used by other modules and customers, such as deep learning projects that are still in development and just want to use our VA surfaces quickly.
First of all, I have no objection to libgstva as long as it's a -bad scope library at the moment (like libgstd3d11), and I believe libgstva would give us a chance to improve the API design as well.
Another important point is that when we want to extend our support of gstreamer on to windows
That sounds like a really good plan! AFAIK MSDK supports d3d very well on Windows, and other open source projects (OBS, for example) already use it. That's a feature I really wanted, and I was looking forward to contributions from Intel people. But
I have no plan to integrate d3d11 with other APIs (MSDK for Intel, and CUDA/NVCODEC for nvidia) because:
- d3d11 + MediaFoundation already works better than gstreamer MSDK (and GSTMFX) as per my tests, in terms of stability and performance. I feel Intel supports d3d11 and MediaFoundation very well
- d3d11 + MediaFoundation covers all well-known vendors, Intel, NVIDIA, AMD, Qualcomm
- CUDA + d3d11 interop overhead is big; sometimes it's slower than d3d11/MediaFoundation. So I have no motivation to do that yet.
- non-d3d11 Graphics APIs are not allowed for UWP
In short, I prefer native Windows API (d3d11/MediaFoundation) over vendor specific APIs.
There are some areas that d3d11/MediaFoundation doesn't cover but the vendor APIs (MSDK/CUDA) do; still, I'm focusing on d3d11/MediaFoundation for the reasons I mentioned above.
I hope Intel people take a look at Windows GStreamer things
- Resolved by He Junyan
@seungha.yang thanks for the feedback. MediaFoundation can only use a subset of MediaSDK. This is why we need a MediaSDK plugin.
I have no plan to integrate d3d11 with other APIs
We can do this for gstmsdk. As the first step, we can make gst-va work seamlessly with gstmsdk, then we will look at gst-dxva. We still have many gaps, such as the gstmsdk d3d allocator not being implemented. Let us fix them step by step.
As far as I understand, this library will only expose the GstVaDisplay structure and the methods required for creating it and sharing it across the pipeline via GstContext.
For VASurfaces, there's already a method to get them, with no need for a special API.
Also, there's no need to expose bufferpools or allocators, if I understand correctly.
We still need the gstmsdk encoder to provide an allocator and buffer pool, right? And in some cases, like "decode ! tee name=t ! encoder1 t. ! encoder2", tee will not ask for the encoder's allocator, so gstmsdk needs to accept upstream-allocated buffers too. Also, I am not sure whether, in the decode + tee + encoder case, the decoder can get the encoder's GstContext.
@vjaquez, if we really do not want to create the lib, one way is to encapsulate the "display" handle inside some miniobject and pass that miniobject between modules in a GstContext. That way we can track the lifetime of the display handle. If we only export the raw display handle in the GstContext, e.g. as "gst.va.display.handle", the va plugins may close the display while others are still using it.
I prefer to expose the display, the VA memory, and the VA memory pool, plus some utilities such as context query and setting.
GST_MAP_VA can really work, but we still need some checks, such as GST_VA_ALLOCATOR (mem->allocator), before we map it.
I believe a lib for context sharing is a good start. In general, I'd focus on keeping the API as minimal as possible. To answer some of @XuGuangxin's questions, for a pipeline like:
```mermaid
graph LR
  subgraph vaXYZdec
    VD_sink[sink]
    VD_src[src]
    VD_sink -.- VD_src
  end
  subgraph tee
    T_sink[sink]
    T_src0[src0]
    T_src1[src1]
    T_sink -.- T_src0
    T_sink -.- T_src1
  end
  subgraph msdkXYZenc0
    ME0_sink[sink]
    ME0_src[src]
    ME0_sink -.- ME0_src
  end
  subgraph msdkXYZenc1
    ME1_sink[sink]
    ME1_src[src]
    ME1_sink -.- ME1_src
  end
  VD_src -- "video/x-raw(memory:VASurface)" ---> T_sink
  T_src0 -- "video/x-raw(memory:VASurface)" ---> ME0_sink
  T_src1 -- "video/x-raw(memory:VASurface)" ---> ME1_sink
```
Context sharing, assuming it will be done the GstContext way, works like this: the app is asked first through a sync message (unless a context is cached in the parent bin); otherwise the neighbours are queried for a context. If a known context is found, it is used or wrapped (depending on the stack, really) so that zero-copy buffer sharing becomes possible.
Now, most of the work with memory:VASurface happens through caps, so by accepting these caps you also accept that you can deal with VASurface, with or without a usable shared context. In D3D11 it was noted that this means we need to be aware of the context the D3D11 textures were created with, so in the worst case you can resort to a full download/upload roundtrip. I believe the same is needed here. This is very boilerplate stuff that should go into the lib.
Now, let's say the context is compatible; you have to deal with VASurface allocation. As usual, the decoder will use an allocation query to retrieve downstream information. The outcome could be a usable buffer pool (figuring out whether a pool can provide usable memory:VASurface memory can be done through a pool feature implemented in the lib). But in the tee case, you will not get any pool; you will have to resort to the allocation APIs. In the current implementation, tee only keeps the APIs that exist on all branches (or legs).
Now, for this case, just the API type would not be enough. I believe you will need some extra information (a modifiers list would be an option, or some VA-specific hint; remember this has to be VA specific). The only catch is that tee does not yet know how to merge allocation API parameters. I need to make this happen for memory:DMABuf modifiers, so just bug me about it when you get there and I can make it happen.
This is far from a complete plan, but it gives an idea of all the boilerplate needed and how a shared library can help.
And using d3d11xxxdec ! msdkxxxenc on Windows, linking them with divx surface caps
Regarding this one, I'd say DXGI/D3D11 surface/texture sharing will work only for one physical device (I haven't tested other cases like AMD CrossFire or NVIDIA SLI).
Another note: d3d11*dec doesn't use the downstream buffer pool (it always uses its internal pool for decoding) because of the DXVA API design. A downstream d3d11-compatible buffer pool will be used only for
- reverse playback
- or internal DPB pool is about to full
One more note: I implemented a MediaFoundation (MF) + d3d11 integration layer so that MF can copy incoming d3d11 textures into MF's own texture pool, for performance reasons (intra-GPU copy overhead is sometimes smaller than synchronization overhead, especially in the case of NVIDIA) https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/blob/master/sys/mediafoundation/gstmfvideoenc.cpp#L1014-1086
@ndufresne thanks for the information. We will ping you when we have an issue.
@seungha.yang, do you have a plan to use the downstream pool? Zero copy is important for large resolutions.
- Resolved by He Junyan
I think that may be more or less like MSDK using VAMemory: MSDK can provide no pool to the upstream element, but it can still recognize and use the GstD3D11Memory/VAMemory allocated by the upstream element. This way no copy is needed.
added 26 commits
- c51b2bcf...4900e358 - 25 commits from branch gstreamer:master
- 8e9275e9 - first version can work
I have completed a first version that makes a pipeline such as:
gst-launch-1.0 -vf filesrc location=1920x1080.h264 ! h264parse ! vah264dec ! video/x-raw\(memory:VAMemory\) ! msdkh264enc ! fakesink
work well, and the encoded result is correct.
I find that we may really need to move the VA display logic into a lib to keep the code clean.
What I do here is:
- Move the VA display logic into the va lib. We do not install its headers for now because the lib is only used by va and MSDK inside -bad.
- Implement MSDK's context on top of the common VA display. Then all the VA-related plugins can share the same VA display via GstContext.
- Import VA surfaces as VAMemory so that MSDK can use them directly.
If the idea is correct, we may need to split this into two MRs later: one for this lib and another for MSDK.
The exact same result. @vjaquez made VAMemory the first choice in the caps, so once the downstream element reports VAMemory caps, it will be used.
- Resolved by He Junyan
Just a first impression from glancing at the patch: the git mvs make the history very confusing; it's hard to follow what was done.