nvcodec: cuda buffers support

added 14 commits

a5261a6f - cudacontext: Enable direct CUDA memory access over multiple GPUs
3fdd10c8 - nvcodec: Peer direct access support
e075949a - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
72564dba - nvcodec: Add support runtime CUDA kernel source compilation
bda6c613 - nvcodec: Add generic CUDA video convert object
61274389 - nvcodec: Add CUDA video convert element
632dc5f0 - nvcodec: Add CUDA video scale element
4fdc4982 - tests: Add CUDA filter unit tests
7053561e - nvcodec: Fix description of cudadownload element
506a0f66 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
139a45ad - nvcodec: Add missing CUDAMemory src caps in h264 decoder
bd276c9b - nvcodec: Fix compiler error if OpenGL is not enabled
7f2f2945 - nvcodec: Add max-display-delay decoder property
0aa8246d - nvcodec: Report latency in decoder based on max-display-delay

added 14 commits

17f45d39 - cudacontext: Enable direct CUDA memory access over multiple GPUs
b3d39280 - nvcodec: Peer direct access support
4121d2b2 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
a5580cb7 - nvcodec: Add support runtime CUDA kernel source compilation
3947fe6b - nvcodec: Add generic CUDA video convert object
61461842 - nvcodec: Add CUDA video convert element
0774f30f - nvcodec: Add CUDA video scale element
f4b3bd6b - tests: Add CUDA filter unit tests
239ce563 - nvcodec: Fix description of cudadownload element
5e35145e - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
15683c17 - nvcodec: Add missing CUDAMemory src caps in h264 decoder
34551b4d - nvcodec: Fix compiler error if OpenGL is not enabled
3dfb4360 - nvcodec: Add max-display-delay decoder property
9692cb63 - nvcodec: Report latency in decoder based on max-display-delay

changed title from nvcodec cuda buffers support to nvcodec: cuda buffers support

@julian Thanks for resurrecting my previous work. Apart from max-display-delay change, it looks good to me. I guess there would be trade-off between latency and performance depending on max-display-delay? Have you ever measured the difference?

@seungha.yang You are welcome.

Yes, I have measured the performance of the above transcoding pipeline on a Nvidia Tesla M60 with a 8 minutes 1080p video and the improvement is notable (about 20% faster). If we set max-display-delay to 0, the transcoding pipeline takes about 35 seconds to finish. However, if we set max-display-delay to 4, the transcoding finishes in about 27 seconds only. Also, that value is hardcoded to 4 in ffmpeg: https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/cuviddec.c;h=2d6377bc8c6d9db54e78c7c032d12448a8a33ea0;hb=HEAD#l977

yes, adding some delay/queue/async-depth/thread into an encoding process would improve such performance, it's very expected result to me of course But in most case there's latency vs. performance trade-off. (For filesrc scenario, you might be able to notice such latency difference though)

fwiw, I noticed queueing-size-like change would have a negative effect on performance for nvdec when I was working on !1211 (merged) so it makes sense that this change will improve performance.

So, what I'm wondering is that whether increased max-display-delay will introduce latency or not.

For example, if max-display-delay = 0 is outputting decoded picture immediately for IDR/key frame but max-display-delay = 4 is not the case, it would be very sensitive regression for some applications.

ok, I found a my previous experiment. It would introduce additional latency. See !834 (comment 290841)

Perhaps another case where we could configure differently depending on live vs non-live situation. The property seems useful still, since some live scenarios can accept some latency, and may require the extra performance gain.

added 34 commits

9692cb63...dcc4557d - 17 commits from branch gstreamer:master
28a36613 - nvcodec: Add CUDA specific memory and bufferpool
97c56e1f - nvdec: Support CUDA buffer pool
85980a42 - nvenc: Support CUDA buffer pool
2d0c4bc1 - cudacontext: Enable direct CUDA memory access over multiple GPUs
6c7f4d9e - nvcodec: Peer direct access support
3dff8c54 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
6fac22f0 - nvcodec: Add support runtime CUDA kernel source compilation
37b8ceed - nvcodec: Add generic CUDA video convert object
4dcee706 - nvcodec: Add CUDA video convert element
fc65bb06 - nvcodec: Add CUDA video scale element
3facb72f - tests: Add CUDA filter unit tests
db9d6078 - nvcodec: Fix description of cudadownload element
09da7382 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
dae55efa - nvcodec: Add missing CUDAMemory src caps in h264 decoder
25aa94e6 - nvcodec: Fix compiler error if OpenGL is not enabled
147f125c - nvcodec: Add max-display-delay decoder property
806ffba8 - nvcodec: Report latency in decoder based on max-display-delay