nvcodec: cuda buffers support
Essentially, this MR rebases the old nvcodec cuda buffers MR (!526 (closed)) from @seungha.yang on top of master, and additionally fixes the issues I found after testing it for a while.
The performance has also been measured and it is about the same as ffmpeg's. It can be tested with the following transcoding pipeline:
gst-launch-1.0 filesrc location=test.mp4 ! qtdemux ! h264parse ! nvh264dec ! cudascale ! "video/x-raw(memory:CUDAMemory),width=1280,height=720" ! nvh264enc ! h264parse ! mp4mux ! filesink location=out.mp4
Which should take about the same time to finish as the following ffmpeg command:
ffmpeg -c:v h264_cuvid -resize 1280x720 -i test.mp4 -c:v h264_nvenc -y out.mp4
Merge request reports
Activity
added 14 commits
- a5261a6f - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 3fdd10c8 - nvcodec: Peer direct access support
- e075949a - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
- 72564dba - nvcodec: Add support runtime CUDA kernel source compilation
- bda6c613 - nvcodec: Add generic CUDA video convert object
- 61274389 - nvcodec: Add CUDA video convert element
- 632dc5f0 - nvcodec: Add CUDA video scale element
- 4fdc4982 - tests: Add CUDA filter unit tests
- 7053561e - nvcodec: Fix description of cudadownload element
- 506a0f66 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
- 139a45ad - nvcodec: Add missing CUDAMemory src caps in h264 decoder
- bd276c9b - nvcodec: Fix compiler error if OpenGL is not enabled
- 7f2f2945 - nvcodec: Add max-display-delay decoder property
- 0aa8246d - nvcodec: Report latency in decoder based on max-display-delay
Toggle commit listadded 14 commits
- 17f45d39 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- b3d39280 - nvcodec: Peer direct access support
- 4121d2b2 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
- a5580cb7 - nvcodec: Add support runtime CUDA kernel source compilation
- 3947fe6b - nvcodec: Add generic CUDA video convert object
- 61461842 - nvcodec: Add CUDA video convert element
- 0774f30f - nvcodec: Add CUDA video scale element
- f4b3bd6b - tests: Add CUDA filter unit tests
- 239ce563 - nvcodec: Fix description of cudadownload element
- 5e35145e - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
- 15683c17 - nvcodec: Add missing CUDAMemory src caps in h264 decoder
- 34551b4d - nvcodec: Fix compiler error if OpenGL is not enabled
- 3dfb4360 - nvcodec: Add max-display-delay decoder property
- 9692cb63 - nvcodec: Report latency in decoder based on max-display-delay
Toggle commit list@julian Thanks for resurrecting my previous work. Apart from
max-display-delay
change, it looks good to me. I guess there would be trade-off between latency and performance depending onmax-display-delay
? Have you ever measured the difference?@seungha.yang You are welcome.
Yes, I have measured the performance of the above transcoding pipeline on a Nvidia Tesla M60 with a 8 minutes 1080p video and the improvement is notable (about 20% faster). If we set
max-display-delay
to0
, the transcoding pipeline takes about35 seconds
to finish. However, if we setmax-display-delay
to4
, the transcoding finishes in about27 seconds
only. Also, that value is hardcoded to 4 in ffmpeg: https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/cuviddec.c;h=2d6377bc8c6d9db54e78c7c032d12448a8a33ea0;hb=HEAD#l977Edited by Julian Bouzasyes, adding some delay/queue/async-depth/thread into an encoding process would improve such performance, it's very expected result to me of course
But in most case there's latency vs. performance trade-off. (For filesrc scenario, you might be able to notice such latency difference though)fwiw, I noticed queueing-size-like change would have a negative effect on performance for nvdec when I was working on !1211 (merged) so it makes sense that this change will improve performance.
So, what I'm wondering is that whether increased
max-display-delay
will introduce latency or not.For example, if
max-display-delay = 0
is outputting decoded picture immediately for IDR/key frame butmax-display-delay = 4
is not the case, it would be very sensitive regression for some applications.- Resolved by Julian Bouzas
ok, I found a my previous experiment. It would introduce additional latency. See !834 (comment 290841)
added 34 commits
-
9692cb63...dcc4557d - 17 commits from branch
gstreamer:master
- 28a36613 - nvcodec: Add CUDA specific memory and bufferpool
- 97c56e1f - nvdec: Support CUDA buffer pool
- 85980a42 - nvenc: Support CUDA buffer pool
- 2d0c4bc1 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 6c7f4d9e - nvcodec: Peer direct access support
- 3dff8c54 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
- 6fac22f0 - nvcodec: Add support runtime CUDA kernel source compilation
- 37b8ceed - nvcodec: Add generic CUDA video convert object
- 4dcee706 - nvcodec: Add CUDA video convert element
- fc65bb06 - nvcodec: Add CUDA video scale element
- 3facb72f - tests: Add CUDA filter unit tests
- db9d6078 - nvcodec: Fix description of cudadownload element
- 09da7382 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
- dae55efa - nvcodec: Add missing CUDAMemory src caps in h264 decoder
- 25aa94e6 - nvcodec: Fix compiler error if OpenGL is not enabled
- 147f125c - nvcodec: Add max-display-delay decoder property
- 806ffba8 - nvcodec: Report latency in decoder based on max-display-delay
Toggle commit list-
9692cb63...dcc4557d - 17 commits from branch
- Resolved by Julian Bouzas
assigned to @gstreamer-merge-bot
mentioned in merge request !526 (closed)
mentioned in merge request !539 (closed)
mentioned in commit julian/gst-plugins-bad@a8e9d616
mentioned in commit julian/gst-plugins-bad@b1ac8baf