Skip to content
Snippets Groups Projects

nvcodec: cuda buffers support

Merged Julian Bouzas requested to merge julian/gst-plugins-bad:nvcodec-cuda-buffers into master

Essentially, this MR rebases the old nvcodec cuda buffers MR (!526 (closed)) from @seungha.yang on top of master, and additionally fixes the issues I found after testing it for a while.

The performance has also been measured and it is about the same as ffmpeg's. It can be tested with the following transcoding pipeline:

gst-launch-1.0 filesrc location=test.mp4 ! qtdemux ! h264parse ! nvh264dec ! cudascale ! "video/x-raw(memory:CUDAMemory),width=1280,height=720" ! nvh264enc ! h264parse ! mp4mux ! filesink location=out.mp4

Which should take about the same time to finish as the following ffmpeg command:

ffmpeg -c:v h264_cuvid -resize 1280x720 -i test.mp4 -c:v h264_nvenc -y out.mp4
Edited by Seungha Yang

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Julian Bouzas added 14 commits

    added 14 commits

    • a5261a6f - cudacontext: Enable direct CUDA memory access over multiple GPUs
    • 3fdd10c8 - nvcodec: Peer direct access support
    • e075949a - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
    • 72564dba - nvcodec: Add support runtime CUDA kernel source compilation
    • bda6c613 - nvcodec: Add generic CUDA video convert object
    • 61274389 - nvcodec: Add CUDA video convert element
    • 632dc5f0 - nvcodec: Add CUDA video scale element
    • 4fdc4982 - tests: Add CUDA filter unit tests
    • 7053561e - nvcodec: Fix description of cudadownload element
    • 506a0f66 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
    • 139a45ad - nvcodec: Add missing CUDAMemory src caps in h264 decoder
    • bd276c9b - nvcodec: Fix compiler error if OpenGL is not enabled
    • 7f2f2945 - nvcodec: Add max-display-delay decoder property
    • 0aa8246d - nvcodec: Report latency in decoder based on max-display-delay

    Compare with previous version

  • Julian Bouzas added 14 commits

    added 14 commits

    • 17f45d39 - cudacontext: Enable direct CUDA memory access over multiple GPUs
    • b3d39280 - nvcodec: Peer direct access support
    • 4121d2b2 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
    • a5580cb7 - nvcodec: Add support runtime CUDA kernel source compilation
    • 3947fe6b - nvcodec: Add generic CUDA video convert object
    • 61461842 - nvcodec: Add CUDA video convert element
    • 0774f30f - nvcodec: Add CUDA video scale element
    • f4b3bd6b - tests: Add CUDA filter unit tests
    • 239ce563 - nvcodec: Fix description of cudadownload element
    • 5e35145e - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
    • 15683c17 - nvcodec: Add missing CUDAMemory src caps in h264 decoder
    • 34551b4d - nvcodec: Fix compiler error if OpenGL is not enabled
    • 3dfb4360 - nvcodec: Add max-display-delay decoder property
    • 9692cb63 - nvcodec: Report latency in decoder based on max-display-delay

    Compare with previous version

  • Seungha Yang changed title from nvcodec cuda buffers support to nvcodec: cuda buffers support

    changed title from nvcodec cuda buffers support to nvcodec: cuda buffers support

  • @julian Thanks for resurrecting my previous work. Apart from max-display-delay change, it looks good to me. I guess there would be trade-off between latency and performance depending on max-display-delay? Have you ever measured the difference?

  • @seungha.yang You are welcome.

    Yes, I have measured the performance of the above transcoding pipeline on a Nvidia Tesla M60 with a 8 minutes 1080p video and the improvement is notable (about 20% faster). If we set max-display-delay to 0, the transcoding pipeline takes about 35 seconds to finish. However, if we set max-display-delay to 4, the transcoding finishes in about 27 seconds only. Also, that value is hardcoded to 4 in ffmpeg: https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/cuviddec.c;h=2d6377bc8c6d9db54e78c7c032d12448a8a33ea0;hb=HEAD#l977

    Edited by Julian Bouzas
  • yes, adding some delay/queue/async-depth/thread into an encoding process would improve such performance, it's very expected result to me of course :smile: But in most case there's latency vs. performance trade-off. (For filesrc scenario, you might be able to notice such latency difference though)

    fwiw, I noticed queueing-size-like change would have a negative effect on performance for nvdec when I was working on !1211 (merged) so it makes sense that this change will improve performance.

    So, what I'm wondering is that whether increased max-display-delay will introduce latency or not.

    For example, if max-display-delay = 0 is outputting decoded picture immediately for IDR/key frame but max-display-delay = 4 is not the case, it would be very sensitive regression for some applications.

  • Perhaps another case where we could configure differently depending on live vs non-live situation. The property seems useful still, since some live scenarios can accept some latency, and may require the extra performance gain.

  • Julian Bouzas added 34 commits

    added 34 commits

    • 9692cb63...dcc4557d - 17 commits from branch gstreamer:master
    • 28a36613 - nvcodec: Add CUDA specific memory and bufferpool
    • 97c56e1f - nvdec: Support CUDA buffer pool
    • 85980a42 - nvenc: Support CUDA buffer pool
    • 2d0c4bc1 - cudacontext: Enable direct CUDA memory access over multiple GPUs
    • 6c7f4d9e - nvcodec: Peer direct access support
    • 3dff8c54 - nvcodec: Add CUDA upload/download elements with base class for CUDA filters
    • 6fac22f0 - nvcodec: Add support runtime CUDA kernel source compilation
    • 37b8ceed - nvcodec: Add generic CUDA video convert object
    • 4dcee706 - nvcodec: Add CUDA video convert element
    • fc65bb06 - nvcodec: Add CUDA video scale element
    • 3facb72f - tests: Add CUDA filter unit tests
    • db9d6078 - nvcodec: Fix description of cudadownload element
    • 09da7382 - nvcodec: Add missing CUDAMemory sink caps in h264 and h265 encoders
    • dae55efa - nvcodec: Add missing CUDAMemory src caps in h264 decoder
    • 25aa94e6 - nvcodec: Fix compiler error if OpenGL is not enabled
    • 147f125c - nvcodec: Add max-display-delay decoder property
    • 806ffba8 - nvcodec: Report latency in decoder based on max-display-delay

    Compare with previous version

  • Julian Bouzas resolved all threads

    resolved all threads

  • Julian Bouzas added 2 commits

    added 2 commits

    • e12a04e5 - nvcodec: Add max-display-delay decoder property
    • d2d76f6d - nvcodec: Report latency in decoder based on max-display-delay

    Compare with previous version

  • Julian Bouzas resolved all threads

    resolved all threads

  • I think this is good to go... so the people can test this?

  • I agree, and we're early in the 1.20 cycle in case anything goes wrong.

  • Olivier Crête mentioned in merge request !526 (closed)

    mentioned in merge request !526 (closed)

  • Seungha Yang mentioned in merge request !539 (closed)

    mentioned in merge request !539 (closed)

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading