nvcodec: Add CUDA specific memory and bufferpool
nvcodec: Peer direct access support
If the devices support direct peer access to each other, use device-to-device
memory copy without staging through host memory
cudacontext: Enable direct CUDA memory access over multiple GPUs
If the device contexts can access each other's memory, enable peer access
for better interoperability.
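The peer-access path described above can be sketched with the CUDA driver API. This is an illustrative sketch only; the helper name `enable_peer_access` and the fallback policy are assumptions, not the actual code in this MR:

```c
#include <cuda.h>

/* Illustrative sketch (not the actual patch): check whether dev can access
 * peer_dev directly and, if so, enable peer access so that device-to-device
 * copies can skip the host staging buffer. Returns non-zero on success. */
static int
enable_peer_access (CUdevice dev, CUdevice peer_dev, CUcontext peer_ctx)
{
  int can_access = 0;

  if (cuDeviceCanAccessPeer (&can_access, dev, peer_dev) != CUDA_SUCCESS
      || !can_access)
    return 0;   /* caller falls back to copying through host memory */

  /* Flags must be 0 per the driver API */
  return cuCtxEnablePeerAccess (peer_ctx, 0) == CUDA_SUCCESS;
}
```

When the check fails, the copy path degrades gracefully to the staging route rather than erroring out.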
nvenc: Support CUDA buffer pool
When upstream supports CUDA memory (only nvdec for now), we will create
a CUDA buffer pool.
nvdec: Support CUDA buffer pool
If downstream can accept the CUDA memory caps feature (currently nvenc only),
CUDA memory is always preferred.
nvcodec: Add CUDA specific memory and bufferpool
Introducing CUDA buffer pool with generic CUDA memory support.
Like GL memory, any element that is able to access CUDA device
memory directly can map this CUDA memory without upload/download
overhead via the "GST_MAP_CUDA" map flag.
Usual GstMemory access is also possible through internal staging memory.
For staging, CUDA host-allocated memory is used (see the cuMemAllocHost API).
This memory allows system access but has lower overhead
during GPU upload/download than normal system memory.
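The staging path described above could look roughly like this; `map_for_cpu_read` is a hypothetical helper for illustration, not the code in the MR:

```c
#include <cuda.h>

/* Hypothetical sketch: back a CPU-visible (read-style) map with page-locked
 * host memory. cuMemAllocHost returns pinned memory, which keeps DtoH/HtoD
 * transfers cheaper than going through pageable system memory. */
static CUresult
map_for_cpu_read (CUdeviceptr dev_ptr, void **staging, size_t size)
{
  CUresult ret = cuMemAllocHost (staging, size);

  if (ret != CUDA_SUCCESS)
    return ret;

  /* Download the device memory into the staging buffer for CPU reads */
  return cuMemcpyDtoH (*staging, dev_ptr, size);
}
```

A write map would do the reverse: the CPU fills the staging buffer and the unmap uploads it with cuMemcpyHtoD.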
We are not (I'm not?) far off from achieving the target perf.
UPDATE: this benchmark is not valid since !614 (merged).
1. nvdec -> download memory to system -> upload memory to CUDA -> nvenc
Only half of the encoder resources are used, so there must be memory upload/download overhead.
2. nvdec -> GL memory -> upload memory to CUDA -> nvenc
Faster than system memory.
3. nvdec -> CUDA memory -> nvenc
Obviously faster than GL/system memory. The encoder resources are fully used!
Edited by Seungha Yang
!494 (merged) is the last dependent MR
mentioned in merge request !494 (merged)
added 58 commits
- 7b1beba3...eab564d8 - 44 commits from branch gstreamer:master
- 1b8f61b3 - nvenc: Add property for AUD insertion
- 8c3275c8 - nvenc: Add support for weighted prediction option
- d465f03c - nvenc: Add more rate-control options
- 263b6eda - nvenc: Remove pointless iteration and cleanup some code
- cf6309ec - nvenc: Refactoring internal buffer pool structure
- b884283b - nvenc: Add properties to support bframe encoding if device supports it
- 78d8ff4e - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 4a526843 - nvenc: Adjust DTS when bframe is enabled
- 76ee5f5b - nvcodec: Add CUDA specific memory and bufferpool
- 59fe4d18 - nvdec: Always response QUERY_CONTEXT even if openGL is unavailable on the system
- f452389a - nvdec: Support CUDA buffer pool
- f27df005 - nvenc: Support CUDA buffer pool
- 39f23d13 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 6d497195 - nvcodec: Peer direct access support
added 92 commits
- 6d497195...fa83f086 - 79 commits from branch gstreamer:master
- 83bbc262 - nvenc: Add property for AUD insertion
- 137eefb2 - nvenc: Add support for weighted prediction option
- b8a40c16 - nvenc: Add more rate-control options
- 1d1c8d85 - nvenc: Remove pointless iteration and cleanup some code
- ebb1bdc4 - nvenc: Refactoring internal buffer pool structure
- 147b047d - nvenc: Add properties to support bframe encoding if device supports it
- 8df0d94a - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 58f3af3c - nvenc: Adjust DTS when bframe is enabled
- 31e02f2c - nvcodec: Add CUDA specific memory and bufferpool
- 11a971c1 - nvdec: Support CUDA buffer pool
- 250ca0cc - nvenc: Support CUDA buffer pool
- 257eb414 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- a2ff8a55 - nvcodec: Peer direct access support
added 5 commits
added 21 commits
- 0cd791d3...82e23a27 - 7 commits from branch gstreamer:master
- 44057b12 - nvenc: Refactor class hierarchy to handle device capability dependent options
- 71ed76c7 - nvenc: Add property for AUD insertion
- 022603c3 - nvenc: Add support for weighted prediction option
- 69091a35 - nvenc: Add more rate-control options
- c23fe4ca - nvenc: Remove pointless iteration and cleanup some code
- d723581b - nvenc: Refactoring internal buffer pool structure
- d049a263 - nvenc: Add properties to support bframe encoding if device supports it
- 7d1df83d - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 9ef60bd6 - nvenc: Adjust DTS when bframe is enabled
- 83f77df3 - nvcodec: Add CUDA specific memory and bufferpool
- 52778163 - nvdec: Support CUDA buffer pool
- 333da793 - nvenc: Support CUDA buffer pool
- fec78bf7 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- c210d24d - nvcodec: Peer direct access support
added 25 commits
- c210d24d...1cbb23cf - 20 commits from branch gstreamer:master
- 112772b7 - nvcodec: Add CUDA specific memory and bufferpool
- a830643b - nvdec: Support CUDA buffer pool
- 54204139 - nvenc: Support CUDA buffer pool
- e38fdcc2 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 12f9600a - nvcodec: Peer direct access support
This CUDA buffer pool can save GPU memory and GPU processing power in the
nvdec -> nvenc transcoding case, and it is a requirement for CUDA filters !526 (closed)
added 36 commits
- 12f9600a...82e86573 - 31 commits from branch gstreamer:master
- f1d035f0 - nvcodec: Add CUDA specific memory and bufferpool
- 2b0f80d7 - nvdec: Support CUDA buffer pool
- c9c7bee1 - nvenc: Support CUDA buffer pool
- 4d153282 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 2e64ff47 - nvcodec: Peer direct access support
added 48 commits
- 2e64ff47...76654539 - 43 commits from branch gstreamer:master
- 07b67789 - nvcodec: Add CUDA specific memory and bufferpool
- 8afc19f6 - nvdec: Support CUDA buffer pool
- cb93b1ff - nvenc: Support CUDA buffer pool
- f98cbdf6 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- a47fc16e - nvcodec: Peer direct access support
added 10 commits
- a47fc16e...8684dffe - 5 commits from branch gstreamer:master
- 20d4669a - nvcodec: Add CUDA specific memory and bufferpool
- e1a1866f - nvdec: Support CUDA buffer pool
- edc24582 - nvenc: Support CUDA buffer pool
- 463530ee - cudacontext: Enable direct CUDA memory access over multiple GPUs
- ab21007b - nvcodec: Peer direct access support
added 20 commits
- ab21007b...b7ee6dc4 - 15 commits from branch gstreamer:master
- 5a0a8300 - nvcodec: Add CUDA specific memory and bufferpool
- 9961fd09 - nvdec: Support CUDA buffer pool
- 2cc2e82e - nvenc: Support CUDA buffer pool
- 98785871 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- c1760893 - nvcodec: Peer direct access support
One comment: how do you envision exposing different CUDA memory types?
From the docs (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM under
cuMemHostAlloc
), I can see there are multiple options for the memory location and type supporting different operations.
Currently I have no plan to expose CUDA host memory and CUDA texture memory at all; I'd like to expose only CUDA device memory via
GstCUDAMemory
. Actually, CUDA host memory is used for staging CUDA memory (e.g., for read/write maps, CUDA device memory is copied from/to the staging CUDA host memory), so users do not need to know about the CUDA host memory.
- Resolved by Seungha Yang
added 22 commits
- c1760893...ef16d755 - 17 commits from branch gstreamer:master
- 10c86454 - nvcodec: Add CUDA specific memory and bufferpool
- 9ba2860f - nvdec: Support CUDA buffer pool
- c1524292 - nvenc: Support CUDA buffer pool
- e3aac74b - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 2069f1ab - nvcodec: Peer direct access support