nvcodec: Add CUDA specific memory and bufferpool
nvcodec: Peer direct access support
If the devices support direct peer access to each other, use device-to-device
memory copy without staging through host memory
cudacontext: Enable direct CUDA memory access over multiple GPUs
If the device contexts can access each other's memory, enable peer access
for better interoperability.
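The peer-access path described above can be sketched with the CUDA driver API. This is an illustrative sketch only; the helper name `enable_peer_access` and the fallback policy are assumptions, not the actual code in this MR:

```c
#include <cuda.h>

/* Illustrative sketch (not the actual patch): check whether dev can access
 * peer_dev directly and, if so, enable peer access so that device-to-device
 * copies can skip the host staging buffer. Returns non-zero on success. */
static int
enable_peer_access (CUdevice dev, CUdevice peer_dev, CUcontext peer_ctx)
{
  int can_access = 0;

  if (cuDeviceCanAccessPeer (&can_access, dev, peer_dev) != CUDA_SUCCESS
      || !can_access)
    return 0;   /* caller falls back to copying through host memory */

  /* Flags must be 0 per the driver API */
  return cuCtxEnablePeerAccess (peer_ctx, 0) == CUDA_SUCCESS;
}
```

When the check fails, the copy path degrades gracefully to the staging route rather than erroring out.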
nvenc: Support CUDA buffer pool
When upstream supports CUDA memory (only nvdec for now), we will create
a CUDA buffer pool.
nvdec: Support CUDA buffer pool
If downstream can accept the CUDA memory caps feature (currently nvenc only),
CUDA memory is always preferred.
nvcodec: Add CUDA specific memory and bufferpool
Introducing CUDA buffer pool with generic CUDA memory support.
Like GL memory, any element that is able to access CUDA device
memory directly can map this CUDA memory without upload/download
overhead via the "GST_MAP_CUDA" map flag.
Usual GstMemory access is also possible through internal staging memory.
For staging, CUDA host-allocated memory is used (see the cuMemAllocHost API).
This memory allows system access but has lower overhead
during GPU upload/download than normal system memory.
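The staging path described above could look roughly like this; `map_for_cpu_read` is a hypothetical helper for illustration, not the code in the MR:

```c
#include <cuda.h>

/* Hypothetical sketch: back a CPU-visible (read-style) map with page-locked
 * host memory. cuMemAllocHost returns pinned memory, which keeps DtoH/HtoD
 * transfers cheaper than going through pageable system memory. */
static CUresult
map_for_cpu_read (CUdeviceptr dev_ptr, void **staging, size_t size)
{
  CUresult ret = cuMemAllocHost (staging, size);

  if (ret != CUDA_SUCCESS)
    return ret;

  /* Download the device memory into the staging buffer for CPU reads */
  return cuMemcpyDtoH (*staging, dev_ptr, size);
}
```

A write map would do the reverse: the CPU fills the staging buffer and the unmap uploads it with cuMemcpyHtoD.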
We are not (I'm not?) far off from achieving the target perf.
UPDATE: this benchmark is not valid since !614 (merged).
1. nvdec -> download memory to system -> upload memory to CUDA -> nvenc
Only half of the encoder resources are used, so there must be memory upload/download overhead.
2. nvdec -> GL memory -> upload memory to CUDA -> nvenc
Faster than system memory.
3. nvdec -> CUDA memory -> nvenc
Obviously faster than GL/system memory. The encoder resources are fully used!
Edited by Seungha Yang
!494 (merged) is the last dependent MR
mentioned in merge request !494 (merged)
added 58 commits
- 7b1beba3...eab564d8 - 44 commits from branch gstreamer:master
- 1b8f61b3 - nvenc: Add property for AUD insertion
- 8c3275c8 - nvenc: Add support for weighted prediction option
- d465f03c - nvenc: Add more rate-control options
- 263b6eda - nvenc: Remove pointless iteration and cleanup some code
- cf6309ec - nvenc: Refactoring internal buffer pool structure
- b884283b - nvenc: Add properties to support bframe encoding if device supports it
- 78d8ff4e - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 4a526843 - nvenc: Adjust DTS when bframe is enabled
- 76ee5f5b - nvcodec: Add CUDA specific memory and bufferpool
- 59fe4d18 - nvdec: Always response QUERY_CONTEXT even if openGL is unavailable on the system
- f452389a - nvdec: Support CUDA buffer pool
- f27df005 - nvenc: Support CUDA buffer pool
- 39f23d13 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 6d497195 - nvcodec: Peer direct access support
added 92 commits
- 6d497195...fa83f086 - 79 commits from branch gstreamer:master
- 83bbc262 - nvenc: Add property for AUD insertion
- 137eefb2 - nvenc: Add support for weighted prediction option
- b8a40c16 - nvenc: Add more rate-control options
- 1d1c8d85 - nvenc: Remove pointless iteration and cleanup some code
- ebb1bdc4 - nvenc: Refactoring internal buffer pool structure
- 147b047d - nvenc: Add properties to support bframe encoding if device supports it
- 8df0d94a - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 58f3af3c - nvenc: Adjust DTS when bframe is enabled
- 31e02f2c - nvcodec: Add CUDA specific memory and bufferpool
- 11a971c1 - nvdec: Support CUDA buffer pool
- 250ca0cc - nvenc: Support CUDA buffer pool
- 257eb414 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- a2ff8a55 - nvcodec: Peer direct access support
added 5 commits
added 21 commits
- 0cd791d3...82e23a27 - 7 commits from branch gstreamer:master
- 44057b12 - nvenc: Refactor class hierarchy to handle device capability dependent options
- 71ed76c7 - nvenc: Add property for AUD insertion
- 022603c3 - nvenc: Add support for weighted prediction option
- 69091a35 - nvenc: Add more rate-control options
- c23fe4ca - nvenc: Remove pointless iteration and cleanup some code
- d723581b - nvenc: Refactoring internal buffer pool structure
- d049a263 - nvenc: Add properties to support bframe encoding if device supports it
- 7d1df83d - nvenc: Add qp-{min,max,const}-{i,p,b} properties
- 9ef60bd6 - nvenc: Adjust DTS when bframe is enabled
- 83f77df3 - nvcodec: Add CUDA specific memory and bufferpool
- 52778163 - nvdec: Support CUDA buffer pool
- 333da793 - nvenc: Support CUDA buffer pool
- fec78bf7 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- c210d24d - nvcodec: Peer direct access support
added 25 commits
- c210d24d...1cbb23cf - 20 commits from branch gstreamer:master
- 112772b7 - nvcodec: Add CUDA specific memory and bufferpool
- a830643b - nvdec: Support CUDA buffer pool
- 54204139 - nvenc: Support CUDA buffer pool
- e38fdcc2 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 12f9600a - nvcodec: Peer direct access support
This CUDA buffer pool can save GPU memory and GPU processing power in the
nvdec -> nvenc transcoding case, and it is a requirement for CUDA filters !526 (closed)
added 36 commits
- 12f9600a...82e86573 - 31 commits from branch gstreamer:master
- f1d035f0 - nvcodec: Add CUDA specific memory and bufferpool
- 2b0f80d7 - nvdec: Support CUDA buffer pool
- c9c7bee1 - nvenc: Support CUDA buffer pool
- 4d153282 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 2e64ff47 - nvcodec: Peer direct access support
added 48 commits
- 2e64ff47...76654539 - 43 commits from branch gstreamer:master
- 07b67789 - nvcodec: Add CUDA specific memory and bufferpool
- 8afc19f6 - nvdec: Support CUDA buffer pool
- cb93b1ff - nvenc: Support CUDA buffer pool
- f98cbdf6 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- a47fc16e - nvcodec: Peer direct access support
added 10 commits
- a47fc16e...8684dffe - 5 commits from branch gstreamer:master
- 20d4669a - nvcodec: Add CUDA specific memory and bufferpool
- e1a1866f - nvdec: Support CUDA buffer pool
- edc24582 - nvenc: Support CUDA buffer pool
- 463530ee - cudacontext: Enable direct CUDA memory access over multiple GPUs
- ab21007b - nvcodec: Peer direct access support
added 20 commits
- ab21007b...b7ee6dc4 - 15 commits from branch gstreamer:master
- 5a0a8300 - nvcodec: Add CUDA specific memory and bufferpool
- 9961fd09 - nvdec: Support CUDA buffer pool
- 2cc2e82e - nvenc: Support CUDA buffer pool
- 98785871 - cudacontext: Enable direct CUDA memory access over multiple GPUs
- c1760893 - nvcodec: Peer direct access support
One comment: how do you envision exposing different CUDA memory types?
From the docs (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM under
cuMemHostAlloc
), I can see there are multiple options for the memory location and type supporting different operations.
Currently I have no plan to expose CUDA host memory and CUDA texture memory at all; I'd like to expose only CUDA device memory via
GstCUDAMemory
. Actually, CUDA host memory is used for staging CUDA memory (e.g., for read/write maps, CUDA device memory is copied from/to the staging CUDA host memory), so users do not need to know about the CUDA host memory.
- Resolved by Seungha Yang
added 22 commits
- c1760893...ef16d755 - 17 commits from branch gstreamer:master
- 10c86454 - nvcodec: Add CUDA specific memory and bufferpool
- 9ba2860f - nvdec: Support CUDA buffer pool
- c1524292 - nvenc: Support CUDA buffer pool
- e3aac74b - cudacontext: Enable direct CUDA memory access over multiple GPUs
- 2069f1ab - nvcodec: Peer direct access support