GStreamer issues
https://gitlab.freedesktop.org/groups/gstreamer/-/issues
2024-03-24T01:12:03Z

Speeding up the cerbero MSVC non-deps build
https://gitlab.freedesktop.org/gstreamer/cerbero/-/issues/473
2024-03-24T01:12:03Z
Nirbheek Chauhan <nirbheek.chauhan@gmail.com>

Looking at https://gitlab.freedesktop.org/nirbheek/cerbero/-/jobs/56701480, a job that ran at a low CI utilization time, some things jumped out at me:
* 45s to `cp -a` cerbero-sources cache
* ??s to `du -sch cerbero-sources` (1st time)
* 2 min to run `cargo vendor` for cargo-c (`cargo-c -> fetch`) out of ~3 min to run `fetch-bootstrap` (incl 45s to download rust toolchain)
* 3 min to run `cargo vendor` for gst-plugins-rs (`gst-plugins-rs -> fetch`) out of ~5 min to run `fetch-package`
* 30s to run `du -sch cerbero-sources` (2nd time)
* 1.5 min to unpack the deps cache (3s to download it)
* 45s to install the latest rust toolchain in `bootstrap`
* 1 min to re-install meson in `bootstrap` (!?)
* 6.5 min to build and install the gstreamer monorepo
* 10 min to build and install gst-plugins-rs (using cargo-c)
* 1 min to package the results to tar.xz
* **Total: 32 min**
Total job time: 32 min
The time needed to download and initialize the container is not included in this, but it is definitely quite slow.
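
As a sanity check on the breakdown above, the listed steps can be summed and compared with the 32 min job total. This is a minimal sketch: the durations are copied from the list, the "~3 min" and "~5 min" figures are taken at face value, and the nested `cargo vendor` and toolchain download times are assumed to be already included in `fetch-bootstrap` and `fetch-package`.

```python
# Step durations in minutes, copied from the list in this issue.
# Assumption: the cargo vendor / toolchain download times are part of the
# enclosing fetch-bootstrap and fetch-package steps, so only those are counted.
steps = {
    "cp -a cerbero-sources cache": 0.75,
    "fetch-bootstrap": 3.0,
    "fetch-package": 5.0,
    "du -sch cerbero-sources (2nd time)": 0.5,
    "unpack deps cache": 1.5,
    "install rust toolchain in bootstrap": 0.75,
    "re-install meson in bootstrap": 1.0,
    "build and install gstreamer monorepo": 6.5,
    "build and install gst-plugins-rs": 10.0,
    "package results to tar.xz": 1.0,
}
TOTAL_JOB_MIN = 32.0

accounted = sum(steps.values())
# Whatever is left covers the first `du -sch` run (unknown time) and
# anything else that is not listed.
unaccounted = TOTAL_JOB_MIN - accounted

# Rust-related time: both cargo vendor runs plus the gst-plugins-rs build.
rust_related = 2.0 + 3.0 + steps["build and install gst-plugins-rs"]

print(f"accounted:    {accounted:.2f} min")
print(f"unaccounted:  {unaccounted:.2f} min")
print(f"rust-related: {rust_related:.2f} min ({rust_related / TOTAL_JOB_MIN:.0%} of the job)")
```

Under these assumptions the listed steps cover 30 of the 32 minutes, and the Rust-related steps account for 15 minutes, which is why most of the mitigations target `cargo vendor` and the gst-plugins-rs build.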
Mitigations:
* `cargo vendor` actually runs *twice*
  - The first time, it is run during `fetch` after extracting the git repo/tarball so we can fetch everything from the internet
  - The second time, it is run during `extract` to ensure that everything is available in the source tree for an offline build
  - This happens because fetch and extract are two separate steps. One fix is to add a mechanism that skips the extract phase if fetch completed and the recipe hasn't changed since then.
* `cargo vendor` and the `gst-plugins-rs` build taking so long are probably related: the Windows VM runs on top of the current stable version of qemu-kvm with virtio, which is probably quite slow at I/O.
  - QEMU 9.0 might fix this, but it hasn't been released yet: https://blog.vmsplice.net/2024/01/qemu-aiocontext-removal-and-how-it-was.html
  - Another option is to pass a device through to the VM so it bypasses virtio, and run everything on that device
  - This will also help with provisioning of the container, which is I/O bottlenecked
* Disabling Rust debug symbols and turning off optimizations might speed up gst-plugins-rs
  - If the issue is I/O, doing a full-debug build without optimizations might slow things down because it'll increase the size of artifacts
* Investigate sccache for Rust on Windows
  - This will be I/O-heavy to copy into/out of the container, so it will have to be mounted from the runner itself

Rust: make debug builds instead of --release builds for monorepo subpipelines?
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3291
2024-02-10T18:25:31Z
Tim-Philipp Müller <tim@centricular.com>

Not 100% sure, but I was wondering if we're currently always creating `--release` builds, and if so, whether there's a way to create debug builds instead?
Might save some CI cycles for monorepo sub-pipelines.
(In fact we might not need to build plugins-rs at all for any monorepo change that doesn't affect public API, but I guess that's another discussion)

Slow stuttery playback performance for 4K UHD VP9 and H.265 / HEVC samples with unaccelerated CPU decoding
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3171
2024-03-19T05:31:13Z
Jeff Fortin Tam

Currently, every ffmpeg-only video player (such as Celluloid) puts GStreamer video players (such as Clapper, Totem, etc.) to shame in terms of being able to smoothly play 4K H.265 / HEVC videos on older CPUs (such as my 2x4 cores Intel Xeon), even with playbin3. It happens on other machines as well (well, those that don't have hardware-accelerated H.265/VP9 decoding), not just @alatiera's favorite machine.
I am able to reproduce the issue on Fedora 39 with `gst-play-1.0 --use-playbin3 the_filename`.
## Samples
Use the 4K H.265 and VP9 24fps samples (and the 60fps ones, if you want to be more taxing on the CPU) from:
* https://kodi.wiki/view/Samples (the ["Exodus" 4K 24fps sample](https://mega.nz/file/Sfw1hDpK#ErxCOpQDVjcI1gq6ZbX3vIfdtXZompkFe0jq47EhR2o) and ["The Redwoods" 4K VP9 24fps sample](https://mega.nz/#!pQEGgRwY!pD9whIlM-U9tJIA-LojxSt582BAZGfdSA5wAQLT06I4))
* https://elecard.com/videos (those are 60fps)
I have also mirrored the samples I used, along with an additional private 4K H.265 24fps anime sample, in [this folder](https://fortintam.com/public/gstreamer-uhd-samples/).
## Preliminary profiling
I did: `dnf debuginfo-install gstreamer1* ffmpeg ffmpeg-free libavcodec-free libavutil-free pulseaudio pipewire mingw64-gstreamer1-plugins-base-debuginfo mingw32-gstreamer1-plugins-base-debuginfo libde265 x265 dav1d rust-rav1e libvpx`
…and recorded these Sysprof 45 profiles on my desktop workstation running the Wayland version of GNOME 45.2:
* [H.265 4K 24fps anime sample Sysprof 45 profile](/uploads/475492a81993544524c8ad0831bea196/sysprof_45_profile_-_H.265_4K_fps_anime_sample.tar.xz)
* [VP9 4K 24fps "Redwoods" sample Sysprof 45 profile](/uploads/fa0b183b4c92e4d5f88cb8995d37cda9/sysprof_45_profile_-_VP9_4K_24fps_Redwoods_sample.tar.xz)
Below are convenience screenshots from what I see in Sysprof.
### H.265 (anime sample)
![Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_flame_graph_-_gst_part_only.opti](/uploads/3a501fbbe12b5481f3a099a956dc29e5/Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_flame_graph_-_gst_part_only.opti.png)
Also available: [screenshot of the full flame graph including non-GSt (ffmpeg) parts](/uploads/4b43f50f041bd1f0dbbfe7ae96fb8391/Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_flame_graph_-_everything.opti.png)
| Call graph | Graphics/compositor marks | CPU usage |
| -- | -- | -- |
| ![Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_call_graph_-_gst_part_only.opti](/uploads/c44a4a66d0b42b8e2f8f80b2a21dc39c/Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_call_graph_-_gst_part_only.opti.png) | ![Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_marks.opti](/uploads/3f9b17c4d8714f06216ffe9758eaacde/Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_marks.opti.png) | ![Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_CPUs_usage.opti](/uploads/fada020b475808ab2306a4f625c699b1/Screenshot_from_Sysprof_-_4K_24fps_HEVC_sample_-_CPUs_usage.opti.png) |
### VP9 ("The Redwoods" sample)
![Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_flame_graph_-_gst_only.opti](/uploads/58e08f23b6efb0408b768c19f65f12a9/Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_flame_graph_-_gst_only.opti.png)
Also available: [screenshot of the full flame graph including non-GSt (ffmpeg) parts](/uploads/386e79afdc910cc8135e0fbaf51c46bd/Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_flame_graph_-_everything.opti.png)
| Call graph | CPU usage |
| -- | -- |
| ![Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_call_graph_-_gst_part_only.opti](/uploads/d03eeb264b756de87aa8923db0958d9f/Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_call_graph_-_gst_part_only.opti.png) (also available: [screenshot of the non-GSt parts of the call graph](/uploads/7a60c0a01fafcd53d97c16f2efcad6b8/Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_call_graph_-_ffmpeg_libvpx_part.opti.png)) | ![Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_CPUs_usage.opti](/uploads/5346e053e1c31fc8d85983e39479179f/Screenshot_from_Sysprof_-_4K_24fps_VP9_sample_-_CPUs_usage.opti.png) |
The hypothesis so far is that raw video conversions shouldn't happen in software there.

qml6glsink: fps decrease compared to qt5 qmlglsink
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3074
2023-10-26T10:48:47Z
Deymos s

### Describe your issue
Significant decrease in FPS in qml6glsink relative to qmlglsink (Qt5).
#### Expected Behavior
The RTSP stream plays at its native FPS, similar to what happens in VLC.
#### Observed Behavior
FPS is reduced by 20-40% relative to Qt5 and to the FPS at the source.
#### Setup
- **Operating System:** Windows
- **Device:** Computer
- **GStreamer Version:** 1.23.0
- **Command line:**
### Steps to reproduce the bug
1. Play low-quality stream with qml6glsink
### How reproducible is the bug?
Always
### Screenshots if relevant
qmlglsink is on the left side and qml6glsink on the right; the pipeline and code are the same:
https://drive.google.com/file/d/1qdelYydk9ANxMX4I4ca4UzPTCu-UM_Oo/view?usp=sharing
### Additional Information
Video info:
Codec: H264 - MPEG-4 AVC (part 10)
Resolution: 960x576
framerate: 12
colors: ITU-R BT.709

jpegenc: Slow Encoding Issue at Resolution Width=4192, Height=3120
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/2845
2023-07-24T06:30:21Z
Sulthan Amanu

Dear Team,
I hope this email finds you well. I am writing to seek your assistance in resolving a performance issue with a GStreamer pipeline that uses the `jpegenc` element. I have been capturing frames from the camera device of an imx8 board and encoding them using GStreamer. However, when I use the `jpegenc` element, the frame rate drops significantly.
Below, I have provided three different pipeline commands and their corresponding frame rates:
**1. Without Jpegenc (Frame Rate: 8)**
gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw, width=4192,height=3120 ! videoconvert ! tee name=t ! queue ! fpsdisplaysink sync=false
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
^Chandling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 0:00:44.370671000
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
**Total showed frames (357), playing for (0:00:44.370925250), fps (8.046).**
**2. With Tee branch and without jpegenc (Frame Rate: 4.6)**
gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw, width=4192,height=3120 ! videoconvert ! tee name=t ! queue ! fpsdisplaysink sync=false t. ! queue ! fakesink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
^Chandling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 0:00:33.713813875
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
**Total showed frames (156), playing for (0:00:33.714091500), fps (4.627).**
Freeing pipeline ...
**3. With Jpegenc (Frame Rate: 0.4)**
gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,width=4192,height=3120 ! videoconvert ! tee name=t ! queue ! fpsdisplaysink sync=false t. ! queue ! jpegenc ! fakesink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
^Chandling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 0:00:23.952335250
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
**Total showed frames (14), playing for (0:00:23.952648750), fps (0.584).**
The following elements are affected by the fakesink and fpsdisplaysink.
As you can see, the frame rate drops significantly when using Jpegenc. I have even tried adjusting the `idct-method` and `quality` properties, but the results are not as expected.
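
The frame rates reported by `fpsdisplaysink` can be re-derived from the "Total showed frames" lines in the three logs above. The sketch below is only a cross-check: the frame counts and durations are copied from the logs, and the run labels are shorthand introduced here. Note that the log for run 3 works out to 0.584 fps, not the 0.4 given in its heading.

```python
def parse_duration(text):
    """Convert an H:MM:SS.fraction string from gst-launch output to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# (frames shown, playing time) copied from the three logs in this report.
runs = {
    "1. display only": (357, "0:00:44.370925250"),
    "2. tee + fakesink": (156, "0:00:33.714091500"),
    "3. tee + jpegenc": (14, "0:00:23.952648750"),
}

rates = {
    name: frames / parse_duration(duration)
    for name, (frames, duration) in runs.items()
}

baseline = rates["1. display only"]
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} fps ({rate / baseline:.0%} of run 1)")
```

Comparing run 2 with run 1 shows that merely adding a second `tee` branch already costs a large share of the frame rate, before `jpegenc` is involved at all.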
The details of my system are as follows:
- Gstreamer version: 1.14
- Linux version: Bionic
I am seeking your expertise to help identify the cause of this slowdown and suggest possible solutions to resolve the issue. Your assistance in this matter is highly appreciated.
Looking forward to your prompt response.
Thank you and best regards,
Sulthan

pngenc: Copies way too much data
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/2717
2023-07-05T10:34:20Z
Edward Hervey

The way `pngenc` handles data provided by the library is as follows:
* Create an "output" `GstBuffer`
* In the write callback (`user_write_data()`):
  * Allocate a `GstMemory`
  * Copy the provided data into that `GstMemory`
  * Append that `GstMemory` to the output buffer
While the copy above can't be avoided, there are a lot more copies going on because of the limit on the number of `GstMemory` that a `GstBuffer` can hold (16), and the png library calls the write callback even for very small amounts of data (4-8 bytes).
So what happens in the last step is that every 16 calls, all the existing `GstMemory` in the buffer are copied into a single `GstMemory`!
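
To get a feel for how much extra copying this causes, here is a simplified Python model of the append behaviour described above. It is a sketch under stated assumptions, not the actual `GstBuffer` code: `simulate_appends` is a hypothetical helper, and it assumes that when the buffer already holds 16 memories, all of them are merged (copied) into one before the new memory is appended.

```python
MEM_MAX = 16  # limit on GstMemory per GstBuffer, as stated in the issue


def simulate_appends(chunk_sizes, mem_max=MEM_MAX):
    """Return (bytes copied by the write callback, bytes copied by merges)."""
    memories = []        # sizes of the memory blocks currently in the buffer
    callback_bytes = 0   # the unavoidable copy of each chunk
    merge_bytes = 0      # extra copies caused by hitting the memory limit
    for size in chunk_sizes:
        callback_bytes += size
        if len(memories) >= mem_max:
            # Assumed behaviour: everything already stored is copied
            # into one new block to make room.
            merged = sum(memories)
            merge_bytes += merged
            memories = [merged]
        memories.append(size)
    return callback_bytes, merge_bytes


# Many tiny writes, as the png library produces (8 bytes each here).
callback_bytes, merge_bytes = simulate_appends([8] * 10_000)
print(f"callback copies: {callback_bytes} bytes")
print(f"merge copies:    {merge_bytes} bytes")
print(f"overhead factor: {merge_bytes / callback_bytes:.1f}x")
```

In this model each merge re-copies everything written so far, so the extra copying grows roughly quadratically with the number of small writes.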
The issue can be clearly seen with `GST_DEBUG=2,*PERF*:9,*MEM*:8 gst-launch-1.0 videotestsrc num-buffers=1 ! video/x-raw,width=1024,height=2464 ! pngenc ! fakesink`

Scaletempo 2x playback causes high CPU loading
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/2712
2023-06-30T07:51:27Z
rlandj

On an Android phone that I use on a daily basis (running Android O),
I have found that when playing at 2x speed, the CPU load of the
entire GStreamer pipeline is relatively high, sometimes reaching around 60%.
I used simpleperf to generate a call-graph hot spot map and found that the bottleneck
is the best_overlap_offset_s16() function in gstscaletempo.c, which accounts for almost 98% of the CPU load. Therefore, I came here to seek help from the knowledgeable experts on this forum. Is anyone familiar with NEON programming? Can you help me optimize this function using NEON SIMD instructions? I have no prior experience with NEON, but I do have a mobile device handy that I can use to compare the performance before and after optimization.
```
Arch: arm64
Event: cpu-cycles:u (type 0, config 0)
Samples: 30624
Event count: 23246964527
Children Self Command Pid Tid Shared Object Symbol
34.42% 0.00% amcaudiodec-c 17305 17427 /apex/com.android.runtime/lib/bionic/libc.so __start_thread
|
-- __start_thread
|
-- __pthread_start(void*)
g_thread_proxy
g_thread_pool_thread_proxy
gst_task_func
|--0.04%-- [hit in function]
|
--99.96%-- gst_amc_audio_dec_loop
|--0.13%-- [hit in function]
|
|--95.13%-- gst_audio_decoder_finish_frame
| |--0.14%-- [hit in function]
| |
| |--99.21%-- gst_audio_decoder_output
| | |--0.03%-- [hit in function]
| | |
| | --99.97%-- gst_audio_decoder_push_forward
| | |
| | |--99.92%-- gst_pad_push
| | | gst_pad_push_data
| | | |--0.01%-- [hit in function]
| | | |
| | | |--99.93%-- gst_pad_chain_data_unchecked
| | | | |--0.08%-- [hit in function]
| | | | |
| | | | |--99.78%-- gst_proxy_pad_chain_default
| | | | | |--0.08%-- [hit in function]
| | | | | |
| | | | | |--99.87%-- gst_pad_push
| | | | | | |
| | | | | | |--99.95%-- gst_pad_push_data
| | | | | | | |--0.01%-- [hit in function]
| | | | | | | |
| | | | | | | |--99.95%-- gst_pad_chain_data_unchecked
| | | | | | | | |
| | | | | | | | |--99.96%-- gst_proxy_pad_chain_default
| | | | | | | | | gst_pad_push
| | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | |
| | | | | | | | | --99.98%-- gst_pad_push_data
| | | | | | | | | |
| | | | | | | | | |--99.89%-- gst_pad_chain_data_unchecked
| | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | |
| | | | | | | | | | |--99.95%-- gst_concat_sink_chain
| | | | | | | | | | | |
| | | | | | | | | | | |--99.95%-- gst_pad_push
| | | | | | | | | | | | gst_pad_push_data
| | | | | | | | | | | | |
| | | | | | | | | | | | |--99.97%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |--99.91%-- gst_proxy_pad_chain_default
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |--99.92%-- gst_pad_push
| | | | | | | | | | | | | | | gst_pad_push_data
| | | | | | | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |--99.97%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | |--99.93%-- gst_tee_chain
| | | | | | | | | | | | | | | | | |--0.05%-- [hit in function]
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | --99.95%-- gst_tee_handle_data
| | | | | | | | | | | | | | | | | |--0.06%-- [hit in function]
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | --99.94%-- gst_pad_push
| | | | | | | | | | | | | | | | | gst_pad_push_data
| | | | | | | | | | | | | | | | | |--0.01%-- [hit in function]
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |--99.93%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |--99.93%-- gst_stream_synchronizer_sink_chain
| | | | | | | | | | | | | | | | | | | |--0.05%-- [hit in function]
| | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | |--99.88%-- gst_pad_push
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | |--99.98%-- gst_pad_push_data
| | | | | | | | | | | | | | | | | | | | | |--0.03%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |--99.93%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | |--99.88%-- gst_proxy_pad_chain_default
| | | | | | | | | | | | | | | | | | | | | | | gst_pad_push
| | | | | | | | | | | | | | | | | | | | | | | gst_pad_push_data
| | | | | | | | | | | | | | | | | | | | | | | |--0.03%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | |--99.90%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | |--99.95%-- gst_base_transform_chain
| | | | | | | | | | | | | | | | | | | | | | | | | |--0.04%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | |--99.78%-- gst_pad_push
| | | | | | | | | | | | | | | | | | | | | | | | | | gst_pad_push_data
| | | | | | | | | | | | | | | | | | | | | | | | | | |--0.01%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | --99.99%-- gst_pad_chain_data_unchecked
| | | | | | | | | | | | | | | | | | | | | | | | | | |--0.02%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | |--99.95%-- gst_base_transform_chain
| | | | | | | | | | | | | | | | | | | | | | | | | | | |--0.01%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |--99.47%-- default_generate_output
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |--99.06%-- gst_scaletempo_transform
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--0.18%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--98.51%-- best_overlap_offset_s16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--0.59%-- fill_queue
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--6.43%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--57.72%-- __memcpy_base_a55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--22.51%-- gst_buffer_map
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gst_buffer_map_range
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--26.49%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--47.69%-- gst_memory_make_mapped
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gst_memory_map
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--48.71%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | --51.29%-- gst_mini_object_lock
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | --25.82%-- _get_merged_memory
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gst_mini_object_ref
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | --13.34%-- gst_buffer_unmap
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--40.69%-- [hit in function]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |--40.50%-- gst_mini_object_unlock
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | --18.81%-- gst_memory_unmap
```
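
The hot spot in the profile above, `best_overlap_offset_s16()`, is a cross-correlation search: it pre-multiplies the overlap buffer by a window, then slides over the queued samples frame by frame and keeps the offset with the highest correlation. The function's C source is part of this report; the pure-Python model below mirrors its logic and could serve as a reference when checking a SIMD rewrite. It is a sketch only: the argument names follow the C struct fields, plain lists stand in for the buffers, and it returns the offset in frames rather than bytes.

```python
def best_overlap_offset(table_window, buf_overlap, buf_queue,
                        samples_per_frame, samples_overlap, frames_search):
    """Return the frame offset whose samples correlate best with the overlap."""
    n = samples_overlap - samples_per_frame

    # Pre-correlation: windowed overlap samples, scaled down by 2**15
    # like the `>> 15` in the C code.
    pre_corr = [
        (table_window[k] * buf_overlap[samples_per_frame + k]) >> 15
        for k in range(n)
    ]

    best_corr = None
    best_off = 0
    for off in range(frames_search):
        # The C code starts one frame into the queue and advances one
        # frame per candidate offset.
        start = samples_per_frame * (off + 1)
        candidate = buf_queue[start:start + n]
        corr = sum(p * s for p, s in zip(pre_corr, candidate))
        if best_corr is None or corr > best_corr:
            best_corr = corr
            best_off = off
    return best_off


# Mono example: the pattern [1, 2, 3, 4] sits two frames into the search range.
window = [32768] * 4            # 1.0 in Q15, so the window leaves samples unchanged
overlap = [0, 1, 2, 3, 4]
queue = [0, 0, 0, 1, 2, 3, 4, 0, 0, 0]
offset = best_overlap_offset(window, overlap, queue,
                             samples_per_frame=1, samples_overlap=5,
                             frames_search=4)
print(offset)
```

The inner sum is the multiply-accumulate loop that the C version unrolls by four, and it is the part a NEON implementation would vectorize.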
```
/* buffer padding for loop optimization: sizeof(gint32) * (loop_size - 1) */
#define UNROLL_PADDING (4*3)

static guint
best_overlap_offset_s16 (GstScaletempo * st)
{
  gint32 *pw, *ppc;
  gint16 *po, *search_start;
  gint64 best_corr = G_MININT64;
  guint best_off = 0;
  guint off;
  glong i;

  /* pre-multiply the overlap samples by the window */
  pw = st->table_window;
  po = st->buf_overlap;
  po += st->samples_per_frame;
  ppc = st->buf_pre_corr;
  for (i = st->samples_per_frame; i < st->samples_overlap; i++) {
    *ppc++ = (*pw++ * *po++) >> 15;
  }

  /* try each candidate frame offset and keep the best correlation */
  search_start = (gint16 *) st->buf_queue + st->samples_per_frame;
  for (off = 0; off < st->frames_search; off++) {
    gint64 corr = 0;
    gint16 *ps = search_start;

    ppc = st->buf_pre_corr;
    ppc += st->samples_overlap - st->samples_per_frame;
    ps += st->samples_overlap - st->samples_per_frame;
    i = -((glong) st->samples_overlap - (glong) st->samples_per_frame);
    /* inner loop unrolled by 4; relies on UNROLL_PADDING */
    do {
      corr += ppc[i + 0] * ps[i + 0];
      corr += ppc[i + 1] * ps[i + 1];
      corr += ppc[i + 2] * ps[i + 2];
      corr += ppc[i + 3] * ps[i + 3];
      i += 4;
    } while (i < 0);
    if (corr > best_corr) {
      best_corr = corr;
      best_off = off;
    }
    search_start += st->samples_per_frame;
  }

  return best_off * st->bytes_per_frame;
}
```

ci: build gst-plugins-rs in debug mode for git monorepo ci, for faster builds?
https://gitlab.freedesktop.org/gstreamer/cerbero/-/issues/433
2024-02-12T12:42:31Z
Tim-Philipp Müller <tim@centricular.com>

Currently the gst-plugins-rs build takes a long time, especially on macOS, and is the bottleneck for any monorepo merge request.
I wonder if it would make sense to perhaps build gst-plugins-rs in debug mode instead of release mode for such merge request pipelines?
Debug mode should be faster, but will generate larger artefacts.
The question is how much faster the build is and how much larger the artefacts are; something to try perhaps.

d3d11/GES: poor performance
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1646
2022-12-09T20:43:36Z
watsonwelch

### Describe your issue
GES does not appear to take advantage of hardware-accelerated d3d11 elements (such as d3d11h264dec and d3d11compositor) to create a zero-copy rendering pipeline. Playback of high-res/high-framerate video is very poor.
#### Expected Behavior
I'd expect GES to use hardware-accelerated d3d11 elements to maximize performance.
#### Observed Behavior
GES appears to either not use the hardware-accelerated plugins, or CPU-copying (or otherwise degraded performance) is occurring somewhere in the pipeline.
#### Setup
- **Operating System:** Windows
- **GStreamer Version:** 1.21.3
- **Command line:**
### Steps to reproduce the bug
1. Download a high-res (e.g. 4K) and/or high-framerate (e.g. 60fps) video, such as [Netflix's "Sparks" open content](http://download.opencontent.netflix.com.s3.amazonaws.com/TechblogAssets/Sparks/encodes/Sparks_4096x2160_5994fps_SDR.mp4)
2. Open Windows PowerShell
3. Type `gst-launch-1.0 playbin uri=file:///path/to/Sparks_4096x2160_5994fps_SDR.mp4`
4. Observe that the video plays back normally (using hardware acceleration)
5. Type `$env:GST_PLUGIN_FEATURE_RANK="d3d11compositor:max"`
6. Type `$env:GST_PLUGIN_FEATURE_RANK="d3d11h264dec:max"`
7. Type `ges-launch-1.0 +clip Sparks_4096x2160_5994fps_SDR.mp4`
8. Observe that playback is extremely poor/choppy
### How reproducible is the bug?
Appears to be very reproducible with various 4K videos, on various devices.
### Related non-duplicate issues
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1117

vah264dec + videoconvert is slow
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1535
2022-11-15T13:35:00Z
Eric Knapp

Converting the output of vah264dec to anything with videoconvert is very slow. When resolution and framerate are high enough (e.g. 720p30), the conversion speed is less than realtime. Single core CPU usage hits 100%. The same pipeline with vaapih264dec or vapostproc is much faster. I tried converting to multiple formats and they were all slow when vah264dec was paired with videoconvert. Logs do not show any issues.
#### Setup
- **Operating System:** Ubuntu 20.04.5
- **CPU:** i7-6700
- **GStreamer Version:** Main branch
SLOW:
`gst-launch-1.0 videotestsrc ! 'video/x-raw,width=1280,height=720,framerate=30/1' ! vah264enc ! queue ! h264parse ! vah264dec ! 'video/x-raw,format=NV12' ! videoconvert ! 'video/x-raw,format=UYVY' ! fakesink`
FAST (vaapih264dec):
`gst-launch-1.0 videotestsrc ! 'video/x-raw,width=1280,height=720,framerate=30/1' ! vah264enc ! queue ! h264parse ! vaapih264dec ! 'video/x-raw,format=NV12' ! videoconvert ! 'video/x-raw,format=UYVY' ! fakesink`
FAST (vapostproc):
`gst-launch-1.0 videotestsrc ! 'video/x-raw,width=1280,height=720,framerate=30/1' ! vah264enc ! queue ! h264parse ! vah264dec ! 'video/x-raw,format=NV12' ! vapostproc ! 'video/x-raw,format=UYVY' ! fakesink`

avviddec: Constantly "renegotiating" with alternate frame stream
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1523
2022-10-27T15:24:22Z
Edward Hervey

Reproducible with `validate.file.playback.fast_forward.cm5000_hevc_1080i_colorbars_ts`
The problem is that `update_video_context (ffmpegdec, context, picture)` returns TRUE (i.e. the format changed) even though it didn't, because it has no idea about `alternate` (single-field) interlacing style.
As a result it constantly renegotiates itself and sets latency.
cc @vivia

wavparse: Runs all typefinders for all output
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1444
2022-11-15T16:01:00Z
Edward Hervey

Introduced by https://gitlab.freedesktop.org/gstreamer/gstreamer/-/commit/754f3a315ba37a523cbe114614cb32d666c02abe 12 years ago :smile:
In order to figure out if the "raw" audio contained within the wav container is actually DTS, `wavparse` calls the typefinder helper ... except that means it runs *all* typefinders.
Since it only cares about checking for DTS, it should only run the `audio/x-dts` typefinder (if present).

CPU consumption is reaching 100% with very one RTSP stream
https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/issues/1734
2022-11-16T10:53:42Z
Chandramouli P

Hello,
Good evening. Please find the below environment:
Operating System: Ubuntu 22.04 Server edition
GStreamer: 1.20.3
We are trying to render the RTSP stream that is coming from an IP camera on to a web browser using GStreamer and WebRTCBin plugin. We followed the below URL and able to render the stream successfully:
**https://gitlab.freedesktop.org/gstreamer/gst-examples/-/tree/discontinued-for-monorepo/webrtc**
But, I noticed that CPU consumption is reaching to 100% with only one RTSP stream. Please find the enclosed screenshot for your reference.
![image](/uploads/bc284c9f28927b532afe9e2cdf0cbb6f/image.png)
We would appreciate your help in resolving this load issue.
Thank you.
Best Regards,
Chandramouli.

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1390
urisourcebin: Download-buffering trashes on fast network
2022-11-10T09:21:15Z
Bastien Nocera

When download-buffering is enabled in playbin (through the `download` flag), accessing videos through HTTP will download the video as fast as possible, and write it to a local cache.
Unfortunately, that means it will try to download hundreds of megabytes of data from a local DLNA server, and the local cache is slow enough that this causes performance problems until the download has finished. Ideally, `urisourcebin`/`downloadbuffer` would know how fast the disk and the network are so it can throttle its own download speed.
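To make the throttling idea concrete, here is a minimal sketch in plain Python. It is not GStreamer code: the function names, the `headroom` factor, and the assumption that both throughputs have already been measured are hypothetical and only illustrate capping the download rate at what the cache disk can absorb.

```python
def throttled_download_rate(network_bps: float, disk_write_bps: float,
                            headroom: float = 0.9) -> float:
    """Return a download rate cap in bytes per second.

    The cap never exceeds what the network delivers, and never exceeds
    a fraction (headroom) of the measured disk write throughput.
    All three inputs are assumed to be measured or chosen elsewhere.
    """
    if network_bps < 0 or disk_write_bps < 0:
        raise ValueError("throughputs must be non-negative")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return min(network_bps, disk_write_bps * headroom)


def delay_before_next_chunk(chunk_size: int, elapsed: float,
                            rate_cap: float) -> float:
    """Return how many seconds to wait before requesting the next chunk.

    A chunk of chunk_size bytes should take at least chunk_size / rate_cap
    seconds; if it arrived faster than that, wait for the remainder.
    """
    if rate_cap <= 0:
        raise ValueError("rate_cap must be positive")
    target_duration = chunk_size / rate_cap
    return max(0.0, target_duration - elapsed)
```

For example, with a 100 MB/s network, a 40 MB/s cache disk, and `headroom=0.5`, the cap is 20 MB/s, so a 1 MB chunk that arrived in 10 ms would be followed by a pause of about 40 ms.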
See https://gitlab.gnome.org/GNOME/totem/-/issues/496

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1191
d3d11screencapturesrc: Very low framerate on Windows 11
2022-05-02T12:49:37Z
Davide Perini

Hi all,
I'm using `d3d11screencapturesrc` and previously `d3d11desktopdupsrc` without problems on my project.
**I noticed that the performance of the screen capture is completely broken now.**
I don't know whether the problem started when I upgraded to Windows 11, but I'm sure that I did not have it previously on Windows 10.
How to reproduce:
./gst-launch-1.0 d3d11screencapturesrc monitor-handle=221948 ! d3d11convert ! d3d11download ! autovideosink
While running a video game (one that is able to run at 50FPS on my RTX2080Ti), the GStreamer screen capture framerate dips down to 16FPS...
I'm sure that on my previous PC running Windows 10, the framerate never dipped below 60FPS, even while playing the same demanding game on the same RTX2080Ti.
Is something broken in the latest GStreamer implementation when combined with Windows 11?
@seungha.yang I know that it's not very "polite" to quote devs but I know that you worked on d3d11screencapturesrc and I think that you are the only one able to help here :smile:

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/2860
gl: Need to add modifier support for glupload and gldownload
2024-02-12T23:29:11Z
He Junyan

New platforms now export the DRM modifier to users. For example, a GPU surface with X tiling has the modifier I915_FORMAT_MOD_X_TILED on Intel platforms. We need to set this kind of modifier correctly when we share DMA surfaces between different modules, such as VA->3D. On old platforms, this modifier was stored inside libdrm, but now we need to specify it explicitly.
We already have a patch for the VA part: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2032
According to @ndufresne, every video format/modifier pair should appear as "format:modifier" in caps, like:
```
video/x-raw(memory:DMABuf)
width: [ 16, 16384 ]
height: [ 16, 16384 ]
format: { (string)NV12:0x0100000000000002, (string)I420, (string)YV12, (string)YUY2:0x0100000000000002, (string)P010_10LE:0x0100000000000002, (string)BGRA:0x0100000000000002, (string)RGBA:0x0100000000000002, (string)BGR10A2_LE:0x0100000000000002, (string)VUYA:0x0100000000000002 }
```
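As an illustration of the "format:modifier" notation in the caps above, the following Python sketch splits one entry into its format name and its 64-bit modifier, then separates the modifier into vendor and vendor-specific value. The helper names are hypothetical, and the sketch assumes the `(string)` type prefix has already been stripped from the entry. The vendor/value split assumes the layout used by `fourcc_mod_code` in `drm_fourcc.h` (vendor in the top 8 bits, value in the low 56 bits).

```python
from typing import NamedTuple, Optional, Tuple

# Vendor ID for Intel, as defined in drm_fourcc.h.
DRM_FORMAT_MOD_VENDOR_INTEL = 0x01


class FormatModifier(NamedTuple):
    video_format: str
    modifier: Optional[int]  # None when the entry carries no modifier


def parse_format_entry(entry: str) -> FormatModifier:
    """Parse 'NV12:0x0100000000000002' or plain 'I420' (hypothetical helper)."""
    name, sep, mod = entry.strip().partition(":")
    if not name:
        raise ValueError("empty format name in %r" % entry)
    return FormatModifier(name, int(mod, 16) if sep else None)


def split_modifier(modifier: int) -> Tuple[int, int]:
    """Split a 64-bit DRM modifier into (vendor, vendor-specific value)."""
    vendor = modifier >> 56
    value = modifier & 0x00FFFFFFFFFFFFFF
    return vendor, value
```

With the caps shown above, `parse_format_entry("NV12:0x0100000000000002")` yields the format `NV12` with modifier `0x0100000000000002`, and `split_modifier` reports vendor `0x01` (Intel) with value `2`. Entries such as `I420` have no modifier part.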
We also need to add this to gl's DMA part.

https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/-/issues/931
gl: Need to add modifier support for glupload and gldownload
2023-07-27T11:30:05Z
He Junyan

New platforms now export the DRM modifier to users. For example, a GPU surface with X tiling has the modifier I915_FORMAT_MOD_X_TILED on Intel platforms. We need to set this kind of modifier correctly when we share DMA surfaces between different modules, such as VA->3D. On old platforms, this modifier was stored inside libdrm, but now we need to specify it explicitly.
We already have a patch for the VA part: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2032
According to @ndufresne, every video format/modifier pair should appear as "format:modifier" in caps, like:
```
video/x-raw(memory:DMABuf)
width: [ 16, 16384 ]
height: [ 16, 16384 ]
format: { (string)NV12:0x0100000000000002, (string)I420, (string)YV12, (string)YUY2:0x0100000000000002, (string)P010_10LE:0x0100000000000002, (string)BGRA:0x0100000000000002, (string)RGBA:0x0100000000000002, (string)BGR10A2_LE:0x0100000000000002, (string)VUYA:0x0100000000000002 }
```
We also need to add this to gl's DMA part.

https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/-/issues/903
videoscale: Performance degradation from 1.16.2 -> 1.18.4
2021-07-22T12:57:20Z
Alexey Belyakov

> gst-launch-1.0 filesrc location=video-examples/person-bicycle-car-detection_1920_1080-2min.mp4 ! qtdemux ! avdec_h264 max-threads=1 ! videoscale n-threads=1 ! videoconvert n-threads=1 ! video/x-raw,format=BGRx,width=100,height=100 ! gvafpscounter ! fakesink async=false
Output with GStreamer **1.16.2**:
FPSCounter(average): total=251.41 fps, number-streams=1, per-stream=251.41 fps
Execution ended after 0:00:10.338461396
Output with GStreamer **1.18.4**:
FPSCounter(average): total=238.40 fps, number-streams=1, per-stream=238.40 fps
Execution ended after 0:00:10.899746688

1.18.5

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/673
High CPU usage in 1.18 (but not master) when pausing playback in gnome-music
2021-09-17T15:35:56Z
Sebastian Keller

I noticed that there is constant high CPU usage in gnome-music whenever playback is paused. I tried to analyze this with sysprof and basically the entire time is spent in futex syscalls from `gst_system_clock_async_thread`. This issue cannot be reproduced using `gst-play-1.0 --use-playbin3`, so it probably depends on something that gnome-music is doing.
This issue seems to affect only the 1.18 branch, and cannot be reproduced with the current master branch of gstreamer (+plugins-base +plugins-good). I can't find any obvious change in master that claims to fix such an issue though.
I tried to bisect the 1.18 branch and the issue started happening with b39a06065a2b7fd822a7a0a14812b1176f8a7023.
The relevant gnome-music code can be found here: https://gitlab.gnome.org/GNOME/gnome-music/-/blob/master/gnomemusic/gstplayer.py

1.18.5

https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/-/issues/820
video-convert: Add fast paths from/to NV12
2021-09-24T13:26:07Z
Sebastian Dröge

It's quite a common format nowadays, on a par with I420, and we should probably add a few more fast paths for it: for example one between I420 and NV12, but maybe also all the others we already have for I420.
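To show why a direct I420/NV12 path can be cheap, here is a small Python sketch of the layout difference. Both formats share the same full-resolution Y plane. I420 stores U and V as two separate quarter-size planes, while NV12 stores them interleaved in a single UV plane. The sketch is not video-convert's implementation: the function name is hypothetical, and it assumes a tightly packed frame (no stride padding) with even width and height.

```python
def i420_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Convert a tightly packed I420 frame to NV12 (illustrative only).

    The Y plane is copied unchanged; the separate U and V planes are
    interleaved into one UV plane (U0 V0 U1 V1 ...).
    """
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)
    if len(frame) != y_size + 2 * chroma_size:
        raise ValueError("unexpected frame size for tightly packed I420")

    y_plane = frame[:y_size]
    u_plane = frame[y_size:y_size + chroma_size]
    v_plane = frame[y_size + chroma_size:]

    # Interleave chroma: even byte positions take U, odd positions take V.
    uv_plane = bytearray(2 * chroma_size)
    uv_plane[0::2] = u_plane
    uv_plane[1::2] = v_plane
    return bytes(y_plane) + bytes(uv_plane)
```

No pixel values are changed and no resampling is needed, so a dedicated path only has to copy one plane and interleave (or, in the other direction, de-interleave) the chroma bytes.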