999bc17a2471df17a3af3001d094cf6d5d4849b0 drm-tip: 2020y-06m-13d-09h-30m-45s UTC integration manifest
593c112156feb0f6159814f2276a32c90f243823 drm-tip: 2020y-06m-15d-12h-41m-08s UTC integration manifest
(I can reproduce the drop just by booting between these kernel builds.)
Kernel performance dropped on BXT J4205 in both Media and 3D GPU tests:
8-14% in most of the transcode tests (both single and multi-stream tests)
(The only GPU test that improves is the QSV HEVC downscale + discard test, but even that test improves only with FFmpeg/QSV, not with MediaSDK/QSV. I think they do threading differently.)
Judging by which tests are impacted, this issue seems to concern only tests that are neither fully GPU nor fully CPU bound, but somewhere in between.
=> I assume the reason for the perf drop is the kernel changing the "powersave" scaling governor to the "ondemand" one, because with ClearLinux (which uses the "performance" governor) there was no change in any of the tests on BXT.
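For anyone trying to reproduce the governor dependency: the active governor can be read from the standard Linux cpufreq sysfs interface. A minimal sketch of my own (not part of the original test setup); it degrades gracefully on machines without cpufreq, e.g. containers:

```python
from pathlib import Path

def scaling_governors():
    """Return {policy name: governor} read from the cpufreq sysfs
    interface, or an empty dict where cpufreq is not available."""
    base = Path("/sys/devices/system/cpu/cpufreq")
    return {
        p.parent.name: p.read_text().strip()
        for p in sorted(base.glob("policy*/scaling_governor"))
    }

print(scaling_governors() or "cpufreq sysfs not available")
```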
As I have only GEN9 HW for this, I don't know whether perf also dropped on Atoms with newer GENs. And I'm missing usable perf data for several other benchmarks and for GLK because of this kernel bug: #1205 (closed)
EDIT: This is also visible on GLK, but it has huge variance, so the impact is visible only in the longer-term perf trend. Of the Core machines, this is most visible on SKL GT4e, where the impact was up to 2-3%.
Please confirm whether this issue is still seen on the latest kernel.
There has been no change in BXT media performance since the kernel power management perf regression.
As to the 3D tests, there have been some Mesa perf improvements for some of the impacted tests, but (except for a couple of them) not as large as this June kernel performance regression.
If yes, provide the steps to reproduce it for further debugging.
Steps and root cause are already given in the original bug report description.
In early January there was an additional, smaller drop in transcode perf. At the end of January the transcode tests saw an increase larger than that drop from a few weeks earlier, but their perf dropped back to the earlier level in late March.
In 3D benchmarks the impact was the other way round, see #2592 (closed).
=> As a result, compared to the originally regressed kernel perf, current perf is lower in the Media tests and about the same in the 3D tests.
All the regressed media transcode benchmarks are still in the regressed state on BXT, with 8-14% lower perf than before the regression.
(One transcode test-case that was NOT impacted improved in mid-November, but none of the impacted ones did.)
Looking at the longer-term trend, I can see that this was not specific to BXT; it's also visible on the other machines running with the powersave/ondemand governor:
On GLK the variance is huge, but the range of variance shifted by ~10%, so the regression seems to be of about the same magnitude on that other Atom, and is still there
On the SKL GT4e machine, which had the largest regression of the Core machines, the regression is still there, at about 0.5-2.5%
On GT2 Core machines the impact is very small and was fixed around March, similarly to the 3D tests
Eero Tamminen changed title from [BXT] Up to 15% perf drop in Media and 3D to Up to 14% perf drop in Media transcode tests (on devices using powersave governor)
Dropping the "kernel:5.8" label, as there was no reply about its intent in the past 6 months (this bug is not specific to 5.8; that is just when it was introduced to drm-tip).
Regressed perf is still there with drm-tip 5.19-rc.
@eero-t Could you please elaborate on the steps to reproduce this? I'm not familiar with the tests you are running; a step-by-step guide to reproduce this would help a lot.
Several different tools and different transcoding use-cases are needed because:
The smaller the resolution and the fewer bits/pixel, the more the transcode moves from GPU bound to CPU bound
The FFmpeg and MFX/MediaSDK transcoding tools differ somewhat in how they do buffering and threading, so you get different results. The options given to both try to make them otherwise behave as close to each other as possible
You can get FFmpeg e.g. from the Ubuntu repositories (I'm using v5.1 built from sources; you can get that from 22.10 if the v4.4 in 20.04 is not new enough, which is e.g. the case with discrete GPUs).
For older iGPUs like BXT, the distro user-space drivers in the latest distros are enough (for discrete GPUs, you would need more up-to-date user-space drivers for now).
While I'm building the whole driver stack and the MFX tools from sources, you can get the "sample_multi_transcode" program from the "libmfx-tools" package in Intel's public GPU driver package repository, see: https://dgpu-docs.intel.com/installation-guides/index.html
Data files
Input videos are based on raw video data files from Xiph.org. E.g. the Netflix one is created with a two-pass conversion from a data file here: https://media.xiph.org/video/derf/ElFuente/
Hi @eero-t, first of all, sorry for no response to your report for almost 3 years. Apparently it lost visibility on our side.
I can try to find a solution, but for that I need your help. I've compiled the two drm-tip kernel versions you reported and run ffmpeg transcoding on both. TBH, I can see no difference, neither on stdout nor in the execution time measured with the time utility.
How did you measure performance to be compared? What data should I look at to see the difference?
@jkrzyszt You really need an Atom (J4205 BXT) to see the issue. The drop was too small on Core machines to notice it among normal variance (unless you have a perf timeline from continuous testing [1]).
It was not visible with the Performance governor, only with the Powersave one (changed to Ondemand), i.e. it is most likely power-management related.
This was the method established for getting the most reliable media perf:
Running the same Media command concurrently in multiple processes
Calculating the resulting FPS as (sum of frames processed by the processes / total elapsed time)
Doing this several times, and taking the median [1] of the FPS results
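The three steps above can be sketched roughly as follows (a minimal sketch with names of my own choosing, not the actual test framework; the real transcode command is replaced by a `sleep` placeholder so the sketch runs anywhere):

```python
import subprocess
import time
from statistics import median

def concurrent_fps(cmd, nproc, frames_per_proc):
    """Run `cmd` in `nproc` concurrent processes and return the aggregate
    FPS: (total frames processed by all processes) / (total elapsed time)."""
    start = time.monotonic()
    procs = [subprocess.Popen(cmd) for _ in range(nproc)]
    for p in procs:
        p.wait()
    elapsed = time.monotonic() - start
    return nproc * frames_per_proc / elapsed

# Repeat the concurrent run several times; report the median FPS.
cmd = ["sleep", "1"]  # placeholder for the real transcode command
results = [concurrent_fps(cmd, nproc=3, frames_per_proc=2400) for _ in range(3)]
print(f"median FPS: {median(results):.1f}")
```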
If one runs just a single transcoding process, variance often hides the performance changes (except in daily test result trends). With a single process, the drop discussed here is most visible on J4205 BXT in the "FullHD 20MB/s MPEG2 -> 6MB/s high-compression H.264 transcode" test-case:
Just take the average (last) FPS reported by FFmpeg after it finishes processing the specified number (2400) of frames.
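For illustration, a single-process FFmpeg/QSV invocation matching that description might look roughly like the following; the file names and exact option values here are my assumptions based on the description above, not the command actually used in the tests:

```python
import shlex

# Hypothetical QSV transcode command for the "FullHD MPEG2 -> 6MB/s H.264"
# test-case (input/output names and exact options are assumptions).
cmd = [
    "ffmpeg", "-hwaccel", "qsv",
    "-c:v", "mpeg2_qsv", "-i", "input_1080p.m2v",  # FullHD MPEG2 input
    "-frames:v", "2400",                           # stop after 2400 frames
    "-c:v", "h264_qsv", "-b:v", "6M",              # 6 Mb/s H.264 output
    "-y", "output.mp4",
]
print(shlex.join(cmd))
```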
[1] Perf results for GPU tests do not follow a normal Gaussian distribution. Often the result distribution has 2 clear humps, so an average over an increasing number of repeats does not converge well, and can be otherwise misleading. Best seems to be a graph of the daily build & perf tests' median values (the median shows changes (much) more clearly in timeline graphs than the average).