Performance regression with Linux 5.7-rc1 on Iris Plus 655 and 4K screen (Bisected)
Starting with Linux 5.7-rc1 on my NUC8i7bek with an Iris Plus 655, I noticed a rather severe performance regression: a number of things (notably scrolling in a maximized Firefox window and the window minimize/maximize/restore animations) that were perfectly smooth at 60 FPS on 5.6 now drop a lot of frames and/or run at 30 FPS. I only see this on the system with a 4K screen; I was not able to reproduce it on a system with a 1080p screen. I bisected and found that https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/i915?id=98479ada421a8fd2123b98efd398a6f1379307ab introduces the issue.
System information (from the affected system):
GPU: Intel Corporation Iris Plus Graphics 655
uname -m: x86_64
uname -r: 5.7.0-050700rc1-lowlatency
Distro: Kubuntu 20.04
Machine: Intel NUC8i7bek
Display connector: HDMI 2.0 via integrated LSPCON
Nothing relevant shows in dmesg or any other log.
Yes, not running at maximum clocks all the time is the intent of that patch, and the clocks were not forced down for the reason outlined above.
But if you were GPU-bound, you would be getting max clocks again. That strongly suggests that your system doesn't need max clocks to sustain 60 FPS; it's just that someone is submitting their frames too late [the compositing is not being completed before vblank].
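To make the vblank argument concrete, here is a toy model of a vsync'd compositor, with hypothetical function names and made-up timings. It assumes a frame is shown at the first vblank after the GPU finishes it, so the same amount of GPU work started later in the refresh period can miss the deadline and halve the frame rate, while a faster clock (shorter render time) can hide a late start.

```python
import math


def vblanks_per_frame(start_ms: float, render_ms: float, refresh_hz: float = 60.0) -> int:
    """How many refresh periods one frame occupies with vsync on (toy model).

    start_ms:  when the compositor starts the frame, measured from the previous vblank.
    render_ms: how long the GPU needs to finish it at the current clock.
    The frame is presented at the first vblank after it completes.
    """
    period_ms = 1000.0 / refresh_hz
    finish_ms = start_ms + render_ms
    # Tiny epsilon so a frame finishing exactly on a vblank still counts as on time.
    return max(1, math.ceil(finish_ms / period_ms - 1e-9))


def effective_fps(start_ms: float, render_ms: float, refresh_hz: float = 60.0) -> float:
    """Steady-state frame rate if every frame follows the same schedule."""
    return refresh_hz / vblanks_per_frame(start_ms, render_ms, refresh_hz)


print(effective_fps(2.0, 10.0))  # early start, slow clock: makes the deadline
print(effective_fps(8.0, 10.0))  # late start, slow clock: slips to every other vblank
print(effective_fps(8.0, 5.0))   # late start, fast clock: max clocks mask the late submit
```

In this model the first and last cases run at 60 FPS and the middle one at 30 FPS, which mirrors the 60 vs. 30 FPS symptom in the report.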
After reading your response, I tried disabling KWIN_USE_INTEL_SWAP_EVENT in KWin to see whether it had any effect. It improves the window manager animations somewhat, though not to the level they were at before the patch. It doesn't help Firefox, though; that is still very jerky. I also tried running without the compositor at all, but those results were meaningless because without vsync everything is jerky. Firefox isn't the only affected application either: Chromium also takes a hit, though not as badly as Firefox.
Could you take a moment to go back to basics and, say, run a bare Xorg with glxgears [-fullscreen]? That definitely has no fancy delayed commit, so it should render early and not overrun the presentation time.
glxgears running fullscreen in an almost-bare Xorg (only xterm was running otherwise) is completely smooth, but I'm not sure that really means anything, because glxgears running fullscreen inside the compositor is also completely smooth.
I just tried that. If I set the minimum frequency to 500 MHz, the jerkiness clears up when scrolling continuously, but I have to set it all the way to 1000 MHz before it stops jerking in the first half-second or so after being idle.
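The comment doesn't name the knob that was changed. On i915 the minimum-frequency floor is normally exposed in sysfs as `gt_min_freq_mhz`, bounded below by `gt_RPn_freq_mhz` and above by `gt_max_freq_mhz`. The sketch below assumes the GPU is `card0` (the index varies between systems) and that you run it as root; the helper names are hypothetical. Because the driver may round the request to a supported frequency step, it reads the value back rather than trusting what was written.

```python
from pathlib import Path

# Assumption: the Intel GPU is card0; check /sys/class/drm/ on your system.
CARD = Path("/sys/class/drm/card0")


def is_valid_min_freq(mhz: int, rpn_mhz: int, max_mhz: int) -> bool:
    """A new floor must lie between the hardware minimum (RPn) and the current ceiling."""
    return rpn_mhz <= mhz <= max_mhz


def set_min_freq_mhz(mhz: int, card: Path = CARD) -> int:
    """Write gt_min_freq_mhz (requires root) and return the value the kernel reports back."""
    rpn = int((card / "gt_RPn_freq_mhz").read_text())
    cur_max = int((card / "gt_max_freq_mhz").read_text())
    if not is_valid_min_freq(mhz, rpn, cur_max):
        raise ValueError(f"{mhz} MHz is outside [{rpn}, {cur_max}] MHz")
    (card / "gt_min_freq_mhz").write_text(f"{mhz}\n")
    # The driver may round to a supported frequency step, so read the result back.
    return int((card / "gt_min_freq_mhz").read_text())
```

For example, `set_min_freq_mhz(1000)` would reproduce the 1000 MHz experiment. Note the original value of `gt_min_freq_mhz` first so you can restore it afterwards.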
I also watched intel_gpu_top while doing this, and it shows the GPU frequency hovering around 800 MHz while scrolling continuously. That doesn't seem to jibe with having to set the minimum frequency to 1000 MHz to avoid jerkiness at the start, so I'm guessing intel_gpu_top reports an average and the GPU is actually clocking up and down many times a second?
The version of intel_gpu_top I'm looking at uses the perf interface, so it reports the average value since the previous update. (Internally we sample the frequency every 5 ms and accumulate a cycle counter that the user then samples via perf. RPS [GPU reclocking] evaluation intervals are set at 10-16 ms.)
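A toy model shows why a cycle-counter average can read "800 MHz" while the clock never actually sits there. The 5 ms sampling step comes from the comment above; the two frequencies (450 and 1150 MHz) and the 100 ms toggle period are made-up values chosen only for illustration, and the helper names are hypothetical.

```python
SAMPLE_MS = 5  # per the thread: the kernel samples the GPU frequency every 5 ms


def simulate_square_wave(low_mhz, high_mhz, period_ms, duration_ms, sample_ms=SAMPLE_MS):
    """Hypothetical clock that spends the first half of each period high, the second half low."""
    return [high_mhz if (t % period_ms) < period_ms // 2 else low_mhz
            for t in range(0, duration_ms, sample_ms)]


def reported_mhz(samples_mhz, sample_ms=SAMPLE_MS):
    """Mimic a perf-style reading: accumulate cycles (MHz x ms), divide by elapsed time."""
    cycles = sum(f * sample_ms for f in samples_mhz)
    elapsed_ms = len(samples_mhz) * sample_ms
    return cycles / elapsed_ms


# One second of a clock bouncing between two frequencies every 50 ms.
samples = simulate_square_wave(450, 1150, period_ms=100, duration_ms=1000)
print(reported_mhz(samples))  # averages to 800 MHz, a frequency the clock never ran at
```

The reported figure lands between the two states, so a tool that only shows the per-update average cannot reveal the rapid up/down reclocking suspected above.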
The problem is that when a workload is shorter than an evaluation interval, we restart the evaluation and so never see an up or a down. A missed down means we stay at max clocks and just waste many a watt; a missed up means we get jitter.
I think we have to bite the bullet and run with much shorter evaluation intervals so that the reclocking is responsive to short workloads. Unless there is some way for us to preserve history across the short workloads and assume that doing so makes sense.
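The failure mode and both remedies discussed here (shorter evaluation intervals, or preserving history across short workloads) can be illustrated with a toy model. This is not the i915 RPS code: it assumes the GPU parks the instant it goes idle, that busyness is only evaluated at the end of a complete evaluation interval (EI), and a hypothetical workload of 8 ms GPU bursts separated by 8 ms of idle. The function name and parameters are hypothetical.

```python
def completed_eis(timeline, ei_ms, keep_history=False):
    """Count RPS evaluation intervals (EIs) that run to completion.

    Toy model, not the i915 implementation:
      * timeline holds one bool per millisecond, True while the GPU is busy;
      * the GPU is assumed to park the instant it goes idle;
      * busyness is only evaluated at the end of a complete EI.
    keep_history=False discards the partial EI whenever the GPU parks;
    keep_history=True resumes it on the next burst.
    Because idle time is never counted, every completed EI is 100% busy,
    so each one is an up-clock opportunity.
    """
    completed = 0
    progress_ms = 0
    for busy in timeline:
        if not busy:
            if not keep_history:
                progress_ms = 0  # GPU parked: the partial EI is thrown away
            continue
        progress_ms += 1
        if progress_ms == ei_ms:  # a whole EI elapsed: RPS gets to evaluate it
            completed += 1
            progress_ms = 0
    return completed


# Hypothetical workload: 60 frames, each 8 ms of GPU work followed by 8 ms idle.
frame = [True] * 8 + [False] * 8
timeline = frame * 60

print(completed_eis(timeline, ei_ms=10))                     # EI longer than each burst
print(completed_eis(timeline, ei_ms=4))                      # shorter EI
print(completed_eis(timeline, ei_ms=10, keep_history=True))  # history kept across bursts
```

With a 10 ms EI and restarts, no evaluation ever completes, so the clock can never be raised (0). A 4 ms EI completes twice per burst (120), and keeping history completes one EI per 10 ms of accumulated work (48). The last variant also shows the caveat: because it ignores the idle gaps, it treats bursty load as fully busy, which only makes sense if that assumption holds.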
I've built 5.7-rc1 with your patches and the initial indication is that everything is smooth; perhaps even smoother than before 98479ada421a8fd2123b98efd398a6f1379307ab! Thanks!
I will keep testing and let you know if I discover anything else relevant.
If you run "./rapl.sh sleep 30" a few times while doing your various tasks, that would be interesting. If you want to do it before the regression, at the regression, now, and with the patch, that would be fantastic :)
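The thread does not show what rapl.sh contains. As a rough stand-in, not the script itself, a similar average-power figure can be taken from the kernel's powercap sysfs interface. Each RAPL domain exposes a cumulative `energy_uj` counter and its wrap point `max_energy_range_uj`. The sketch assumes `intel-rapl:0` is the package domain (check its `name` file) and that you may need root to read `energy_uj`; the helper names are hypothetical.

```python
import time
from pathlib import Path


def avg_power_watts(e0_uj: int, e1_uj: int, seconds: float, max_range_uj: int) -> float:
    """Average power from two cumulative energy readings, allowing for one counter wrap."""
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:  # the counter wrapped around between the two reads
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds  # microjoules -> joules, then joules per second


def measure(domain: Path = Path("/sys/class/powercap/intel-rapl:0"), seconds: float = 30.0) -> float:
    """Sample one RAPL domain over `seconds` while your workload runs elsewhere."""
    max_range = int((domain / "max_energy_range_uj").read_text())
    e0 = int((domain / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(seconds)
    e1 = int((domain / "energy_uj").read_text())
    return avg_power_watts(e0, e1, time.monotonic() - t0, max_range)
```

Calling `measure()` once per scenario (idle, scrolling, window animations) on each kernel would give comparable per-domain averages. Sub-domains such as `intel-rapl:0:1` may cover the GPU on client parts, but confirm via their `name` files.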
Sadly whatever is easiest to measure is easiest to optimise for. Measuring latency is hard, detecting jank is harder. But it's the first thing people notice :(
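One crude way to put a number on jank, assuming you can collect presentation timestamps for each frame (how to collect them is outside this sketch), is to count how many vblanks each frame-to-frame gap skipped. The function name and the sample timestamps are hypothetical.

```python
def missed_vblanks(present_ms, refresh_hz=60.0):
    """Count refresh periods skipped between consecutive presented frames."""
    period_ms = 1000.0 / refresh_hz
    missed = 0
    for prev, cur in zip(present_ms, present_ms[1:]):
        # A gap of about N periods means N - 1 vblanks passed without a new frame.
        missed += max(0, round((cur - prev) / period_ms) - 1)
    return missed


smooth = [i * 1000.0 / 60 for i in range(10)]  # a new frame every vblank
one_hitch = [0.0, 16.7, 33.3, 66.7, 83.3]      # one frame arrives a vblank late
print(missed_vblanks(smooth), missed_vblanks(one_hitch))
```

A single hitch barely moves an average FPS figure, but it shows up directly as a missed vblank, which is closer to what people actually notice.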
I did it with unmodified 5.7.0-rc1, 5.7.0-rc1 with the bisected commit reverted, 5.7.0-rc1 with your patches, and 5.4.0-24 (the current stock kernel in Ubuntu 20.04) for good measure.
It seems that the patches caused a small improvement in power draw while idling but quite a large power regression while scrolling. Both the revert and the patches were smooth, but the unmodified 5.7.0-rc1 and the Ubuntu stock kernel were jerky.
We definitely see that we are underclocking and end up at 90% busy in rc1. In the smoother kernels, the frequency is about doubled and busyness is down to 66%, even while doing more work (delivering more frames). But the patch averages a frequency bin higher, and that burns an extra 6 W for equivalent work (3 W directly attributed to the GPU, and 3 W through the system agent). (It may be that I'm misremembering pkg and it already includes the GPU, in which case it is just the 3 W, if I'm lucky.)
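The "even while doing more work" point can be checked with back-of-the-envelope arithmetic. This assumes that frequency multiplied by busy fraction is a fair proxy for GPU cycles actually spent on work, and it treats "about doubled" as exactly 2x, so the result is approximate. The helper name is hypothetical.

```python
def relative_busy_cycles(freq_ratio: float, busy_new: float, busy_old: float) -> float:
    """Ratio of busy GPU cycles per second: (f_new * busy_new) / (f_old * busy_old)."""
    return freq_ratio * busy_new / busy_old


# rc1: 90% busy at frequency f; smoother kernels: ~66% busy at ~2f (figures from the comment above).
ratio = relative_busy_cycles(freq_ratio=2.0, busy_new=0.66, busy_old=0.90)
print(f"{ratio:.2f}x the busy cycles of rc1")
```

That works out to roughly 1.47x the busy cycles, consistent with the smoother kernels doing noticeably more GPU work per second even though they report lower busyness.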
That's disappointing; I was hoping it would actually end up downclocking compared to the longer, incomplete EI [evaluation interval]. I would hazard a guess that it actually delivered more Firefox frames, but if the visual quality is indistinguishable between the two, that's wasted energy.
I also tested 5.6.4, but the place I got that kernel binary from doesn't provide the matching -tools package, so I couldn't run the perf test. The current draw at the wall seemed the same on 5.6.4 as on 5.7.0-rc1 with the commit reverted.
With this patchset, the scrolling is smooth and the window animations appear smoother than with the first patchset, but there appears to be a further power regression as compared to the first patchset. :(