Performance regression with Linux 5.7-rc1 on Iris Plus 655 and 4K screen (Bisected)

changed the description

added UX label

added priority::high severity::major labels

Yes, not running at maximum clocks all the time is the intent of that patch. And not forcing the clocks down was for the reason outlined above.

But. If you were GPU bound, you would be getting max clocks again. That strongly suggests that your system doesn't need the max clocks to sustain 60fps, just someone is submitting their frames too late [the compositing is not being completed before vblank]

After reading your response, I just tried disabling KWIN_USE_INTEL_SWAP_EVENT in KWin, wondering if that might have some effect on it. It does improve the performance of the window manager animations somewhat, though not to the level they were at before the patch. It doesn't affect Firefox though; that is still very jerky. I also tried without the compositor at all, but the results there were meaningless because the lack of vsync makes everything jerky. Firefox isn't the only application affected either. Chromium also takes a hit, though not as badly as Firefox.

Could you take a moment to go back to basics and, say, run a bare Xorg and glxgears [-fullscreen]? That definitely has no fancy delayed commit, so it should render early and not overrun the presentation time.

glxgears running fullscreen in an almost-bare Xorg (only xterm was running otherwise) is completely smooth, but I'm not sure that really means anything, because glxgears running fullscreen inside the compositor is also completely smooth.

Something else you can do is

echo 800 > /sys/class/drm/card0/gt_min_freq_mhz

and raise/lower it until you have a jitter-free desktop. That tells us what the absent RPS should be trying to achieve.
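For anyone repeating this, the raise/lower search could be scripted roughly as below. This is only a sketch: the function name sweep_min_freq, the GT_MIN variable and the start/stop/step values are made up for illustration, and writing to the sysfs node needs root.

```shell
#!/bin/sh
# Sketch: step the i915 minimum GPU frequency upward, pausing at each step
# so the desktop can be checked for jitter by eye.
# GT_MIN defaults to the sysfs node used above; override it for another card.
GT_MIN=${GT_MIN:-/sys/class/drm/card0/gt_min_freq_mhz}

# sweep_min_freq START STOP STEP [PAUSE_SECONDS]
# (hypothetical helper; all four values are chosen by the caller)
sweep_min_freq() {
    start=$1; stop=$2; step=$3; pause=${4:-0}
    mhz=$start
    while [ "$mhz" -le "$stop" ]; do
        # Same write as the manual echo, one step at a time.
        echo "$mhz" > "$GT_MIN" || return 1
        echo "gt_min_freq_mhz=$mhz"
        sleep "$pause"
        mhz=$((mhz + step))
    done
}
```

Something like "sweep_min_freq 400 1000 100 20" as root would hold each step for 20 seconds. Note the value in gt_min_freq_mhz before starting and write it back afterwards, since the sketch does not restore it.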

I just tried that. If I set it to 500, the jerkiness clears up when scrolling continuously, but I have to set it all the way to 1000 before it stops jerking in the first half-second or so after having been idle.

I also looked at intel_gpu_top while doing this and it seems to show the GPU speed hovering right around 800 while scrolling continuously. That doesn't seem to jibe with having to set the minimum frequency to 1000 to avoid jerkiness on start, so I'm guessing intel_gpu_top is reporting an average and the GPU is actually clocking up and down many times a second?

The version of intel_gpu_top I'm looking at is using the perf interface, so it will be reporting the average value since the previous update. (Internally we are sampling the frequency every 5ms and accumulating a cycle counter that is then sampled by the user via perf. And RPS [gpu reclocking] evaluation intervals are set at 10-16ms.)

The problem is when the workload is less than an evaluation interval, we restart the evaluations and so never see an up or a down. For missing down, this means we stayed at max clocks and just wasted many a watt. For missing ups, we get jitter.

I think we have to bite the bullet and run with much shorter evaluation intervals so that the reclocking is responsive to short workloads. Unless there is some way for us to preserve history across the short workloads and assume that doing so makes sense.

OK, I'm happy to test any patches. :)

https://patchwork.freedesktop.org/series/75927/

rps is working (it's selecting mid ranges) but due to the nature of averaging, those tests are steady state workloads.

I'm not noticing any janks, which means nothing.

I've built 5.7-rc1 with your patches and the initial indication is that everything is smooth; perhaps even smoother than before 98479ada421a8fd2123b98efd398a6f1379307ab! Thanks!

I will keep testing and let you know if I discover anything else relevant.

Something to keep an eye on would be power.

rapl.sh:

#!/bin/sh
# Run the given command under perf stat, sampling the RAPL and i915 counters.
perf stat -a -x, -r 1 \
        -e "power/energy-pkg/" \
        -e "power/energy-cores/" \
        -e "power/energy-gpu/" \
        -e "i915/actual-frequency/" \
        -e "i915/rc6-residency/" \
        "$@"

If you sample "./rapl.sh sleep 30" a few times as you do your various tasks, that would be interesting. If you want to do before regression, at regression, now and with patch, that would be fantastic :)
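Taking several samples per task can be wrapped in a small loop. A rough sketch follows; the helper name sample_rapl, the RAPL variable and the output file naming are assumptions for illustration, not part of rapl.sh. perf stat prints its counters on stderr, so that is the stream being saved.

```shell
#!/bin/sh
# Sketch: take COUNT samples of SECS seconds each while doing one task,
# saving each sample to its own file named after the running kernel.
# RAPL points at the rapl.sh wrapper above (the path is an assumption).
RAPL=${RAPL:-./rapl.sh}

# sample_rapl LABEL [COUNT] [SECS]   (hypothetical helper)
sample_rapl() {
    label=$1; count=${2:-3}; secs=${3:-30}
    i=1
    while [ "$i" -le "$count" ]; do
        # perf stat writes its results to stderr, hence the 2> redirect.
        "$RAPL" sleep "$secs" 2> "rapl-$(uname -r)-$label-$i.txt" || return 1
        i=$((i + 1))
    done
}
```

For example, "sample_rapl scrolling" while scrolling and "sample_rapl idle" while leaving the desktop alone would give three 30 second samples of each.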

Sadly whatever is easiest to measure is easiest to optimise for. Measuring latency is hard, detecting jank is harder. But it's the first thing people notice :(

rapl-5.7.0-rc1.txt

rapl-5.4.0-24.txt

rapl-5.7.0-rc1-withrevert.txt

rapl-5.7.0-rc1-withpatch.txt

I did it with unmodified 5.7.0-rc1, 5.7.0-rc1 with the bisected commit reverted, 5.7.0-rc1 with your patches, and 5.4.0-24 (the current stock kernel in Ubuntu 20.04) for good measure.

It seems that the patches caused a small improvement while idling but quite a large regression while scrolling. Both the revert and the patches were smooth but the unmodified 5.7.0-rc1 and the Ubuntu stock kernel were jerky.

Focusing on the bad scenario (scrolling),

rc1:

265.44,Joules,power/energy-pkg/,30000954498,100.00,,
48.38,Joules,power/energy-cores/,30000956801,100.00,,
94.78,Joules,power/energy-gpu/,30000958868,100.00,,
10999,M,i915/actual-frequency/,30000961517,100.00,,
3964578560,ns,i915/rc6-residency/,30000962176,100.00,,

revert:

419.81,Joules,power/energy-pkg/,29999952173,100.00,,
64.45,Joules,power/energy-cores/,29999958095,100.00,,
230.65,Joules,power/energy-gpu/,29999961058,100.00,,
16700,M,i915/actual-frequency/,29999965983,100.00,,
9842174007,ns,i915/rc6-residency/,29999968522,100.00,,

patch:

516.73,Joules,power/energy-pkg/,29999619061,100.00,,
69.89,Joules,power/energy-cores/,29999619425,100.00,,
322.48,Joules,power/energy-gpu/,29999619323,100.00,,
17795,M,i915/actual-frequency/,29999619662,100.00,,
10051509431,ns,i915/rc6-residency/,29999619565,100.00,,

We definitely see that we are underclocking and end up at 90% busy in rc1. In the smoother kernels, the frequency is about doubled and busyness is down to 66%, and that is even while doing more work (delivering more frames). But the patch is averaging a frequency bin higher, and that burns through an extra 6W for equivalent work (3W directly attributed to the GPU, and 3W through the system agent). [It may be that I'm misremembering pkg, and that it includes the GPU already, so just the 3W, if I'm lucky.]
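The arithmetic behind those figures can be reproduced from the CSV samples above. A rough sketch, assuming the usual perf stat -x, column order (value, unit, event, counter run time in ns, ...); the function name summarise_rapl and the output labels are made up here. Watts are joules divided by elapsed seconds, and busyness is taken as the fraction of time not spent in rc6.

```shell
#!/bin/sh
# Sketch: summarise one "perf stat -x," sample as average power, average
# frequency and GPU busyness. Reads the named files, or stdin if none given.
# Assumed CSV fields: $1 = value, $3 = event name, $4 = run time in ns.
summarise_rapl() {
    LC_ALL=C awk -F, '
        $3 == "power/energy-pkg/"      { pkg_w = $1 / ($4 / 1e9) }
        $3 == "power/energy-gpu/"      { gpu_w = $1 / ($4 / 1e9) }
        $3 == "i915/actual-frequency/" { mhz   = $1 / ($4 / 1e9) }
        $3 == "i915/rc6-residency/"    { busy  = 100 * (1 - $1 / $4) }
        END {
            printf "pkg_W=%.2f gpu_W=%.2f avg_MHz=%.0f busy_pct=%.0f\n",
                   pkg_w, gpu_w, mhz, busy
        }' "$@"
}
```

Run as "summarise_rapl rapl-5.7.0-rc1.txt" on a file holding a single sample. The frequency here is averaged over wall-clock time, so it will read lower than the clocks while busy if the counter only accumulates while the GPU is awake (not checked here).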

That's disappointing, I was hoping it would actually end up downclocking in comparison to the longer incomplete EI. I would hazard a guess that it actually delivered more ff frames, but if the visual quality is indistinguishable between the two, that's actually wasted energy.

I also tested on 5.6.4, but the place I was getting the binary for that kernel doesn't have -tools available, so I couldn't run the perf test. The current draw at the wall seemed the same between 5.6.4 and 5.7.0-rc1 with the commit reverted.

Yeah, based on those numbers, the revert is my "favorite" of the options available right now and it is what I am going to run for now.

Also, for what it is worth, the idle power draw at the wall is indistinguishable between all the options.

New day, new series. Fixes a bug in v5.6 that caused delayed RPS interrupts, tweaked the EI and restored aggressive downclocking.

https://patchwork.freedesktop.org/series/75960/

With this patchset, the scrolling is smooth and the window animations appear smoother than with the first patchset, but there appears to be a further power regression as compared to the first patchset. :(

rapl-5.7.0-rc1-withpatch2.txt

I expect this to be of little impact for your workload, but you never know...

1ebf7aaf3ac0 ("drm/i915/gt: Prefer soft-rc6 over RPS DOWN_TIMEOUT")
