Since upgrading my Debian sid from kernel 5.2 to at first 5.3 and now 5.4, the CPU (with integrated GPU) runs at tremendously increased temperatures, especially under the Cinnamon desktop environment (but also to a lesser extent on e.g. GNOME classic).
For the completely idle Cinnamon (i.e. no other major processes running and waiting 5-10 mins for cooling down) the increase is already some 10°C (for GNOME: +4°C) ... when playing back videos, even low res videos, it's a plus of 17°C (for GNOME: +4°C) and more.
Also, using VAAPI seems to perform generally worse than XV.
Similarly, disabling intel_pstate seems to result in better temperatures than using it in active/HWP mode.
Even when I just move the mouse pointer constantly in circles,... the CPU reaches 70°C.
Another issue, but that's possibly not a kernel issue:
Cinnamon, even just under 5.2, runs noticeably hotter when playing back videos than e.g. GNOME Shell does... (but as said that gets much worse with 5.4)... so I'd expect that Cinnamon does some kind of rendering/whatsoever which is not expected/intended by the graphics stack.
Any help on how to debug this would be greatly appreciated.
Thanks,
Chris.
So I did some more systematic testing between the following:
kernel 5.2.17
kernel 5.4.6
each with
no intel_pstate parameter (which results in active/HWP)
intel_pstate=disable [0]
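To confirm which of the two configurations a given boot is actually in, the driver's mode can be read from sysfs. A minimal sketch, assuming the usual `/sys/devices/system/cpu/intel_pstate/status` file; treating a missing file as "driver not loaded" is my assumption about what `intel_pstate=disable` looks like, not something stated in this thread:

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location of the intel_pstate mode ("active", "passive" or "off").
DEFAULT_STATUS = Path("/sys/devices/system/cpu/intel_pstate/status")


def intel_pstate_status(path: Path = DEFAULT_STATUS) -> Optional[str]:
    """Return the intel_pstate mode, or None if the status file is absent.

    Assumption: the file is absent when the driver was disabled on the
    kernel command line, so None is read as "intel_pstate not in use".
    """
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("intel_pstate status:", intel_pstate_status())
```

Noting this value next to each measurement makes it harder to mix up runs taken with and without the parameter.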
For each combination I've recorded:
average CPU usage for X.org and either cinnamon or gnome-shell
(running in classic mode) as shown by top
average temperature of the Package id 0 as shown by sensors
during the following scenarios:
idle system (i.e. the respective desktop environment running with some
minimal set of applets (like clock, window list, or so), cron
disabled, and no(!) further CPU-intensive processes (e.g. firefox)
running).
as well as:
playing a video in h264 (High) (avc1 / 0x31637661), yuv420p, 720x396
playing a video in h264 (High), yuv420p(tv, bt709, progressive),
1920x1080
with mpv, each in:
normal resolution (on a HiDPI display)
fullscreen
fullscreen using -vo=xv
In the two modes where -vo=xv wasn't given, mpv selected:
VO: [gpu] … vaapi[nv12]
For the testing, the notebook was placed on a metal surface (which
probably explains why the temperatures are a bit lower than what I've
reported in previous mails).
After each measurement (which often caused high CPU temperatures) I let
the CPU/system cool down for ~5 minutes until it reached the initially
measured "idle temperature" again.
Also for each measurement, I let the system in that state (e.g. being
idle or playing a video) for several minutes.
I've made the "average" values manually; for the temperatures those
should be quite accurate, for the CPU utilisation they should be
regarded more as a guide, since there were often spikes in one or the
other direction.
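The manual averaging could also be scripted. A rough sketch, assuming the `sensors` output contains the usual coretemp line of the form `Package id 0:  +45.0°C  (high = ...)`; the sample strings below are made-up illustrations, not measurements from this report:

```python
import re
from statistics import mean

# Matches the coretemp package line printed by lm-sensors, e.g.
# "Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)"
PACKAGE_RE = re.compile(r"^Package id 0:\s*\+?(-?\d+(?:\.\d+)?)\s*°C", re.MULTILINE)


def package_temp(sensors_output: str) -> float:
    """Extract the 'Package id 0' temperature from one `sensors` dump."""
    match = PACKAGE_RE.search(sensors_output)
    if match is None:
        raise ValueError("no 'Package id 0' line found")
    return float(match.group(1))


def average_temp(samples) -> float:
    """Average the package temperature over several `sensors` dumps."""
    return mean(package_temp(s) for s in samples)


if __name__ == "__main__":
    # Hypothetical sample dumps, for illustration only.
    samples = [
        "coretemp-isa-0000\nPackage id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)\n",
        "coretemp-isa-0000\nPackage id 0:  +47.0°C  (high = +100.0°C, crit = +100.0°C)\n",
    ]
    print(average_temp(samples))
```

In practice each sample would come from running `sensors` at a fixed interval during a scenario, e.g. via `subprocess.run(["sensors"], capture_output=True, text=True).stdout`.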
Of course, I always used the same 2 videos, and started them at the
beginning for each measurement.
Legend:
C = Cinnamon
G = Gnome Shell, classic mode
idl = idle (i.e. just desktop environment running, no interaction or
other intensive processes for several minutes)
loV = low-res video (h264 (High) (avc1 / 0x31637661), yuv420p,
720x396)
hiV = high-res video (h264 (High) yuv420p(tv, bt709, progressive),
1920x1080)
no "fs" = no fullscreen
fs = fullscreen
no "xv" = mpv used [gpu] … vaapi[nv12]
xv = mpv used xv
CPU temperature / [Cinnamon|Gnome Shell CPU%] / X CPU%
In most cases, not using intel_pstate results in the same or
noticeably lower temperature and CPU utilisation for both Cinnamon
and GNOME.
The few ones where this isn't the case,... well for the CPU
utilisation I've already said that the values may not be that rock
solid... for temperature no idea...
Using xv instead of vaapi gives similar or better temperatures for
both, Cinnamon and GNOME
Both, Cinnamon and GNOME run at considerably higher temperature when
using 5.4 rather than 5.2,... for Cinnamon this is much more
noticeable (it still is for GNOME)...
E.g. 10°C more in absolute idle... and 15°C more when playing back
the low res video in full screen.
Maybe that's a reason why there haven't been many reports about this,
when e.g. GNOME is not that much affected.
For the HiRes video, one doesn't see that much difference between
5.2 and 5.4, but probably because the CPU gets throttled down after
reaching 100°C the first time (which I could see especially under 5.4
in the kernel log).
Also with both 5.2/5.4 and with both Cinnamon/Gnome playing the high
res video on the HiDPI screen seems to be killing it (always >90°C).
Is this expected?
One can argue that Cinnamon might do a bit more in the background,
thus resulting in the generally somewhat higher temp/CPU-utilisation
even when idle... but since it's far more affected when playing back
videos than gnome (even under 5.2)... I'd guess there's also some
problem on the Cinnamon side.
But since it gets reproducibly worse from 5.2 to 5.4 (or 5.3),
there's also something changed in the kernel which strongly affects
it (and GNOME as well, just not that much)
...that is, at least on my hardware ;-)
Unfortunately all this is not limited to playing back videos...
on cinnamon, when I just
constantly move the mouse (like in circles) on an empty desktop, or
a window
or scroll up/down in e.g. Thunderbirds mail list
temperatures reach 69° C or more
having Firefox open, say around 10 windows, none of them playing any
video or animated GIF, none of them running any scripts (thanks to
noscript)... temperatures go to ~85°C on Cinnamon (not the case on
GNOME, at least not that extreme).
So my suspicion would be something is wrong at the graphics stack
and/or how it's used especially by Cinnamon.
Also (and I've tested the following only in Cinnamon), as previously
noticed, sometimes, but not always:
CPU temperature stays very high for several minutes, even though I've
already stopped e.g. video playback or mouse moving
maybe, but there's only little indication for this: putting the
system into suspend2ram and waking it up, seemed to have cured the
symptoms for "a while".... but this is very vague.
Any help or pointers on how to debug this further would be highly
appreciated... obviously running a notebook at ~80° or not being able
to upgrade the kernel is kind of a showstopper.
You haven't reported measuring rc6, but that may be the difference of around 2W (depending on GPU) for an idle load. If you use the rapl mentioned in the other bug, i.e. something like perf stat -a -x, -r 1 -e "power/energy-pkg/" -e "power/energy-gpu/" -e "i915/rc6-residency/" sleep 60, you should be able to determine if this is the cause. It is what changed in v5.3 that has identical symptoms, so probably the same.
Cinnamon/GNOME Shell are vastly different rendering models, it's swings and roundabouts as to which one performs better under various measurements -- neither is perfect. Might be worth a bug to see if we misbehave under one or the other.
xv vs vaapi, no idea -- file a separate bug, and we'll dig into it :)
As for bad performance overall, be prepared to chase upstream (e.g. mesa-20, drm-tip) and we'll investigate what is impacting your system. Please do file it as a separate bug though, it's likely a different issue to xv/vaapi.
So if I understand correctly, it should look like the other bug and my CPU never enters RC6 and thus pulls more Joules and runs higher..., right?
And I assume this would also explain my observation that in some cases it was "cured" for a while, when suspending/resuming the system?
What it does not explain is why Cinnamon is so much more affected by this, unless of course it's in the other issue, which generally seems to cause Cinnamon to run at much higher temperatures.
Yes. It does appear to be the case. The GPU never idled, and never entered its powersaving mode (RC6). That in turn prevents the whole GPU from entering runtime suspend, and prevents the CPU from entering deep package C-states. So one little bug in the GPU has quite dramatic impact on the overall powersaving landscape.
However, when sufficiently idle (such as being forced to idle on suspend), it would enter powersaving.
@ickle I've opened #955 (closed) and #956 ... should I also file one for the observation that the system runs hotter with intel_pstate than it does without (or where would such bug belong to?)?
Oh and as for #614 (closed)... is there any final fix on the horizon which still closes the security hole and which will be sent to the stable trees?
Is there any expectation on when these patches will end up in -stable? They don't seem to be included in the most recent -stable releases and I couldn't find them on the linux-stable mailing list either.
Could someone with the appropriate rights un-mark this as duplicate from #614 (closed) (otherwise I'd have to file a separate issue which just clutters up the issue tracker).
Apparently it's not a duplicate,.... just retried the whole thing with 5.5.13 (which should contain the alleged fix for #614 (closed)), and basically everything is still fully broken.
While the GPU seems to enter RC6 states now:
# perf stat -a -x, -r 1 -e "power/energy-pkg/" -e "power/energy-gpu/" -e "i915/rc6-residency/" sleep 60
161,99,Joules,power/energy-pkg/,60001047882,100,00,,
31,42,Joules,power/energy-gpu/,60001049257,100,00,,
55883685200,ns,i915/rc6-residency/,60001059714,100,00,,
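Because this locale prints decimals with a comma while `-x,` also uses a comma as the field separator, the CSV output above is awkward to read programmatically. A small sketch of one way to untangle it, assuming the event name is the only field containing a slash and that the unit always sits directly before it; the function name is mine, not part of perf:

```python
def parse_perf_csv_line(line: str) -> tuple:
    """Parse one `perf stat -x,` line whose numbers use a decimal comma.

    Returns (event, value, unit).
    """
    fields = line.strip().split(",")
    # The event name is the only field containing a slash,
    # e.g. "power/energy-pkg/".
    event_idx = next(i for i, f in enumerate(fields) if "/" in f)
    unit = fields[event_idx - 1]
    # Everything before the unit is the counter value; a decimal comma
    # leaves it split in two pieces ("161", "99"), so rejoin with a dot.
    value = float(".".join(fields[: event_idx - 1]))
    return fields[event_idx], value, unit


if __name__ == "__main__":
    output = [
        "161,99,Joules,power/energy-pkg/,60001047882,100,00,,",
        "31,42,Joules,power/energy-gpu/,60001049257,100,00,,",
        "55883685200,ns,i915/rc6-residency/,60001059714,100,00,,",
    ]
    for line in output:
        print(parse_perf_csv_line(line))
```

Running perf under `LC_ALL=C` would presumably avoid the ambiguity in the first place, but I have not verified that here.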
... temperatures are still generally higher (just take the values from my test series above and replace kernel 5.4 with 5.5)... and when playing videos they go just nuts (~100°C)... just moving around windows, makes the temps go to beyond 70°C.
So must be some other serious regression from >5.2.
Several people reported in #614 (closed) that the fix isn't really 100% and that even though RC6 is being reached most of the time, it's not as good as it used to be. So to completely rule out that these issues are duplicate, it'd be better to revert the security fix that broke it ("drm/i915/gen8+: Add RC6 CTX corruption WA") and test that.
I have no idea whether it applies or not, and I think I said that quite clearly. It's up to you to check and possibly adapt it.
I never said it's any kind of solution. I'm just saying that to completely rule out that your issue is the same as #614 (closed), you should test with a revert of the cause of #614 (closed) instead of its fix, which some people say is not perfect.
I have not completely read all your comments, I just skimmed through them, sorry.
5.2 being much hotter for non-GPU-related stuff can still be explained by #614 (closed), because #614 (closed) makes the CPU package hotter regardless of what you're doing (actually the less you're doing the more of a difference it makes, as it's mainly affecting idle power).
Also, you might be interested in this comment of mine: #614 (comment 372847)
It makes it possible to just rebuild a patched version of the i915 module, which is considerably faster than building an entire kernel, and spares you the trouble of configuring it and installing it.
Finally, let it be noted that I'm just an ordinary user, like you, frustrated by the bugs I encounter and doing my best to assist the developers to fix them. So I'm just dumping here what I know to help you diagnose the issue, but I really can't do more than that: give you a few hints that may or may not help you come closer to a solution.
# perf stat -a -x, -r 1 -e "power/energy-pkg/" -e "power/energy-gpu/" -e "i915/rc6-residency/" sleep 60
114,46,Joules,power/energy-pkg/,59999706962,100,00,,
5,03,Joules,power/energy-gpu/,59999716594,100,00,,
59067457547,ns,i915/rc6-residency/,59999723535,100,00,,
playing HD video fullscreen (100°C):
# perf stat -a -x, -r 1 -e "power/energy-pkg/" -e "power/energy-gpu/" -e "i915/rc6-residency/" sleep 60
1022,46,Joules,power/energy-pkg/,59999897262,100,00,,
679,88,Joules,power/energy-gpu/,59999895772,100,00,,
17550080,ns,i915/rc6-residency/,59999903626,100,00,,
Comparing these with the 5.2 measurements from above (#953 (comment 381198)) there seems to be still a much higher energy usage.
Idle temperature on 5.5 is ~15°C higher than on 5.2.
Also, while it does go a bit into RC6 while playing back fullscreen video, it's much less than it did with 5.2.
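To put the two 5.5 measurements above side by side, the raw counters can be turned into average package power and RC6 residency. A quick sketch, assuming the comma in the Joule values is a decimal separator and using the elapsed time reported on the rc6 line for both figures: it comes out at roughly 1.9 W and about 98% RC6 residency for the first run, against about 17 W and well under 0.1% residency during fullscreen HD playback.

```python
NS_PER_S = 1_000_000_000


def summarize(pkg_joules: float, rc6_ns: int, elapsed_ns: int) -> tuple:
    """Return (average package power in W, RC6 residency as a fraction)."""
    seconds = elapsed_ns / NS_PER_S
    return pkg_joules / seconds, rc6_ns / elapsed_ns


if __name__ == "__main__":
    # Values copied from the two perf runs on 5.5 quoted above.
    first_run = summarize(114.46, 59_067_457_547, 59_999_723_535)
    video_run = summarize(1022.46, 17_550_080, 59_999_903_626)
    for label, (watts, residency) in (("first run", first_run), ("HD video", video_run)):
        print(f"{label}: {watts:.2f} W, RC6 {residency:.2%}")
```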
When switching between workspaces I get temperatures between 70 and 80°C .. with video playback it goes generally to 100°C.
But I have no idea whether it's really anyhow related to my extreme CPU/GPU overheating problem... possibly not as it's been the first time I've seen it, while the CPU/GPU plays being a little sun for months.
Christoph Anton Mitterer changed title from tremendous CPU/GPU temperature increase since kernel 5.3 to tremendous CPU/GPU temperature increase since kernel 5.3 up to (at least) 5.5