The memory clock is always at 100% under all circumstances, even while idle, which causes the laptop to heat up and greatly reduces battery life.
Hardware description:
CPU: Quad Core AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx (-MT MCP-)
GPU: AMD Radeon Vega 10 Graphics
System Memory: DIMM
Display(s): 1920x1080@60
Type of Display Connection: Integrated
System information:
Distro name and Version: Archlinux
Kernel version: 5.10.10-arch1-1
AMD package version: xf86-video-amdgpu 19.1.0-2
How to reproduce the issue:
1. Power up the T495 laptop on Archlinux with kernel version 5.10.10-arch1-1.
2. Launch a graphical session.
3. Run the tool radeontop.
4. Notice that Memory Clock is always at 100%, making the laptop heat up more than usual and drain the battery a lot faster.
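For anyone who wants to check the memory clock without radeontop, here is a minimal Python sketch. It assumes the amdgpu sysfs file `pp_dpm_mclk` is available at `/sys/class/drm/card0/device/` (the card index is an assumption and may differ) and that it lists levels in the usual `N: <clock>Mhz` form, with `*` marking the active one. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed location; the card index (card0) may be different on your machine.
MCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_mclk")

# Matches lines such as "1: 933Mhz *" (the trailing "*" marks the active level).
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*?)\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (index, mhz, is_active) tuples from pp_dpm_mclk-style text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append((int(match.group(1)), int(match.group(2)), match.group(3) == "*"))
    return levels


def active_mclk_mhz(text):
    """Return the clock of the level marked active, or None if no level is marked."""
    for _, mhz, is_active in parse_dpm_levels(text):
        if is_active:
            return mhz
    return None


if __name__ == "__main__":
    if MCLK_PATH.exists():
        print(active_mclk_mhz(MCLK_PATH.read_text()), "MHz")
    else:
        print(f"{MCLK_PATH} not found; adjust the card index")
```

On an affected system the active level would be expected to stay at the highest entry even when the desktop is idle.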
Note that the memory clock on APUs is the system memory clock, since the memory is shared by the CPU and the GPU. Depending on the system's memory bandwidth requirements (e.g., CPU, graphics, and display), the memory clock may have to run high to meet them.
Hi Alex, the T495 runs hotter and has about half the battery life it had before the release of kernel 5.8 for me. What other factors could play into this besides the max memory clock symptom?
I'm on a Fedora Silverblue system, which makes it very hard to bisect the kernel. If nobody else steps in, I'll install Arch on a separate drive later this week.
The issue is not in the kernel but in linux-firmware, see conversation below. Downgrading brings the mem clk down to minimum with no performance regressions - the mem clk smoothly scales as expected.
So, I had already forgotten about this, but I'm pretty sure this was caused by a firmware update as I commented here, more specifically this commit.
Reverting it (e.g., replacing /usr/lib/firmware/amdgpu/picasso_vcn.bin with the old version and rebuilding the initramfs) seems to solve the issue.
Are you sure you rebuilt the initramfs after changing firmware? (sudo dracut -f on regular Fedora, I have no clue if that works on Silverblue.) For me, the last good firmware package is linux-firmware-20201022-114.fc33, as 20201118 includes the new Picasso firmware.
You are 100% right! It is an issue in linux-firmware. Just tested using linux-firmware-20200421-107.fc33. The mem clock clocks down nicely to the minimum (400 MHz) and reported battery consumption drops by 2 W at idle compared to 20201218-116.fc33.
EDIT: As you recommended, I'll test 20201022 tomorrow on the T495. This should help the devs narrow down the source of the issue.
Tested linux-firmware releases on a T495 (3500U picasso+raven) with Fedora 33. For a load scenario I've used basemark on the Vulkan High preset.
| linux-firmware release | mem clk scaling |
|---|---|
| 2020.04.21 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.09.18 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.10.22 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.11.18 | does not work - clk is 1.2 GHz at idle and load at all times |
| 2020.12.18 | does not work - clk is 1.2 GHz at idle and load at all times |
Note: For the sake of readability I've introduced dots to the version numbers.
The reported power consumption using powertop is 2 watts lower on idle using 20201022 and before. The notebook is running considerably hotter on 20201118 and newer. I've not noted a performance difference.
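The numbers above come from powertop. As a rough cross-check, battery draw can also be read from sysfs. A minimal sketch, assuming the battery shows up as `/sys/class/power_supply/BAT0` (an assumption; check what your machine exposes) and that it provides either `power_now` (microwatts) or `current_now` plus `voltage_now` (microamps and microvolts). This is a different measurement path than powertop, so absolute values may not match exactly.

```python
from pathlib import Path

# Assumed battery name; list /sys/class/power_supply to find yours.
BATTERY = Path("/sys/class/power_supply/BAT0")


def watts(power_now_uw=None, current_now_ua=None, voltage_now_uv=None):
    """Convert raw sysfs battery readings (micro-units) to watts."""
    if power_now_uw is not None:
        return power_now_uw / 1e6
    if current_now_ua is not None and voltage_now_uv is not None:
        return (current_now_ua / 1e6) * (voltage_now_uv / 1e6)
    raise ValueError("need power_now, or both current_now and voltage_now")


def read_int(path):
    """Read an integer from a sysfs file, or return None if the file is absent."""
    return int(path.read_text().strip()) if path.exists() else None


def battery_draw_watts(battery=BATTERY):
    """Current battery draw in watts, using whichever attributes exist."""
    return watts(
        read_int(battery / "power_now"),
        read_int(battery / "current_now"),
        read_int(battery / "voltage_now"),
    )
```

Reading `battery_draw_watts()` a few times at idle under each firmware version, with the laptop unplugged, should make a difference of about 2 W visible.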
So, this is probably a long, irrelevant side note, but:
I just did some more testing with my machine (X395 with 3300U, Vega 6). Strangely, I get slightly different results:
In 2020.09.18, the clock scales, but mostly only between 400 MHz and 933 MHz (and it sits at 933 MHz at idle most of the time). I've seen it spike higher once or twice, but mostly it stays at 933 MHz. (Subjectively, performance seems fine.)
Already in 2020.10.22, the clock is locked to 1200 MHz. (Performance under load is bad, probably due to thermal throttling.)
As mentioned before, reverting picasso_vcn.bin fixes the problem, and according to the changelog the update happened in 2020.11.18. But looking at the contents of the RPMs from Koji (linux-firmware-20200918-112.fc33 and linux-firmware-20201022-113.fc33), the update actually already happened between 2020.09.18 and 2020.10.22, which is weird, because it was only committed in November.
I guess it's probably just something weird about the Fedora packaging, which does not happen in Silverblue. So even though for me the issue appears in 2020.10.22 (Fedora), it actually comes from 2020.11.18 (upstream) as well.
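To check which packages actually ship the same picasso_vcn.bin, comparing checksums of the extracted files is probably the easiest way. A small Python sketch, assuming you have already unpacked the firmware file from each package into separate directories (the paths in the usage note are hypothetical):

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_firmware(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_firmware("fw-20200918/picasso_vcn.bin", "fw-20201022/picasso_vcn.bin")` returning False would confirm that the blob changed between those two Fedora packages.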
The previous tests on my T495 were performed with default BIOS settings and the notebook unplugged. I've now also tested it with the system plugged in. The behaviour is the same, except that the minimum clocks are higher. Tested on Fedora 33 with Gnome 3.38.3.
| linux-firmware release | mem clk scaling battery | mem clk scaling plugged-in |
|---|---|---|
| 2020.04.21 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) | works - clk is 570 MHz to 700 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.09.18 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) | works - clk is 570 MHz to 700 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.10.22 | works - clk is 400 MHz (min) at idle, clocks up under load to 1.2 GHz (max) | works - clk is 570 MHz to 700 MHz (min) at idle, clocks up under load to 1.2 GHz (max) |
| 2020.11.18 | does not work - clk is 1.2 GHz at idle and load at all times | does not work - clk is 1.2 GHz at idle and load at all times |
| 2020.12.18 | does not work - clk is 1.2 GHz at idle and load at all times | does not work - clk is 1.2 GHz at idle and load at all times |
Note: During testing on 11.18 and newer, I've seen about 1 in 5 boots where the min clk was 933 MHz, scaling up to 1.2 GHz and dropping back to 933 MHz at idle.
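Since the idle floor apparently varies from boot to boot, it may help to log a series of idle clock readings and summarize them rather than eyeballing radeontop. A minimal sketch; the function name and the 1200 MHz default are assumptions taken from the values reported in this thread, and how you collect the samples is up to you:

```python
from collections import Counter


def summarize_idle_samples(samples_mhz, max_mhz=1200):
    """Summarize memory clock samples (in MHz) taken while the system is idle."""
    if not samples_mhz:
        raise ValueError("no samples given")
    counts = Counter(samples_mhz)
    return {
        "min": min(samples_mhz),
        "max": max(samples_mhz),
        # Fraction of samples spent at the maximum clock.
        "share_at_max": counts[max_mhz] / len(samples_mhz),
        # True only if the clock never dropped below the maximum.
        "stuck_at_max": min(samples_mhz) >= max_mhz,
    }
```

A boot of the "933 MHz floor" kind would show up as `min` equal to 933 with `stuck_at_max` False, which distinguishes it from both the working case (min 400) and the fully locked case.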
T495s user here with the same specs and hardware. I confirm that the issue also exists on the T495s, and reverting linux-firmware to any build preceding December solves the issue for me.
@noom Yes, it's safe, because the firmware is code running on the device itself (the GPU in this case), which is usually independent of the operating system (the API provided by the firmware should be the same). I also tested on my T495s and had no problems at all with downgrading (that is, until a fix is provided).
So I have tried to install each of the following versions of linux-firmware, then reboot and check radeontop and the latter reported a 100% memory clock for all of them:
@noom Are you sure you are rebuilding the initramfs after downgrading the linux-firmware package? Arch doesn't do that automatically, you might need to run "mkinitcpio -P" with root privileges.
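Since forgetting the initramfs rebuild has come up more than once in this thread, here is a small Python sketch that picks the rebuild command mentioned above based on `/etc/os-release`. It only knows the two commands named in this thread (`dracut -f` for regular Fedora, `mkinitcpio -P` for Arch) and deliberately returns nothing for Silverblue, since nobody here has confirmed what works there. The helper names are made up, and it prints the command rather than running it.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


# Only the commands mentioned in this thread; other distros are not covered.
REBUILD_COMMANDS = {
    "fedora": ["dracut", "-f"],
    "arch": ["mkinitcpio", "-P"],
}


def initramfs_rebuild_command(os_release_text):
    """Return the rebuild command as a list, or None if unknown."""
    fields = parse_os_release(os_release_text)
    if fields.get("VARIANT_ID") == "silverblue":
        return None  # unconfirmed in this thread
    return REBUILD_COMMANDS.get(fields.get("ID"))


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as handle:
            command = initramfs_rebuild_command(handle.read())
    except FileNotFoundError:
        command = None
    print("run as root:", " ".join(command) if command else "unknown for this system")
```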
You're right, I completely forgot about this. I tried again and the issue seems to be gone with version 20201023. I'm going to keep using this version until an actual fix is implemented.
It took us a while to get access to the appropriate ThinkPad. We were only able to reproduce this on a ThinkPad. Other systems seem to work correctly. Investigating further.
Hi all. No, it's not only ThinkPads: a Lenovo L340-15API with a Ryzen 3 3200U has the same bug.
5.15.6-2 with linux-firmware-20201023.dae4b4c-1 bug
5.15.6-2 with linux-firmware-20211027.1d00989-1 bug
5.15.6-2 with linux-firmware-20200817.7a30af1-1 bug
5.15.5-2 with linux-firmware-20211027.1d00989-1 bug
5.15.5-2 with linux-firmware-20201023.dae4b4c-1 bug
5.15.2 with linux-firmware-20201023.dae4b4c-1 no bug
5.15.2 with linux-firmware-20211027.1d00989-1 bug
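To make the kernel/firmware matrix above easier to read, here is a short Python sketch that parses the unique entries of the list and prints the combinations reported as bug-free. The data is copied from the list; the parsing helpers are just for illustration.

```python
import re

# Unique entries copied from the report above.
REPORT = """\
5.15.6-2 with linux-firmware-20201023.dae4b4c-1 bug
5.15.6-2 with linux-firmware-20211027.1d00989-1 bug
5.15.6-2 with linux-firmware-20200817.7a30af1-1 bug
5.15.5-2 with linux-firmware-20211027.1d00989-1 bug
5.15.5-2 with linux-firmware-20201023.dae4b4c-1 bug
5.15.2 with linux-firmware-20201023.dae4b4c-1 no bug
5.15.2 with linux-firmware-20211027.1d00989-1 bug
"""

LINE_RE = re.compile(r"^(\S+) with linux-firmware-(\S+) (no bug|bug)$")


def parse_report(text):
    """Map (kernel, firmware) to True if the bug was seen, False otherwise."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kernel, firmware, outcome = match.groups()
            results[(kernel, firmware)] = outcome == "bug"
    return results


def bug_free(results):
    """Return the sorted (kernel, firmware) pairs where the bug was not seen."""
    return sorted(pair for pair, has_bug in results.items() if not has_bug)


if __name__ == "__main__":
    for kernel, firmware in bug_free(parse_report(REPORT)):
        print(f"kernel {kernel} + linux-firmware {firmware}: no bug")
```

On this machine only 5.15.2 with the 20201023 firmware is clean, so here the firmware downgrade alone does not seem to be enough on newer kernels.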
I confirm, it's the same for me with a T495, 3500U, Arch, kernel 5.11.6.
Memory Clock is always at 100% no matter what, producing a lot of heat for no reason.
With linux-firmware 2020.10.23 it is back to normal.
By the way, many thanks @noom for the temporary fix. I was going crazy and didn't think of that.