Hello,
With Polaris and Vega, setting amdgpu.ppfeaturemask=0xffffffff unlocked pp_od_clk_voltage and didn't cause any issues for me.
But with Navi, it doesn't work. I'm still not allowed to open
/sys/class/drm/card0/device/pp_od_clk_voltage
as root, even with that flag specified.
Also, I can't increase the GPU's power consumption, as
/sys/class/drm/card0/device/hwmon/hwmon0/power1_cap_max
only allows the default 100% Powertune limit, meaning I can't set any higher value in
/sys/class/drm/card0/device/hwmon/hwmon0/power1_cap
Apart from not letting me change the aforementioned parameters, setting amdgpu.ppfeaturemask=0xffffffff causes stuttering, even on the desktop, and also affects the mouse cursor.
This is with kernel drm-next-5.5-wip 73cdff347343504287feae8b36fa7317f04dcc61
and an MSI 5700 XT Gaming X.
Still happens with current 5.5-wip/drm-next kernels.
I don't know whether this is supposed to be implemented yet, but there seems to be another bug apart from that:
Just reading the sysfs entries in "/sys/class/drm/card0/device/" makes the reading program freeze, e.g. a file browser (also when started as root).
As a workaround, use upp instead (it writes to the powerplay binary directly). See: https://github.com/sibradzic/upp
I suggest using 5.4-rcX, as AMD's wip kernels (amd-staging-drm-next and drm-next) may still have a bug with pptable writing. Or you can try reverting commit 3abf8d896f8ac72341677a6cd82662b80943f9c8 ("drm/amd/powerplay: do proper cleanups on hw_fini").
Be aware that this method can cause issues with fan control, so you might also need to manually set the fans after that. You can use fanctl to handle this: https://gitlab.com/mcoffin/fanctl
I have the same (or at least a similar) bug. /sys/class/drm/card1/device/hwmon/hwmon3/power1_cap_max in my case gives the default 220W (value: 220000000).
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
returns nothing.
I don't get any stuttering though, with kernel 5.3.6 or with 5.4rc2.
Dolphin freezes when looking at /sys/class/drm/card1/device/ as well.
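The power1_cap and power1_cap_max values quoted above are in microwatts (220000000 is 220W). A minimal sketch for converting them and for checking whether a wanted cap fits under the maximum; the function names uw_to_w and cap_fits are made up for illustration:

```shell
# Convert a hwmon power value in microwatts to whole watts.
uw_to_w() {
    echo $(( $1 / 1000000 ))
}

# Succeed if a wanted cap (in watts, $1) does not exceed
# the value read from power1_cap_max (in microwatts, $2).
cap_fits() {
    [ $(( $1 * 1000000 )) -le "$2" ]
}

# Possible use on a real system (path taken from the comment above):
#   max=$(cat /sys/class/drm/card1/device/hwmon/hwmon3/power1_cap_max)
#   echo "max cap: $(uw_to_w "$max") W"
```

With the 220000000 reported above, cap_fits 220 succeeds and anything higher fails, which matches not being able to raise the limit.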
Thanks for the hint @ Andrew Sheldon, SPPT being possible on Linux totally passed me by. I will test it with my cheap Polaris card first, which made me stick with a custom fan curve anyway.
Regarding the stutter with amdgpu.ppfeaturemask=0xffffffff: I'm not sure anymore whether it really was related, as hardware cursor support still seems to be a complete mess for Navi on 5.3/5.4, and is still incomplete on 5.5.
I can also confirm the issue exists. Setting amdgpu.ppfeaturemask=0xffffffff doesn't allow me to access the "States Table" section in radeon-profile, as if the parameter was ignored.
As for the stutter issue, I don't know what exactly it is as I don't notice any difference with or without the parameter. On 5.3 kernel, the mouse feels sluggish as if my monitor is running at 30Hz, but it's fine on 5.4 (rc) kernel. This is observed on official Manjaro kernels.
Tested custom soft power play table via UPP on Polaris and it generally seems to work well (might be able to test Navi at a later time).
However, there is the issue that the voltage gets reset when there is a modeline switch. So I've written a script which checks the voltage and restarts UPP when it exceeds values which would not occur with my undervolting:
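#!/bin/bash
# Poll the GPU voltage once per second and re-apply the undervolt if it was reset.
while true; do
    sleep 1
    # in0_input reports the current GPU voltage in mV
    read -r num < /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon0/in0_input
    # Values above 1030 mV cannot occur with my undervolt, so the table must have been reset
    if [[ "$num" -gt 1030 ]]; then
        systemctl restart amdgpu-oc && systemctl restart amdgpu-fancontrol
    fi
done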
I'm already using ppfeaturemask=0xfffd7fff, it doesn't unlock anything - or at least CoreCtrl doesn't show anything.
In the journald log I see a lot of these lines, always grouped together:
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0xe, response 0xfffffffb, param 0x80
08.11.19 20:20 kernel amdgpu: [powerplay] Failed to send message 0x20, response 0xfffffffb param 0x2
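To rule out a mask that simply lacks the overdrive bit, the value can be tested bitwise. This is only a sketch: the bit value 0x4000 for PP_OVERDRIVE_MASK is quoted from memory of the kernel's amd_shared.h and should be checked against the kernel in use, and the function name is made up.

```shell
# Assumption: PP_OVERDRIVE_MASK is 0x4000 (bit 14), as in amd_shared.h.
PP_OVERDRIVE_MASK=0x4000

# Succeed if the given ppfeaturemask value has the overdrive bit set.
has_overdrive_bit() {
    [ $(( $1 & $PP_OVERDRIVE_MASK )) -ne 0 ]
}

# Possible use:
#   has_overdrive_bit 0xfffd7fff && echo "overdrive bit is set"
```

Under that assumption, both 0xffffffff and 0xfffd7fff have the bit set, so a missing bit would not explain the locked pp_od_clk_voltage.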
Can anyone share their experience with using custom power play tables via upp for Navi?
Now with that fix for Polaris by Alex, it seems to be absolutely flawless for me.
It would be good to know if the same applies to Navi.
It was just a bit inconvenient that for my Polaris card the Vdds were defined as garbage values when parsing the default pp_table. Though specifying custom values in mV worked without issues.
If you are seeing values like 0xff01, those are not garbage. They are virtual voltage ids, which the driver uses to look up the proper voltage via a different method.
It might be that, just not in hex. E.g. VddcLookupTable entry 1 returns a Vdd of 65282.
Correct. 65282 is 0xff02 which is a virtual voltage id. The driver uses that id to look up the real voltage based on the leakage for the board. Take a look at smu7_get_evv_voltages() or smu7_get_elb_voltages() in smu7_hwmgr.c.
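For anyone comparing a decimal pp_table dump against these ids, a small sketch of the conversion. Treating everything from 0xff01 up to 0xffff as a virtual id is an assumption made here for illustration, not something taken from the driver; the function names are made up as well.

```shell
# Print a decimal value as hex, e.g. a Vdd from a pp_table dump.
to_hex() {
    printf '0x%x\n' "$1"
}

# Assumption: values in 0xff01..0xffff are virtual voltage ids
# rather than real voltages in mV.
is_virtual_voltage_id() {
    [ "$1" -ge $(( 0xff01 )) ] && [ "$1" -le $(( 0xffff )) ]
}

# Possible use:
#   to_hex 65282
#   is_virtual_voltage_id 65282 && echo "virtual id, not a voltage"
```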
I've managed to undervolt my MSI 5700 XT Gaming X using upp (i.e. editing pp_table) and it is working as I had hoped (less power usage, a lot cooler and quieter but generally only a few fps slower). However, the voltages used in pp_table are 4 times what the actual values are supposed to be (at least what seems to be reported under Windows according to online screenshots). From pp_table dump:
I've set MaxVoltageGfx to 4400 (supposedly 1100 mV, i.e. -100 mV from the stock 1200 mV). Is this expected, or will future patches treat voltages as in Windows, meaning that I would suddenly have set the max voltage 3.2 V higher than stock? Tested under amd-staging-drm-next (drivers via oibaf) as of today (2019-12-27) as well as kernel 5.5-rc2.
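The arithmetic behind those numbers, as a sketch. The factor of 4 is only what was observed in this dump, not a documented unit, and the function names are made up:

```shell
# Assumption: Navi pp_table voltage fields are stored as 4 units per mV,
# based only on the observation above.
PPTABLE_UNITS_PER_MV=4

# pp_table value -> millivolts
pptable_to_mv() {
    echo $(( $1 / $PPTABLE_UNITS_PER_MV ))
}

# millivolts -> pp_table value
mv_to_pptable() {
    echo $(( $1 * $PPTABLE_UNITS_PER_MV ))
}

# Possible use:
#   pptable_to_mv 4400
#   mv_to_pptable 1200
```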
Thanks, that's really useful information. I can confirm this, and it works fine when setting manual values that way. I also didn't notice any other issues when setting custom power play table for Navi.
I got overdrive working on Navi. The issue is that cat /sys/class/drm/card0/device/pp_od_clk_voltage returns
0: 800MHz @ 0mV
1: 1412MHz @ 0mV
2: 2024MHz @ 0mV
To avoid crashing, it's required to provide voltages for all three points via echo "vc 0.... The voltage for idle probably should not differ from default.
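To see at a glance which points still report 0mV, the lines can be parsed into plain numbers. A rough sketch that only assumes the "index: clock MHz @ voltage mV" layout shown above; the function name is made up:

```shell
# Read curve points such as "1: 1412MHz @ 0mV" on stdin and
# print "index clock_mhz voltage_mv" for each matching line.
parse_od_points() {
    sed -n 's/^\([0-9][0-9]*\): *\([0-9][0-9]*\)MHz *@ *\([0-9][0-9]*\)mV$/\1 \2 \3/p'
}

# Possible use on a real system:
#   parse_od_points < /sys/class/drm/card0/device/pp_od_clk_voltage
```

Lines that do not match the layout, such as section headers, are skipped.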
Though there is one issue remaining: In pp_od_clk_voltage, there is only
OD_MCLK:
0: 875MHz
for VRAM. But echo "m 0.. returns invalid, it has to be echo "m 1.. instead.
This should work fine if you have a 60Hz display, but with 1440p 75Hz, the Navi VRAM clock is locked to the highest state to avoid flickering (which is really unfortunate). Applying the above VRAM OC also unlocks the VRAM pstate one below maximum, which causes flickering at 75Hz. You then additionally need to lock VRAM to the highest state.
The same happens when using custom SPPT.
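One way to lock VRAM to the highest state, as a sketch only. It assumes pp_dpm_mclk lists one "index: clock" line per state with the highest state last, and that power_dpm_force_performance_level has to be set to manual first; the function names are made up and the device directory is passed in as an argument.

```shell
# Print the index of the last state listed in <device dir>/pp_dpm_mclk.
highest_mclk_state() {
    sed -n 's/^\([0-9][0-9]*\):.*/\1/p' "$1/pp_dpm_mclk" | tail -n 1
}

# Switch to manual performance level and select only the highest VRAM state.
lock_highest_mclk() {
    dev=$1
    idx=$(highest_mclk_state "$dev")
    [ -n "$idx" ] || return 1
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$idx" > "$dev/pp_dpm_mclk"
}

# Possible use, as root:
#   lock_highest_mclk /sys/class/drm/card0/device
```

This has to be re-applied after anything that resets the power state, same as the OD settings.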
Btw: Did anybody manage to increase maximum allowed power consumption via SPPT? It works just fine via OD, but I'm not having success via SPPT.