Cannot set pstate values with under-voltage values for Vega M
Submitted by Robert Strube
Assigned to Default DRI bug account
Some folks on the Windows side of the fence have had great success under-volting their Vega M chips to improve thermals (and, because TDP is shared between the CPU and GPU on Kaby Lake G, their overall performance too).
Following the amdgpu kernel documentation, I applied these kernel boot parameters:
Note: I needed runpm=0 so that the Vega M stays active at all times; otherwise it powers down and I'm unable to make any changes to the pstates using the sysfs API.
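For anyone reproducing this, here is a minimal sketch of how one might confirm that runtime power management is really off before touching the pstates. It assumes the module parameter is exposed read-only at /sys/module/amdgpu/parameters/runpm; the function name runpm_is_disabled is made up for illustration, and the path can be overridden as the first argument.

```shell
# Hypothetical helper: return 0 if the parameter file reports runpm=0
# (runtime PM disabled), non-zero otherwise.
# Assumption: the amdgpu module exposes the value at the default path below.
runpm_is_disabled() {
    param_file=${1:-/sys/module/amdgpu/parameters/runpm}
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "0" ]
}
```

Usage would be something like: runpm_is_disabled || echo "runpm is not 0, the GPU may power down"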
Then doing the following I was able to see the current pstates:
0: 225MHz 750mV
1: 400MHz 750mV
2: 535MHz 750mV
3: 715MHz 750mV
4: 850MHz 750mV
5: 960MHz 750mV
6: 985MHz 750mV
7: 1011MHz 750mV
0: 300MHz 750mV
1: 500MHz 750mV
2: 700MHz 750mV
SCLK: 225MHz 1011MHz
MCLK: 300MHz 700MHz
VDDC: 750mV 750mV
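The VDDC line in the range section is the one that matters for under-volting. As a rough sketch, the allowed voltage window can be pulled out of that output with awk. The function name od_vddc_range is hypothetical, and it only assumes the "VDDC: <min>mV <max>mV" line format shown above.

```shell
# Hypothetical helper: read pp_od_clk_voltage-style text on stdin and print
# the minimum and maximum VDDC in mV, separated by a space.
od_vddc_range() {
    # gsub strips the "mV" suffix from the line; awk then re-splits the fields.
    awk '$1 == "VDDC:" { gsub(/mV/, ""); print $2, $3 }'
}
```

It would be fed from the sysfs file, for example: od_vddc_range < /sys/bus/pci/drivers/amdgpu/0000:01:00.0/pp_od_clk_voltage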
Then I switched the performance level to manual so that the pstates can be updated:
echo "manual" > /sys/bus/pci/drivers/amdgpu/0000:01:00.0/power_dpm_force_performance_level
Then I could push in new pstate values, provided the voltage was exactly 750mV. I could set a higher clock speed for a pstate, but I was unable to make any changes to the voltage.
echo "s 0 226 750" > /sys/bus/pci/drivers/amdgpu/0000:01:00.0/pp_od_clk_voltage
works and increases the MHz from 225 -> 226 for pstate 0, but
echo "s 0 226 749" > /sys/bus/pci/drivers/amdgpu/0000:01:00.0/pp_od_clk_voltage
-bash: echo: write error: Invalid argument
Checking dmesg, I see that the allowed voltage range is constrained to between 750 and 750 mV, i.e. exactly 750 mV:
[ 4362.487021] amdgpu: [powerplay] OD voltage is out of range [750 - 750] mV
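To illustrate the failure, here is a small sketch of a wrapper that checks a requested voltage against the reported VDDC range before writing, so the rejection shows up with a readable message instead of a bare "Invalid argument". The name set_sclk_pstate and its argument order are invented for this example; it only assumes the "VDDC: <min>mV <max>mV" line and the "s <index> <MHz> <mV>" write format used above.

```shell
# Hypothetical wrapper: set_sclk_pstate <od_file> <index> <MHz> <mV>
# Refuses to write when <mV> is outside the VDDC range reported by <od_file>.
set_sclk_pstate() {
    od_file=$1; idx=$2; mhz=$3; mv=$4
    # Extract "min max" in mV from the VDDC range line.
    range=$(awk '$1 == "VDDC:" { gsub(/mV/, ""); print $2, $3 }' "$od_file")
    if [ -z "$range" ]; then
        echo "no VDDC range found in $od_file" >&2
        return 1
    fi
    set -- $range   # $1 = min mV, $2 = max mV
    if [ "$mv" -lt "$1" ] || [ "$mv" -gt "$2" ]; then
        echo "requested ${mv}mV is outside the reported range [$1 - $2] mV" >&2
        return 1
    fi
    echo "s $idx $mhz $mv" > "$od_file"
}
```

With the range reported on this machine, set_sclk_pstate /sys/bus/pci/drivers/amdgpu/0000:01:00.0/pp_od_clk_voltage 0 226 749 would stop at the range check, which is the same limit the driver enforces. The amdgpu documentation also describes writing "c" to the same file to commit changes and "r" to reset them; this sketch leaves that step out.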
Would it be possible to add support for under-volting Vega M using the sysfs API?
I also noticed that it's not possible to set any pstate higher than 1011 MHz. Perhaps that's a harder sell, but it would also be nice to be able to overclock the Vega M a little!
P.S. I tried finding these values in the kernel source code, but I was extremely confused about the way they are defined. I figured that, worst-case scenario, I could manually modify these limits and recompile the kernel.