GPU clock occasionally stuck at low frequency when forcing high performance level
Brief summary of the problem:
Occasionally when playing a game there's a noticeable performance drop.
A quick in-game check with MangoHud shows the core clock stuck at around 500MHz.
I then checked some of the sysfs files.
/sys/class/drm/card1/device/power_dpm_force_performance_level still reported that it's set to high, which is expected, as it is being set by gamemode.
/sys/class/drm/card1/device/pp_dpm_sclk, though, showed the clock at 0: 500Mhz instead of 1: 2720Mhz.
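For anyone wanting to check for the same state, the two sysfs reads above can be combined into a rough check. This is only a sketch: active_sclk and clock_stuck are hypothetical helper names, the card1 path is the one from my system, and it relies on amdgpu marking the active level in pp_dpm_sclk with a trailing '*'. Treating "index 0 active while forced to high" as the stuck condition is my assumption based on what I observed.

```shell
#!/bin/sh
# Hypothetical helper: print the active level line from a pp_dpm_sclk-style file.
# Defaults to the card1 node from this report; pass another path if needed.
active_sclk() {
    file="${1:-/sys/class/drm/card1/device/pp_dpm_sclk}"
    # amdgpu marks the currently active level with a trailing '*'
    grep -F '*' "$file"
}

# Hypothetical helper: exit 0 when the forced level is "high" but the
# active sclk level is still index 0 (the assumed "stuck" condition).
clock_stuck() {
    dev="${1:-/sys/class/drm/card1/device}"
    level=$(cat "$dev/power_dpm_force_performance_level")
    active=$(active_sclk "$dev/pp_dpm_sclk")
    # "${active%%:*}" keeps only the level index before the first colon
    [ "$level" = "high" ] && [ "${active%%:*}" = "0" ]
}
```

After sourcing the file, clock_stuck && echo stuck reports whether the card is currently in the state described above.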
I've checked thermals, and they hover around 60 degrees, so it shouldn't be throttling.
Running echo low > /sys/class/drm/card1/device/power_dpm_force_performance_level and then setting it back to high restored the clock to where it should be.
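The workaround can be wrapped in a small function. This is a minimal sketch, not a fix: reset_perf_level is a hypothetical name, the default path is the card1 node from my system, and writing to the real sysfs node requires root.

```shell
#!/bin/sh
# Hypothetical helper: re-apply the forced performance level by
# toggling it low -> high, as done manually in this report.
reset_perf_level() {
    node="${1:-/sys/class/drm/card1/device/power_dpm_force_performance_level}"
    # Drop to "low" first; bail out if the write fails (e.g. not root)
    echo low > "$node" || return 1
    # Then force "high" again
    echo high > "$node"
}
```

Run as root, reset_perf_level with no argument targets card1; pass a different node path for another card.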
So far I've noticed the problem in Warframe and Deep Rock Galactic, with either DXVK or VKD3D.
Additional notes
I've recently had to debug a "green screen of death" problem, where repeated high load would completely crash the GPU, forcing the system to restart.
In order to work around this issue I had to force the system into a performance mode other than 'auto'.
I also added the kernel command line parameters amdgpu.ppfeaturemask=0xfff7ffff amdgpu.runpm=0.
I don't know if any of those could cause problems related to this, but turning them off would make my system unstable, so I haven't tested that so far.
Hardware description:
- CPU: AMD Ryzen 5 7600X
- GPU: Radeon RX 6950 XT
- System Memory: 32GB
- Display(s): 2
- Type of Display Connection: DP
System information:
- Distro name and Version: Arch Linux
- Kernel version: 6.5.3-zen
- Mesa version: 23.1.6
How to reproduce the issue:
This issue happens randomly, but I get it at least once a day.
- Start up a game
- Play the game until noticing a significant performance drop
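Since the drop happens at random, logging the clock state while playing may help pin down when it occurs. The following is a rough sketch under assumptions: log_sample is a hypothetical helper, the 5 second interval and the log location in the commented loop are arbitrary choices, and it assumes the active level in pp_dpm_sclk is marked with '*'.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped sample of the forced
# performance level and the active sclk level to a log file.
#   $1 = device directory (e.g. /sys/class/drm/card1/device)
#   $2 = log file to append to
log_sample() {
    dev="$1"
    log="$2"
    printf '%s level=%s sclk=%s\n' \
        "$(date +%s)" \
        "$(cat "$dev/power_dpm_force_performance_level")" \
        "$(grep -F '*' "$dev/pp_dpm_sclk")" >> "$log"
}

# Example polling loop (commented out so sourcing this file has no side effects):
# while sleep 5; do log_sample /sys/class/drm/card1/device "$HOME/sclk.log"; done
```

Each log line starts with a Unix timestamp, so the moment the clock drops to level 0 can be matched against what was happening in the game.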