Probably due to 6e7545ddb13416fd200e0b91c0acfd0404e2e27b (which in essence reverts the workaround from #1709 (closed) by re-enabling MPC_SPLIT_AVOID_MULT_DISP).
In addition to the memory clock maxing out, I have noticed 5.16.10 also brings back the bug that prevents loading a graphical environment, which seems to stem from #1709 (closed). 5.16.9 works fine.
amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
[drm:process_one_work] *ERROR* ib ring test failed (-110).
amdgpu 0000:03:00.0: amdgpu: Failed to power gate JPEG!
[drm:jpeg_v2_0_set_powergating_state [amdgpu]] *ERROR* Dpm disable jpeg failed, ret = -62.
amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000002E SMN_C2PMSG_82:0x00000000
amdgpu 0000:03:00.0: amdgpu: Failed to power gate VCN!
[drm:amdgpu_dpm_enable_uvd [amdgpu]] *ERROR* Dpm disable uvd failed, ret = -62.
amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000002E SMN_C2PMSG_82:0x00000000
amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000002E SMN_C2PMSG_82:0x00000000
amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000002E SMN_C2PMSG_82:0x00000000
amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000002E SMN_C2PMSG_82:0x00000000
Reverting 6e7545ddb13416fd200e0b91c0acfd0404e2e27b fixes the stuck memory clock for me (once again) on 5.15.29 (and newer) with Navi 10 (dcn20). I don't have any flickering or other issues driving three 60 Hz DP MST monitors with MPC_SPLIT_DYNAMIC, so unconditionally fixing the clocks to max (causing the GPU to draw 30 watts and constantly spin up its fans) with MPC_SPLIT_AVOID_MULT_DISP does not seem like a proper solution to me.
Unfortunately, with that patch reverted, a number of people have reported hangs on driver load with multiple displays attached. We have not been able to reproduce the issue internally, so it has been slow to debug.
Hi @agd5f, has there been any progress on debugging this issue? The issue still persists and hangs my system on driver load after testing kernel 5.18.1.
Well, I haven't tried reverting the commit lately, but I have started powering off my 2nd monitor (no LED active) and only switching it on when I need it. The crux is that it has to be powered off when booting.
With this scheme I get ~15 W idle with one monitor vs. 36 W idle with two monitors. I have to manually (as in physically) hit the power button on my 2nd monitor.
I did the same yesterday, mostly to test which monitor causes the issue. I realized that my high-refresh-rate monitor can run at up to 144 Hz at ~9 W idle. If I set it above 144 Hz, the GPU starts using ~30 W. With both monitors at 60 Hz it uses 30 W or more. So it's definitely caused by two monitors being active at the same time. I don't have to turn the monitor off with the physical button; I just turn it off in the GNOME settings. Reverting the commit didn't fix it, but I learned a lot about building my own custom kernel, so I count that as a positive.
P.S. I tested this on both Linux and Windows, and I see exactly the same power usage and high MCLK on both.
Hmm, I actually just patched my Linux kernel (5.17.5) and it didn't fix it for me, so it must be something else with my GPU / monitors (AMD 5700 XT and two monitors running at 60 Hz).
I bet it has something to do with the vertical blanking interval, but I don't think there is a way to change it on GNOME Wayland... oh well. I'm kind of tired of my GPU using so much power at idle, but I don't see any easy solution, and this issue has persisted for at least a year now.
I'm successfully using the following patch on 5.15.35 (will try on a newer version later) with a 5700 XT and three 60 Hz monitors (DisplayPort MST) on KDE Plasma:
But based on your results it might be a different issue, as without this patch I've only ever seen the memory clock at max (875 MHz), while you do get values lower than that (but still higher than they should be). My three monitors are identical with presumably identical timings, so I guess that's why this fix works in my case.
Just as a follow-up question, do you still see a difference with/without this patch, i.e. without it always max MCLK, but with it sometimes lower than max MCLK? If the patch works you should still see lower MCLK states and power consumption over time even if the monitor timings wouldn't allow dropping MCLK all the way to state 0.
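For anyone who wants to compare runs with and without the patch, here is a minimal sketch that checks which MCLK state is active by parsing amdgpu's pp_dpm_mclk sysfs file. The path assumes the GPU is card0, which differs between systems. Only the 875 MHz maximum in the sample text comes from this thread; the lower levels are made up for illustration. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the GPU is card0; adjust the index for your system.
PP_DPM_MCLK = Path("/sys/class/drm/card0/device/pp_dpm_mclk")

# Each line looks like "3: 875Mhz *", where "*" marks the active level.
_LINE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)

# Illustrative sample only: 875 MHz is the max reported in this thread,
# the lower levels are placeholders.
SAMPLE = "0: 100Mhz\n1: 500Mhz\n2: 625Mhz\n3: 875Mhz *\n"


def parse_dpm_levels(text):
    """Return a list of (index, mhz, is_active) tuples."""
    levels = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def active_level(levels):
    """Return the tuple for the currently active level, or None."""
    for lvl in levels:
        if lvl[2]:
            return lvl
    return None


def stuck_at_max(levels):
    """True if the active level is also the highest clock in the table."""
    cur = active_level(levels)
    return cur is not None and cur[1] == max(mhz for _, mhz, _ in levels)


if __name__ == "__main__":
    # Fall back to the sample when the sysfs file is not present.
    text = PP_DPM_MCLK.read_text() if PP_DPM_MCLK.exists() else SAMPLE
    levels = parse_dpm_levels(text)
    print("active:", active_level(levels), "at max:", stuck_at_max(levels))
```

A single reading says little, since the clock can legitimately spike under load. Sampling this every second or so on an idle desktop and counting how often the active level is below the maximum gives a fairer with/without comparison.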
2560x1440 (0x3f5) 311.750MHz -HSync +VSync *current +preferred
      h: width 2560 start 2752 end 3024 total 3488 skew 0 clock 89.38KHz
      v: height 1440 start 1443 end 1448 total 1493 clock 59.86Hz
1920x1080 (0x41) 173.000MHz -HSync +VSync *current +preferred
      h: width 1920 start 2048 end 2248 total 2576 skew 0 clock 67.16KHz
      v: height 1080 start 1083 end 1088 total 1120 clock 59.96Hz
Maybe it has to do with the different resolutions or timings, but it's a bit strange that it can drive one monitor at 2560x1440@144 Hz with 8 W but needs 30 W for two 60 Hz monitors.
edit: without the patch my clocks are still stuck, so the patch works, but only for 1080p@60hz. I can't use my monitor at that resolution so I guess I will have to find another solution or keep using my GPU at level 3.
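Since vertical blanking came up earlier as a suspect, here is a small sketch that derives the line rate, refresh rate and vertical blanking time for the two modes quoted above. It uses only the numbers from the xrandr output. The Mode class and its field names are made up for illustration, and nothing here reflects what threshold (if any) the driver actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Hypothetical container for the timing fields shown by xrandr --verbose."""
    name: str
    pixel_clock_mhz: float
    h_total: int
    v_active: int
    v_total: int

    @property
    def line_rate_hz(self) -> float:
        # Lines scanned per second: pixel clock divided by pixels per line.
        return self.pixel_clock_mhz * 1e6 / self.h_total

    @property
    def refresh_hz(self) -> float:
        # Frames per second: line rate divided by lines per frame.
        return self.line_rate_hz / self.v_total

    @property
    def vblank_lines(self) -> int:
        # Lines per frame that carry no active video.
        return self.v_total - self.v_active

    @property
    def vblank_us(self) -> float:
        # Duration of the vertical blanking period in microseconds.
        return self.vblank_lines / self.line_rate_hz * 1e6


# Values copied from the two modes posted in this thread.
MODES = [
    Mode("2560x1440", 311.750, 3488, 1440, 1493),
    Mode("1920x1080", 173.000, 2576, 1080, 1120),
]

if __name__ == "__main__":
    for m in MODES:
        print(f"{m.name}: {m.refresh_hz:.2f} Hz, "
              f"{m.vblank_lines} blank lines, {m.vblank_us:.0f} us vblank")
```

The two modes have different blank line counts (53 vs. 40), but because their line rates also differ, the blanking periods both come out close to 0.6 ms. So the blanking duration of each mode on its own does not obviously explain the difference; the slightly different refresh rates (59.86 Hz vs. 59.96 Hz) mean the two blanking periods drift relative to each other, which may or may not be what matters here.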