5700XT: 20 second delay when display manager is loading ("I'm not done with your previous command")
When booting, it takes a relatively long time for the display manager to show up. This happens at each boot, although between restarts the behavior might get slightly better or worse.
I've been having issues with this for almost two years now.
Hardware description:
CPU: AMD Ryzen 9 3900X 3.8 GHz
GPU: Asus Radeon RX 5700 XT 8GB (I believe it's an early model, bought in October 2019).
System Memory: 4x 16 GB Corsair Vengeance LPX 3200 MHz
Display(s): Samsung 49" Curved Business Monitor with 32:9 Super Ultra-Wide screen C49J890DK
Kernel version: Linux workstation 6.2.6 #1-NixOS SMP PREEMPT_DYNAMIC Mon Mar 13 09:26:43 UTC 2023 x86_64 GNU/Linux
Custom kernel: N/A
AMD official driver version: N/A
How to reproduce the issue:
The ~20s delay always happens at start-up, when the display manager is supposed to show up. Persistent across reboots, shutdown, etc.
dmesg reveals:
[ 42.967803] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000000D SMN_C2PMSG_82:0x00000000
[ 42.967811] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Other remarks
Actions taken in the past (not much detail, but I could try to reproduce any of them if needed):
Tested on DisplayPort back in September 2022 and the graphical target would show up faster, although it is undesirable since it seems to limit the refresh rate to 100 Hz instead of 120 Hz.
Tried using a monitor with a smaller resolution and frequency and the issue is not reproducible.
Tried using the monitor's picture-in-picture function which reduces the resolution to 1920x1080 => would boot even when it used to crash.
In previous kernel versions, the only way to get it running was to use amdgpu.dpm=0. This has been broken since kernel 5.15.67.
Although amdgpu.dpm=0 alone has crashed since 5.15.67, amdgpu.dpm=0 nomodeset did not crash, but the screen was too stretched to be usable.
The issue is not limited to NixOS, happened on Arch Linux and Elementary OS as well.
Tried moving the graphics card to a different PCI Express slot (currently in a x8 slot even though the graphics card is x16) => no change.
At some point I had a [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out! during runtime which would break KDE. This disappeared after some kernel updates. I also set iommu=pt, which seemed to improve the situation (not sure if that was just a coincidence).
A timeline of the issues (tested September 2022):
5.15.67
DPM off: kernel would panic amdgpu: smu firmware loading failed
DPM on: worked initially, after a cold reboot did not work anymore
5.18.19
DPM off: kernel would panic amdgpu: smu firmware loading failed
DPM on: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000001
5.19.8
DPM off: kernel would panic amdgpu: smu firmware loading failed
DPM on: seemed OK, but got a weird issue with snd_hda_intel
5.19.9
DPM off: kernel would panic amdgpu: smu firmware loading failed
DPM on: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000001
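To compare the register values reported across kernels, the warnings can be tallied by their value pairs. Below is a minimal Python sketch, not part of the original report. The function name `tally_smu_pairs` is hypothetical, and reading SMN_C2PMSG_66 as the message index and SMN_C2PMSG_82 as its parameter is an assumption based on the format of the driver's error message, not something confirmed in this thread.

```python
import re
from collections import Counter

# Matches the register dump that amdgpu prints together with the warning.
SMU_PAIR = re.compile(
    r"I'm not done with your previous command: "
    r"SMN_C2PMSG_66:0x([0-9A-Fa-f]{8}) SMN_C2PMSG_82:0x([0-9A-Fa-f]{8})"
)


def tally_smu_pairs(log_text):
    """Count each (C2PMSG_66, C2PMSG_82) value pair found in log_text.

    Assumption: register 66 holds the message index and register 82 its
    parameter; this sketch only groups the values, it does not decode them.
    """
    pairs = Counter()
    for reg66, reg82 in SMU_PAIR.findall(log_text):
        pairs[(int(reg66, 16), int(reg82, 16))] += 1
    return pairs


if __name__ == "__main__":
    # Two lines copied from the report above (kernel 6.2.6 and 5.18.19).
    sample = "\n".join([
        "amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous "
        "command: SMN_C2PMSG_66:0x0000000D SMN_C2PMSG_82:0x00000000",
        "amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous "
        "command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000001",
    ])
    for (reg66, reg82), count in tally_smu_pairs(sample).items():
        print(f"66=0x{reg66:08X} 82=0x{reg82:08X}: {count}")
```

Grouping by pair makes it easier to see whether different kernels stall on the same value pair or on different ones.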
I have this same issue with Manjaro, kernel 6.4.3 and Dracut v59.
AMD Radeon RX 5700 with dual 1440p monitors at 60 Hz
amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000000D SMN_C2PMSG_82:0x00000000
amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000000D SMN_C2PMSG_82:0x00000000
amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000000D SMN_C2PMSG_82:0x00000000
This happens randomly after a reboot (maybe 20% of the time); the other 80% of boots show no issue.
I've been facing the same issue with my hardware setup, and the symptoms match exactly. After booting, it gets stuck for a few seconds when it switches to the login manager (greetd+tuigreet) and then for at least a minute after authentication (while KDE initializes).
This started to happen after I upgraded my kernel version from 6.3.11 to 6.4.2. The problem is reproducible for me 100% of the time.
Hardware Description
CPU: AMD Ryzen 7 3700X
GPU: Sapphire Pulse Radeon RX 5700 XT 8GB
RAM: 2x16GB DDR4-3200
Display: 2x DELL S2721DS (2560x1440), connected via DisplayPort connectors
Operating System: Gentoo Linux
dmesg log tail:
[ 6.103815] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready
[ 6.376873] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 10.561879] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 10.561881] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 14.728742] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 18.908604] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 18.908628] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 23.086753] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 101.640886] systemd-journald[222]: /var/log/journal/48ca7f441566dfeed1ebf4fb5f55a68a/user-1000.journal: Monotonic clock jumped backwards relative to last journal entry, rotating.
[ 103.617288] sched: RT throttling activated
[ 105.545236] snd_hda_intel 0000:0b:00.1: Refused to change power state from D0 to D3hot
[ 106.347143] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 106.347147] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 110.532333] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 110.532336] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 114.848212] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 114.848216] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 119.025373] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 119.025376] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 123.844175] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 123.844179] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 128.025493] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 128.025496] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 132.206940] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 132.206945] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 136.842678] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 136.842681] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 141.319573] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 145.500441] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 145.500445] amdgpu 0000:0b:00.0: amdgpu: SMU11 attempt to set divider for DCEFCLK Failed!
[ 149.845257] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 149.845261] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 154.027569] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
[ 154.027572] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
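The spacing between warnings in a dmesg dump like this one can be measured rather than eyeballed. Below is a minimal Python sketch, not from the original comment; the name `smu_warning_gaps` is hypothetical, and it assumes the default dmesg timestamp format shown in this log (seconds since boot in square brackets).

```python
import re

# A bracketed dmesg timestamp directly followed by the amdgpu SMU warning.
SMU_STAMP = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+amdgpu \S+ amdgpu: SMU: I'm not done"
)


def smu_warning_gaps(dmesg_text):
    """Return the time in seconds between consecutive SMU warnings.

    Entries do not need to be on separate lines, so this also works on
    logs where several entries were pasted onto one line.
    """
    stamps = [float(s) for s in SMU_STAMP.findall(dmesg_text)]
    return [round(later - earlier, 6)
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # The first five warnings from the log above, shortened after the match.
    sample = (
        "[ 6.376873] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command\n"
        "[ 10.561879] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command\n"
        "[ 10.561881] amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!\n"
        "[ 14.728742] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command\n"
        "[ 18.908604] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command\n"
        "[ 23.086753] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command\n"
    )
    print(smu_warning_gaps(sample))
```

For the first five warnings in this log the gaps come out at roughly 4.2 seconds each, so each warning appears to cost about that much boot time on this machine.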
I ran git bisect between tags v6.3 and v6.4, restricted to the path drivers/gpu/drm/amd, and got the following commit:
ea2b852b656afaf6d45597abbcac8425fa6ab02d is the first bad commit
commit ea2b852b656afaf6d45597abbcac8425fa6ab02d
Author: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Date: Thu Mar 9 10:30:32 2023 -0700

    drm/amd/display: Set MPC_SPLIT_DYNAMIC for DCN10

    Since DC version 3.2.226, DC started to use a new internal commit sequence that better deals with multiple hardware constraints. One of the improvements is a more reliable sequence for pipe split. Due to the transition made in version 3.2.226, it should be more reliable to use the pipe policy as MPC_SPLIT_DYNAMIC, and this commit makes this change.

    Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
    Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Same here with 5700XT, getting
amdgpu 0000:29:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
after upgrade to Linux 6.4.
Downgrading to 6.1 solves the issue, will try 6.3 to check if this persists.
@rcrisostomo and I can reproduce the same issue when using DisplayPort, dual 1440p monitors and an RX5700*. It appears only after boot; shutdown has no issue. 6.1.x and 6.3.x are unaffected; only 6.4.x shows the problem.
@Zesko , good point. I believe it's not reproducible on DP but it would not be a fix for me unfortunately since it would reduce the screen refresh rate from 120 Hz to 100 Hz, probably due to bandwidth limitations.
Could you try to reproduce this issue with the latest amd-staging-drm-next?
Also, could you provide more details on how to reproduce this issue? I tried the following scenarios with amd-staging-drm-next:
Single 4K@120 HDMI
Reboot the system with Gnome 3 (Wayland) times: No issues, no dmesg message;
Reboot the system with Plasma (Wayland) times: No issues, no dmesg message;
4K@120 HDMI + 1440p@120 DP
Reboot the system with Gnome 3 (Wayland) times: No issues, no dmesg message;
Reboot the system with Plasma (Wayland) times: No issues, no dmesg message;
I'm unsure if I'm missing something or some steps to reproduce this issue in my setup. Please, let me know the result when using amd-staging-drm-next. Could you also attach the output of the below command:
cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log
Next, could you try:
Run this command as root echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm
Temporarily disable your UI: systemctl isolate multi-user.target
Enable your UI again: systemctl isolate graphical.target
At this point, you should see a bar at the bottom of your screen. Could you post a picture or describe the pattern you see?
Finally, could you attach the output of this command:
In my case it's 2x 1440p@170Hz connected over DP using a Sapphire Pulse 5700XT. The errors occur naturally on every boot (using gdm at startup), when isolating graphical.target (e.g. when following the steps provided, I had to wait a couple of minutes after systemctl isolate graphical.target), or when switching tty. Changing WMs doesn't change the behavior (tried Gnome on both XOrg and Wayland, Sway and Hyprland).
From the DTN log and the visual confirmation, the pipe split is happening correctly. I noticed some differences in the firmware version (SDMA, SMC, and SOS). Which distro are you using?
Also, did you have a chance to try amd-staging-drm-next?
I have the same firmware.txt and red bar at the bottom of two monitors like @kacper.bajenski 's photo.
A red bar appears at the bottom of two monitors after running systemctl isolate graphical.target and pressing REISUB-keys: SysRq, E to send the signal SIGTERM to terminate all processes.
@siqueira Those specific logs are from Arch and linux-zen kernel. I tried amd-staging-drm-next later with exactly the same results. Might retest later and let you know if something is different.
@siqueira I followed your instructions and I also get a red bar at the bottom of each screen. The red bar is thicker on the right half of the screen, and turns blue after KDE starts.
There are also a few messages from the sound driver(s) which I noticed appear only when this boot issue is triggered:
[ 11.333230] snd_hda_intel 0000:0b:00.1: CORB reset timeout#2, CORBRP = 65535
[ 11.556217] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x2f0d00. -5
...
[ 256.194136] snd_hda_intel 0000:0b:00.1: Refused to change power state from D0 to D3hot
My monitors have built-in speakers, usable via DP or HDMI, so this appears to be related?
@siqueira Here you go. There is not much happening, basically black screen for a minute (0:30 until the end) after booting. Additionally
dmesg.txt from this boot.
There are two different initramfs tools on Arch based distros.
mkinitcpio vs. dracut
When I use mkinitcpio to build initramfs, I can 100% reproduce the issue on every boot.
If I use dracut to create the initramfs, there is only a 20% - 30% chance of this issue occurring; the rest of the time it boots without problems, seemingly by luck. As I wrote above: #2475 (comment 1996549)
I enabled logging for booting, the boot screen showed the same repeatable error messages:
Aug 13 09:32:13 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:18 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:18 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Aug 13 09:32:23 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:23 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Aug 13 09:32:28 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:32 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:32 zesko kernel: PM: Image not found (code -22)
Aug 13 09:32:37 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Aug 13 09:32:42 zesko kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000040 SMN_C2PMSG_82:0x00000002
Apparently I was wrong and the issue is reproducible on my side on Linux kernel 6.4.10 with the pipe split policy set to MPC_SPLIT_AVOID_MULT_DISP.
I've double-checked that the patch has been applied correctly by slightly changing the I'm not done with your... message. Checked the number of warnings via:
journalctl -b | grep "amdgpu: SMU: I'm not done with your" | wc -l
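The same count can be scripted, which is handy when comparing many boots. Below is a minimal Python sketch of the pipeline above; the function names are hypothetical, and the search string has to be adjusted if the message text was patched, as it was for this test.

```python
import subprocess

# Substring to look for; change it if the kernel message was patched.
NEEDLE = "amdgpu: SMU: I'm not done with your"


def count_smu_warnings(lines, needle=NEEDLE):
    """Count the log lines that contain the SMU warning text."""
    return sum(1 for line in lines if needle in line)


def count_for_current_boot():
    """Run `journalctl -b` and count SMU warnings for the current boot.

    Needs a systemd-based system and permission to read the journal.
    """
    result = subprocess.run(
        ["journalctl", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_smu_warnings(result.stdout.splitlines())
```

Calling `count_for_current_boot()` after each boot and noting the result next to the observed delay gives one data point per boot for the comparison.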
Here are my findings:
Single 3840x1080 @ 120Hz via HDMI, with MPC_SPLIT_AVOID_MULT_DISP:
With just the patch applied: between 4 and 18 "I'm not done with your previous command" warnings. The delay is directly proportional to the number of warnings in the logs.
Disabling the splash screen (Plymouth) does not reduce the number of warnings, it just delays the warnings until the desktop manager shows up.
Booting with amdgpu.dpm=0 does not work. It gets stuck at loading module amdgpu...
Booting with nomodeset did not work either, the X11 server crashed without displaying anything.
Changing the refresh rate from 120 Hz to 99.99 Hz made no difference either.
Cases where the boot generates zero such warnings and the boot is noticeably fast:
3840x1080 @ 120 Hz / HDMI, with the display off
1920x1080 @ 143.98 Hz & 120 Hz on the same monitor.
Additionally, setting the split pipe policy to MPC_SPLIT_AVOID seems to fix the issue both after restart and shutdown.
Top 5 results of systemd-analyze blame for 5 "I'm not done" warnings:
This bug happens at random. I have had no errors for 73k seconds, so a bisect is useless with this level of unpredictability. FYI, I'm running v6.5.1; I reported this back on 6.1, but no one read my report properly before closing the issue: #2462 (closed)
@agd5f I applied the revert commit on my old 6.4 kernel and still saw SMU errors in dmesg. The "different issue" you commented on went away, but I still experienced the weird hotplug issues. I thought the revert had made it into 6.5, so I pulled it; I don't know what the merge window is for that.
I have an old RX 580 here I can try testing on as I bought the RX 5700 XT second hand this year, I cannot remember if I had any issues with the RX 580, but I definitely had issues with hotplug on my LG since buying it.
I'm happy to try it again but I'm not hopeful, would you advise anything to help debug the SMU or hotplug issues? I've done a little kernel debugging before (bcachefs) and can hook up gdb or perf record to see what happens.
I'm sick of having to use hacky workarounds for six months for this stupid hotplug behavior. Most DEs don't expect your primary monitor to just randomly unplug during DPMS wake-up.
Thanks @eliandoran, that makes a lot of sense: my monitor randomly drops out after DPMS on, so MULT_DISP doesn't work since it thinks I only have one monitor.
It would be good if my monitor didn't randomly "unplug" during DPMS / poweroff; I'm not sure if that's an AMD issue, an LG issue or a VESA issue.
I do wonder why a 10-year-old DP monitor doesn't have this issue but more modern panels seem to (I've seen reports online of LG and AU Optronics panels having issues like this). Early DP specs seem to be extremely vague on signalling poweroff vs. unplugged, so I think VESA is at least partially at fault here.
Is it time to buy Nvidia or, dare I say, Intel? What's the point in having open source drivers if it only makes the driver quality and support objectively worse? I'm sick of having to write workarounds for this buggy driver. I'm sick of screaming into the void of this bug tracker every few months. I'm sick of being ignored on the IRC channel. Why won't anyone help me debug my issues? I'm sick of having to power cycle my laptop, or ssh in and kill sway, to get my computer to DRAW SOMETHING ON MY MONITOR.
@agd5f Does the above patch constitute a proper fix? Am I wasting my time trying to get attention for this bug? Do you have any plans to fix buggy display drivers on 4-year-old cards? Or do I have to be rich enough to purchase a GPU during a chip shortage to get basic functionality working, and just suck it up if I have a Vega laptop?
I'm so sick of the lack of support around a product I paid for.