DisplayPort connected monitor wakes from DPMS sleep with a blank screen

mentioned in issue #662

Possibly a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=215203

Does reverting e3b39825ed0813f787cb3ebdc5ecaa5131623647 fix the issue?

Does this patch help?

bug215203.diff

Patch definitely helps. My monitor now remains asleep with the GPU in power save.

Just for clarity, you are referring to my patch (bug215203.diff), not the revert?

Correct - referring to the patch. Do you still want me to try the revert?

Yes, that would be good to verify that it is the same issue as the other bug.

Alright, got my source tree back up to 5.15.10-arch1 and reverted e3b39825ed0813f787cb3ebdc5ecaa5131623647. (I do not have the patch applied to this build, I only reverted the commit)

Monitor stays off after going into sleep mode. Left it for a few minutes and it seems to stay off.

A couple things I noticed when testing:

Applying the patch caused the GPU to power down when the monitor went into sleep mode (RADEON light turned off on the GPU).
Reverting the commit caused the GPU to stay powered up when the monitor went into sleep mode (RADEON light stayed on)

Either way, both methods allowed the monitor to stay asleep as it should.

Thanks for verifying.

What I think is happening is that when the displays are turned off and the GPU is idle, the driver enters runtime suspend and powers down the GPU. At some point, some process queries the GPU, which causes the driver to runtime resume in order to wake the GPU up. As part of that runtime resume, the driver sends a hotplug event to userspace in case something on the display side changed while the GPU was off. The desktop then sees the hotplug event, re-probes the displays, and lights them up again. e3b39825ed0813f787cb3ebdc5ecaa5131623647 fixed a bug which prevented runtime pm from being enabled at all: efifb never released its runtime reference, so the device never runtime suspended or resumed, and therefore the hotplug event was never sent.
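To make the sequence described above concrete, here is a toy userspace model in C. It is not amdgpu code: `gpu_model`, its fields, and the `model_*` functions are hypothetical names invented for illustration. It only encodes the hypothesis as stated, namely that a runtime resume which emits a hotplug event leads the desktop to light the displays again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a driver symbol. */
struct gpu_model {
    bool runtime_suspended;  /* GPU powered down by runtime PM */
    bool displays_on;        /* desktop has lit the displays */
    bool hotplug_on_resume;  /* does runtime resume emit a hotplug event? */
};

/* Displays are turned off and the GPU is idle, so runtime suspend kicks in. */
static void model_dpms_off(struct gpu_model *g)
{
    assert(g != NULL);
    g->displays_on = false;
    g->runtime_suspended = true;
}

/*
 * Some process queries the GPU, forcing a runtime resume.
 * Returns true if a hotplug event was sent to userspace.
 */
static bool model_query_gpu(struct gpu_model *g)
{
    assert(g != NULL);
    if (!g->runtime_suspended)
        return false;
    g->runtime_suspended = false;
    return g->hotplug_on_resume;
}

/* The desktop reacts to a hotplug event by re-probing and lighting displays. */
static void model_desktop_handle_hotplug(struct gpu_model *g)
{
    assert(g != NULL);
    g->displays_on = true;
}

/* One lock-screen cycle; returns whether the displays ended up on again. */
static bool model_lock_screen_cycle(struct gpu_model *g)
{
    model_dpms_off(g);
    if (model_query_gpu(g))
        model_desktop_handle_hotplug(g);
    return g->displays_on;
}
```

In this model the displays only come back on when `hotplug_on_resume` is true. It says nothing about other wake-up paths a real system may have.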

Does this patch work as well? bug215203-2.diff

With bug215203-2.diff I still have the issue: the display and GPU sleep, then the GPU turns back on after a few seconds and wakes the monitor.

Thanks. Too bad bug215203-2.diff doesn't help. It would be nice to keep the hotplug check. This patch should do the trick.

0001-drm-amdgpu-disable-hotplug-events-on-runtime-pm-resu.patch

Still having the issue with that patch applied as well. Strange...

Had some time to go back and re-test all three patches individually. I found that all three patches still expose the issue in some form.

Wayland seems to expose it the best. When I do something like "xset dpms force suspend" in XOrg like in the other ticket, the display stays off. If I let the desktop (GNOME or KDE) lock the screen (either Wayland or XOrg), the GPU powers down and comes back on within 10-15 seconds and powers up the monitor.

Not sure why the first patch (bug215203.diff) worked the first few times I tried it.

How about these patches?

0002-drm-amdgpu-disable-runpm-if-we-are-the-primary-adapt.patch

0001-fbdev-fbmem-add-a-helper-to-determine-if-an-fb-is-co.patch

compiling now, will let you know shortly

Finished and rebooted with the new kernel/modules. Interesting behavior now:

First screen lock: monitor sleeps & GPU turns off. Turns back on after 10-15 seconds.
~~Subsequent screen locks: monitor sleeps & GPU stays on. Monitor turns back on almost immediately after sleeping.~~

EDIT: Never mind, spaced out my lock attempts by a few more seconds and the GPU does in fact power off/monitor sleeps. Same behavior as before.

It looks like those patches should disable runpm based on the primary FB device? Is there a way I can check that the conditions are evaluating to true?

/sys/module/amdgpu/drivers/pci:amdgpu/0000:08:00.0/power/runtime_suspended_time is incrementing
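One way to confirm that the counter is really advancing is to sample it twice. The helper below is a minimal sketch, assuming the attribute holds a single unsigned decimal integer; `read_sysfs_counter` is a hypothetical name, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read one unsigned integer from a sysfs-style text file.
 * Returns 0 on success and stores the value in *out, -1 on any failure.
 */
static int read_sysfs_counter(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int matched = fscanf(f, "%llu", out);
    fclose(f);
    return matched == 1 ? 0 : -1;
}
```

Calling it on the `runtime_suspended_time` path above, waiting a few seconds, and calling it again should give a larger second value if the device is spending time runtime suspended.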

I inserted a couple of dev_info lines into amdgpu_kms.c to see if is_fw_fb is returning anything:

dev_info(adev->dev, "We are running custom module\n");

/* Not sure if I did the following right, but it returns? */
dev_info(adev->dev, "FW FB: %d \n", is_fw_fb); 

if (is_fw_fb)
{
	dev_info(adev->dev, "Primary adapter detected, disabling runtime pm\n");
	adev->runpm = false;
}

if (adev->runpm)
        dev_info(adev->dev, "Using BACO for runtime pm\n");

Here's the dmesg output I get:

[   11.511682] amdgpu 0000:08:00.0: amdgpu: We are running custom module
[   11.511683] amdgpu 0000:08:00.0: amdgpu: FW FB: 0 
[   11.511684] amdgpu 0000:08:00.0: amdgpu: Using BACO for runtime pm

It looks like amdgpu_is_conflicting_framebuffer(base, size) isn't returning anything, since "Primary adapter detected..." doesn't show up in the log, and FW FB is 0 (but I'm not sure I did that right...). Hopefully that is helpful?

dmesg.log

(btw I've been hacking around on this on Fedora 35, but shouldn't make too much difference I imagine)

One additional thought, don't we want "is_fw_fb" to be false to disable runpm? If I'm reading the code right (I'm no programmer...but I know a little C/C++) is_fw_fb -> amdgpu_is_conflicting_framebuffer -> is_conflicting_framebuffer returning false would be correct, since there isn't another GPU in my system.

So this:

if (is_fw_fb)

should be this?

if (!is_fw_fb)

We want to disable runtime pm if the board is the same one used by efifb. It restores the behavior that was there before e3b39825ed0813f787cb3ebdc5ecaa5131623647 was applied (i.e., runtime pm never kicks in because efifb never dropped its runtime pm reference).
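The intended decision can be sketched in isolation as follows. This is an illustrative stand-in, not the patch itself: `adev_model` and `apply_fw_fb_check` are hypothetical names, and the struct carries only the one flag discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the single driver field that matters here. */
struct adev_model {
    bool runpm;  /* true: runtime PM is allowed for this GPU */
};

/*
 * is_fw_fb is true when this GPU is the board the firmware framebuffer
 * (efifb) was using. In that case runtime PM is turned off; otherwise
 * the existing setting is left untouched.
 */
static void apply_fw_fb_check(struct adev_model *adev, bool is_fw_fb)
{
    assert(adev != NULL);
    if (is_fw_fb)
        adev->runpm = false;
}
```

With the `FW FB: 0` result from the dmesg output above, a check of this shape leaves runpm enabled, which is consistent with the "Using BACO for runtime pm" line still being printed.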

> We want to disable runtime pm if the board is the same one used by efifb.

Why is that, though? amdgpu supplants efifb. Why does the GPU driven by amdgpu need to stay on just because it used to be driven by efifb?

This smells like a convoluted workaround for the real issue (which could presumably still happen with non-primary GPUs), which is that monitors stay on when they should be off after runpm resume.

BTW, I wonder if #662 might be fundamentally the same issue after all. In both cases, monitors stay on when they should be off after a hotplug event. Maybe amdgpu accidentally causes the monitors to turn on as part of the hotplug event or as part of the resulting probe of display connections?

It's just a workaround for 5.16 to restore the previous behavior rather than reverting e3b39825ed0813f787cb3ebdc5ecaa5131623647 until we can sort out a better solution.

@trilantis I updated the 2nd patch from Alex a little. It should do the trick (disable runpm) now. 0002-drm-amdgpu-disable-runpm-if-we-are-the-primary-adapt.patch

Thanks Evan. Here are some reworked patches with better naming in patch 1 and proper handling in patch 2 for multiple GPUs.

0002-drm-amdgpu-disable-runpm-if-we-are-the-primary-adapt.patch

0001-fbdev-fbmem-add-a-helper-to-determine-if-an-aperture.patch

Hey Alex, patch 0002 is failing for me:

patch -p1 < ~/bugs/amdgpu-1840/0002-drm-amdgpu-disable-runpm-if-we-are-the-primary-adapt.patch 
patching file drivers/gpu/drm/amd/amdgpu/amdgpu.h
Hunk #1 succeeded at 1069 (offset -8 lines).
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
Hunk #1 FAILED at 39.
Hunk #2 succeeded at 1246 (offset -644 lines).
Hunk #3 succeeded at 1274 with fuzz 2 (offset -644 lines).
Hunk #4 succeeded at 1332 (offset -656 lines).
Hunk #5 succeeded at 1348 (offset -656 lines).
1 out of 5 hunks FAILED -- saving rejects to file drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c.rej
patching file drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

In the patch, amdgpu_drv.c @line 39 is shown as:

 #include <linux/mmu_notifier.h>
 #include <linux/suspend.h>
 #include <linux/cc_platform.h>
+#include <linux/fb.h>
 
 #include "amdgpu.h"
 #include "amdgpu_irq.h"

Looks like cc_platform.h is not referenced on 5.15.10/5.15.11:

#include <linux/mmu_notifier.h>
#include <linux/suspend.h>

#include "amdgpu.h"
#include "amdgpu_irq.h"

EDIT: 5.16rc6 seems to be the build for these patches. I'm compiling and will let you know in about 30 minutes.

Sorry to say, but no luck with the newest patches on 5.16-rc6. RunPM is still getting enabled.

[   12.617895] amdgpu 0000:08:00.0: amdgpu: Using BACO for runtime pm

dmesg-5.16-rc6.log

Hmmm, are you sure it got applied correctly? Can you add some debugging output to see what is_firmware_framebuffer() returns in amdgpu_is_fw_framebuffer()?

Good news! I swapped over to Arch, compiled a fresh mainline 5.16-rc7 build with these two patches applied and it does indeed work. GPU stays on, monitors sleep properly.

On Fedora, I double checked everything and made sure the patches were applied - couldn't figure it out. Only thought I had was that they're in the middle of switching over to using simpledrm/fbdev emulation for F36 and it was being used in the 5.16-rc6 kernel I pulled from their git tree (you can see it in the dmesg log).

https://fedoraproject.org/wiki/Changes/ReplaceFbdevDrivers

Is efifb enabled in your kernel config?

That would be why the patches didn’t work on Fedora. On their 5.16rc6 branch, they have CONFIG_FB_EFI disabled in the config because of the changes they’re making for Fedora 36 and disabling the legacy fbdev driver (which will probably make this workaround not work when it’s released).

I will try again when I have a minute and recompile on Fedora using a config from Fedora 35 with CONFIG_FB_EFI=y. I imagine it should work fine.

Patches work on Fedora 35/5.16-rc7 with CONFIG_FB_EFI=y
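Since the result hinged on `CONFIG_FB_EFI`, it can help to check a kernel config for that option before testing. The function below is a minimal sketch that scans config text for a line of the exact form `OPTION=y`; `kconfig_is_builtin` is a hypothetical helper name, and loading the config file into memory is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if config_text contains a line that is exactly "<option>=y".
 * Lines such as "# CONFIG_FB_EFI is not set" or "CONFIG_FB_EFI=m" do not match.
 */
static bool kconfig_is_builtin(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);

    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Require the whole line to be option + "=y", nothing more. */
        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            line[opt_len] == '=' && line[opt_len + 1] == 'y')
            return true;

        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;
}
```

For example, passing the contents of a build's `.config` together with `"CONFIG_FB_EFI"` distinguishes the Fedora 35 style config (`CONFIG_FB_EFI=y`) from one where the option is not set.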

added AMDgpu DC labels

Usually on bootup, BAR 0 of the AMD GPU is assigned to efifb. Later, amdgpudrmfb takes over the framebuffer from efifb. However, in the log posted by @trilantis, I did not see that handover. Instead, amdgpudrmfb seems to be the first and only owner of the framebuffer. Kind of weird...

Hmm, maybe an update to Alex's original patch set (placing the is_fw_fb check after amdgpu_device_init(), where amdgpudrmfb takes over the framebuffer) can work for this. @trilantis, can you give the patches below a try on your Fedora and Arch systems?

0001-fbdev-fbmem-add-a-helper-to-determine-if-an-aperture.patch (updated: drop the check for FBINFO_MISC_FIRMWARE)

0002-drm-amdgpu-disable-runpm-if-we-are-the-primary-adapt.patch

@equan I don't think the patches need to be changed - I think the reason it didn't work on Fedora was due to CONFIG_FB_EFI=n in the Fedora kernel on their 5.16rc configs. See my response to Alex above. Going to test again using a Fedora 35 .config and make sure CONFIG_FB_EFI=y.

These patches are an alternative approach which should handle things better if there is no efifb.

0002-drm-amdgpu-only-check-display-if-the-GPU-has-them-in.patch

0001-drm-amdgpu-don-t-runtime-suspend-if-there-are-displa.patch

Those patches are working for me on a kernel build without fbdev/efifb.

Thanks for all the work on this issue! I look forward to removing amdgpu.runpm=0 from my args sometime in the near future :)

Hmm, looking at the runtime pm documentation more closely, I don't think these patches will help. If the driver enables autosuspend, the idle callback is not used. It sounds like this may be a race between driver init and runtime suspend. Does the attached patch help?

0001-drm-amdgpu-make-runpm-init-the-last-thing-we-do-in-p.patch

That patch seems to work on Fedora kernel 5.16-rc7 (CONFIG_FB_EFI=n). GPU stays on, monitor stays asleep.

mentioned in commit agd5f/linux@9a45ac23

mentioned in commit agd5f/linux@b95dc06a

mentioned in commit agd5f/linux@b92a0fcd

mentioned in commit agd5f/linux@0769bf07

mentioned in commit nouveau@fa3d8456

mentioned in commit nouveau@b4391e49

I have a similar issue, but I'm not sure if it is exactly the same. Recently, my monitor (connected via DP) has started exhibiting this blanking behavior after shutdowns. I actually have to physically disconnect power from the monitor (and reconnect it) to get it in working order again. In fact, the on-screen display normally activated by a button on the back of the monitor isn't even functional when it is put into this "zombie" power state at a shutdown. No reboot, warm or cold, helps: only cutting power to the monitor. Has anyone in this thread had this issue? This occurs only with Linux. On a shutdown of Windows or macOS no such issue happens.

mentioned in issue intel#7581

I had a similar problem to this. After locking the screens they would blank (as desired) but then turn back on seconds later.

DPMS signal is sent to monitor.

Monitor goes into "deep-sleep", which drops the DP link. (as I understand it, LG monitors are notorious for this)

amdgpu thinks nothing is connected to the card, so it turns it off (hence the relation to runpm).

The monitor detects that something changed with DP and tries to renegotiate.

Something bugs in amdgpu and the card turns back on, monitor turns back on, and you get a blank screen.

This seems to match what was happening for me. However there are some differences.

My monitor is an Acer, not LG and connected by HDMI, not DP.
I have multiple displays connected, but presumably the GPU tries to power down when all are off.

I managed to work around this issue by disabling automatic input selection on the Acer monitor. After doing that the displays stay off.

Sounds more like #662 offhand, though per #1840 (comment 1200631) it might actually be the same issue.

#2876 is exactly like this as well, but then with HDMI (like @kevincox).

mentioned in issue #2876

Experiencing the same issue after upgrading my GPU from an NVIDIA card to an AMD one (Polaris). I have multiple monitors: one is connected via HDMI to the AMD card and the other to the iGPU. Both monitors wake up a few seconds after triggering DPMS suspend. Blacklisting the amdgpu kernel module and booting with just i915 makes the issue go away, but only the iGPU works in that case.

I have been having the same issue that @justinkb mentioned in #1840 (comment 1219268). After shutdown, my main DP monitor stays on (just the backlight), and the only way to fix it is to unplug and re-plug the monitor.

Any idea on a fix for this?

System:

Ryzen 9 5900x
RX 7800XT
KDE Plasma 5 and 6 on Wayland
EndeavourOS on the latest zen kernel

I am currently trying out the amdgpu.runpm=0 kernel parameter, will notify if that solves the problem for me or not.
