nouveau display blank after 6.2.11 kernel upgrade

Could you check if forcing the Display to DisplayPort 1.2 makes any difference?

But it's a bit odd, that it works with 6.2.8, but not with 6.2.11. Are those kernel versions correct? The one in the title also looks odd.

I don't know how to force the Display to a specific port. Is it some kernel video=DP?? parameter I can set? I really haven't done any kernel debugging before, and had started taking them for granted.

My system is a software development platform, so I multiboot with NixOS, Arch, KDE Neon, Ubuntu and Debian. I spend most of my time on NixOS Unstable but check out the other systems to make sure my changes don't break on other systems.

The problem first showed up when updating my NixOS unstable system from kernel 6.2.10 to 6.2.11, then on updating Arch, and then Fedora, which I hadn't updated in a while and was running 6.2.8. Fedora continued to work correctly when I booted the older kernel. The problem seemed to be backported to updated older kernels and showed up in NixOS 22.11 stable and later Debian. It's hard (at least for me) to roll back the Arch kernel, and with NixOS, rolling back just the kernel is doable but involves rolling back everything or building the kernel from scratch.

Since some time has passed and multiple updates have happened, I started moving to using the NVidia proprietary drivers which work OK, but since I'm not doing 3D graphics, really work worse than the simpler Nouveau setup.

It seems easiest to test on Fedora, where I can keep old kernels and switch the proprietary drivers on and off most easily. The logs from all the systems but Debian look about the same, referencing the same line of code. The Debian log is attached: Debian-failed.log

I don't know how to force the Display to a specific port. Is it some kernel video=DP?? parameter I can set?

On your display directly somewhere in the menu.

mhhh.. 3d9c62ec could have broken it...

It might be that with that patch nouveau tries to use a higher resolution/refresh rate or something and that this one was either not supported before or something is messed up now.

Can you check what highest resolutions were supported on 6.2.10 or earlier on both displays? And if those are lower than what the devices actually support?
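One way to answer this without a working desktop session is to read the mode list the kernel exposes in sysfs. The sketch below assumes the usual `/sys/class/drm/card*-<connector>/` layout with `status` and `modes` files; the function name `list_connector_modes` is made up for illustration. Note that the sysfs `modes` file lists mode names (resolutions) only, so refresh rates still have to be read from a tool such as xrandr or the desktop's display settings.

```python
from pathlib import Path


def list_connector_modes(drm_root="/sys/class/drm"):
    """Return {connector_name: [mode_name, ...]} for connected connectors.

    Assumes the common sysfs layout: one directory per connector
    (e.g. card0-DP-1) containing 'status' and 'modes' files.
    """
    result = {}
    for conn in sorted(Path(drm_root).glob("card*-*")):
        status_file = conn / "status"
        modes_file = conn / "modes"
        if not status_file.is_file() or not modes_file.is_file():
            continue
        # Skip connectors with nothing plugged in.
        if status_file.read_text().strip() != "connected":
            continue
        # One mode name per line, e.g. "3840x2160".
        result[conn.name] = modes_file.read_text().split()
    return result


if __name__ == "__main__":
    for name, modes in list_connector_modes().items():
        print(name, ", ".join(modes) or "(no modes)")
```

Running this on a working kernel and on a failing one, then comparing the output, would show whether the advertised resolutions differ between the two.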

The displays always showed up with their full resolutions.

What about the refresh rate?

The Debian 5.10.0-21, Fedora-38 6.2.8, and Ubuntu 6.2.0-20 kernels show 30 Hz for the 4K monitor. It runs at 60hz with the proprietary drivers (and newer kernels). The UHD 1440p display runs on both at 59.95 Hz.

okay, so I suspect this issue is caused by selecting 4K@60 instead of 4K@30, which the commit was addressing. But sadly using 4K@60 seems to not work without issues

Forcing DisplayPort 1.2 on the display might fix it as I think there is something wonky with our DisplayPort 1.4 implementation. Sadly I don't have a display to trigger this issue myself I think, but I'll check if I can somehow figure something out.

I was able to force my 4K monitor DisplayPort 1.4 -> 1.2, and it is working (at 30hz) with the 6.2.15-300 kernel which had been failing.

Thanks for the help. Is there anything else I can try that will help you?

with DisplayPort 1.2 you only get 30hz? That's a little surprising. But anyway.. I think I can try to see if I can force using the DisplayPort 1.4 stuff here and see if I can trigger it, just have to get a bit more creative. I'll ping here once I need something.

maybe one thing you could try.. Does it work if you only have the 4K display connected? And also do you get 60hz with DisplayPort 1.2? It's kinda weird that 60hz isn't available...

Mind booting with drm.debug=0x4?

I thought I had got things working with DisplayPort 1.2, but when I tried Fedora again, with the UHD display disconnected and drm.debug set, the 4K screen was totally blank and offline, with no means of seeing even the boot log. I powered it off and have attached the 'journalctl -b -1' log. The system then booted OK with the nouveau driver blacklisted (using the nvidia drivers).

Fedora-38-just-4k-1.2.log

changed title from nouveau display blank after 5.11 kernel upgrade to nouveau display blank after 6.2.11 kernel upgrade

assigned to @karolherbst

I upgraded my Debian System to Bookworm, with the 6.1.0-9 kernel, and as expected, it would not show either display. Since I had a system without the nvidia proprietary hacks, I experimented with a variety of setups and rearrangements of display port slots, and replacing the display port cables.

The system would boot fine with just the 2560x1440 59.95hz setup.

With just the 4K monitor, the screen went dark early in the boot sequence, and I could not see the console prompt for my encryption key. I assumed it had switched to the non-existent i915 display and just typed the key in. I could hear the disk drive start moving again, but the display remained blank. I then went to power off the display, and evidently just triggered hibernate mode. When I pressed the power button again I saw the gdm greeting screen. I then connected the second monitor and logged in. Surprisingly the system came up dual screen, and seems to be working OK.

This procedure also worked the second and third time I tried it. I have included a journal log with drm.debug=0x4: boot-both-hybernate-restart-ok.log

interesting.. this would kinda imply there is a problem when nouveau takes over from the firmware framebuffer... do you know if just booting with the 2560x1440 and enabling/connecting the second one later works just as well?

btw, I also see a 3840x2160 120hz mode coming up, does that one actually work?

another thing.. does booting with nouveau.atomic=1 make any difference?

The first time, I booted with just the UHD display connected. When I got to the GDM greeting page, I plugged in the 4K DisplayPort connector and the 4K monitor backlight came on. After logging in, both displays were working. This was not repeatable, and none of the combinations of initial monitors and triggering hibernate worked. The 4K display just went dark when plugged in, and the UHD display switched to show the desktop in a reduced size, with the remaining screen having scattered parts of the desktop shown. The mouse pointer was frozen.

However, I could then go back to the 5.8 kernel and things worked fine. Evidently getting it to work, in spite of happening 3 times in a row, is a low probability event.

As far as seeing the 4K monitor in 120 hz, the monitor is capable of that, although I don't know about the graphics card. The nvidia drivers seem to limit it to 60 hz. Even at 60 hz the monitor struggles to handle 4k video, with my hardware [A Dell XPS-8930 desktop].

I didn't see any effects of setting nouveau.atomic=1

huh... this is super weird... it kinda feels like the display gets re-probed a lot. @lyudess mind checking out the log in #211 (comment 1958408) and say if you spot anything weird there?

It's also interesting that there is a 120 hz mode we don't reject, so I'm curious why userspace doesn't see it.

FWIW: DisplayPort uses HPD IRQs (e.g. a short pulse on the HPD line) in order to communicate with the host, especially so when MST is involved. And even though those short pulses are explicitly intended not to be used for hotplugs, well, they still get used for hotplugs anyway sometimes so it's not unusual for a driver to respond to one with a probe.

There's definitely a few things going on here though, I can see the core channel failing later which means we're setting something up with the display state incorrectly. I think I'll need a bit more info to come up with anything conclusive though. If you can, enable the following option in your kernel config:

NOUVEAU_DEBUG_PUSH=y

Then boot with:

drm.debug=0x116 log_buf_len=50M nouveau.debug=disp=trace

Added to the kernel command line. Then reproduce the problem, and then get me the entire kernel log from that boot (no snipping!).
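For readers unfamiliar with the `drm.debug` values used in this thread, the value is a bitmask of debug categories. The sketch below decodes it using the category bits as I understand them from the kernel's DRM print header; treat the table as an assumption and check it against the documentation for your kernel version. The helper name `decode_drm_debug` is made up for illustration.

```python
# Category bits for the drm.debug module parameter.
# Assumption: values as defined in the kernel's DRM print header;
# verify against your kernel version before relying on them.
DRM_DEBUG_BITS = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled in mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x4))    # ['KMS']
print(decode_drm_debug(0x116))  # ['DRIVER', 'KMS', 'ATOMIC', 'DP']
```

Under that table, the earlier `drm.debug=0x4` request enabled only modesetting messages, while `0x116` additionally asks for driver, atomic, and DisplayPort messages, which is why a larger `log_buf_len` is requested alongside it.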

(Also, something else weird I noticed here - look how many times we're calling nouveau_framebuffer_new() at the end! I'm fairly sure that's not normal lol)
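A quick way to quantify an observation like this is to count the log lines that mention a given function. The sketch below is a minimal helper, assuming the saved log is plain text and that the function name appears literally in the debug lines; the helper names and the sample lines are placeholders made up for illustration, not excerpts from the attached logs.

```python
def count_lines_mentioning(lines, needle):
    """Count how many lines contain the literal string needle."""
    return sum(1 for line in lines if needle in line)


def count_in_file(path, needle):
    """Same count, reading a saved log file (e.g. journalctl output)."""
    with open(path, errors="replace") as fh:
        return count_lines_mentioning(fh, needle)


# Placeholder lines, NOT real kernel output.
sample = [
    "placeholder: nouveau_framebuffer_new called",
    "placeholder: unrelated message",
    "placeholder: nouveau_framebuffer_new called again",
]
print(count_lines_mentioning(sample, "nouveau_framebuffer_new"))  # 2
```

Comparing the count from a working boot with the count from a failing boot would give a rough sense of whether the number of calls is actually unusual.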

Setting a kernel config parameter seems to mean rebuilding the kernel? I have not done this in Debian. If this is needed it will take me a bit to figure out how best to do this.

I ran your test with my normal two-screen setup on NixOS with the 6.3.7 kernel. I also disabled the i915 driver so my console would not end up there. It settled to a blank 4K monitor and the 1440p display showing an initial login screen. The screen was unresponsive, but the keyboard was active and I was able to switch to another console screen. I got a dmesg and rebooted to get a full 'journalctl -b -1' log, which is attached: full-boot.log

Just an FYI - I'm not totally certain yet but I definitely already see some suspicious stuff in your dmesg that points out a couple of things nouveau is doing wrong here. It may take me a little bit to get to but I'll let you know when I've got a fix

Ok - I've got an attempt at a patch that should be worth trying:

lyudess/linux@fbbbd6fc

I'm not sure if this will fix your issue but it definitely should be a start

I extracted the patch from your copy of the kernel and applied it to the 6.4 kernel I was using on my nixos-unstable system that I used for the previous log.

Things work fine. I tried several restarts and a cold reboot. I noticed, using Gnome-44.2 (Wayland) display settings, that the 4K monitor came up at 30Hz, but it listed 60Hz and 119.91Hz as available. The NVIDIA proprietary driver limited it to 60Hz, but came up at 60Hz.

I reset the monitor to 60Hz and it accepted it without problems. Running a 4K 59.94Hz (yuv502p) video using mpv it would drop frames, but not as badly as the NVIDIA proprietary driver. It showed a large number (100s) of dropped frames with: Display FPS: 59.996 (specified) FPS: 59.940 (specified) 59.940 (estimated)

If I rebooted the system, the 4K display reset itself to 30Hz, but I could then set it back to 60Hz.

did you try the 119.91 Hz as well? If it gets advertised by nouveau it should in theory be within limits. It could be that it uses 6 bpc, so maybe the colors will get a bit "weird"? Would be good to try that out.

Also.. gnome remembers the last setting being used, so this shouldn't be anything nouveau can control.
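As a rough check on the refresh-rate and bpc question, the arithmetic can be sketched as follows. This is a lower-bound estimate under stated assumptions: it counts active pixels only (blanking is ignored, so the real requirement is somewhat higher), assumes RGB with three components per pixel and no compression, and uses the standard 4-lane DisplayPort payload rates after 8b/10b coding. Whether this GPU and monitor actually negotiate a given link rate is not something the numbers can tell you.

```python
# Effective payload of a 4-lane DisplayPort link after 8b/10b coding, in Gbit/s
# (per-lane raw rates of 2.7, 5.4, and 8.1 Gbit/s times 4 lanes times 0.8).
LINK_GBPS = {
    "HBR (DP 1.0/1.1)": 8.64,
    "HBR2 (DP 1.2)": 17.28,
    "HBR3 (DP 1.3/1.4)": 25.92,
}


def active_pixel_gbps(width, height, refresh_hz, bits_per_component):
    """Data rate for active pixels only (no blanking), RGB, in Gbit/s."""
    bits_per_pixel = 3 * bits_per_component
    return width * height * refresh_hz * bits_per_pixel / 1e9


def links_that_fit(required_gbps):
    """Names of the link rates whose payload covers required_gbps."""
    return [name for name, cap in LINK_GBPS.items() if required_gbps <= cap]


for hz in (30, 60, 119.91):
    for bpc in (6, 8):
        need = active_pixel_gbps(3840, 2160, hz, bpc)
        print(f"3840x2160 @ {hz} Hz, {bpc} bpc: {need:.2f} Gbit/s ->",
              links_that_fit(need))
```

By this estimate, 4K at 60 Hz and 8 bpc fits within an HBR2 link, so seeing only 30 Hz over DisplayPort 1.2 is indeed unexpected, and 4K at 119.91 Hz exceeds HBR2 at either 6 or 8 bpc, so it would need the higher HBR3 rate.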

I set the display to 119.91Hz, and it came up just fine. The colors seemed OK (at least to my eyes). mpv, glxgears, and es2gears seem to confirm the refresh rate. The 4K Sony test video that I had run before dropped many more frames, making it very annoying.

After setting the display to 119.91Hz and rebooting, it came up reset to 60Hz. Setting it back to 119.91Hz and rebooting again, it came back at 119.91Hz, so it may be that Gnome is a bit slow updating its preserved setting, or it is something about the switch through GDM to Gnome during a reboot.

But so far the system seems solid. I will keep using it.

interesting. I'm wondering if we indeed select an 8bpc mode or not, but we don't have any debugging on this side I think.... well.. if it works fine, then go for it, although I suspect performance will be "low" or at least not keeping up with 120 Hz :)

But cool to have figured out that bug we had with DP 1.3 workloads. Thanks a lot @lyudess !

No problem! Since it looks like it'll take me just a little bit more work to get the proper fix ready (but not much - just some search/replaces that unfortunately might need to be done semi-by-hand) I will submit the workaround patch sometime today so we can get that merged in the meantime

added display widespread labels

mentioned in issue #208

mentioned in commit lyudess/linux@41db4113

Is there a way to test this patch live? I have multiple GT 710s in X570/B550 chipset machines acting as stripped-down displays for VFIO-enabled hosts. I am on Arch with various kernel versions that exhibit this exact behavior, but none on DisplayPort. All hosts use HDMI. Let me know if I should open a new ticket for this. Would love to sort this out.

This turned out to be an EDID issue on my end, as negotiation worked directly from the GPU to a few other monitors I had. Forcing EDID to 1920x1080x60 worked.