After the April 15 Arch update to kernel 6.2.11, the display boots to a blank screen with an
underscore (_) in the upper left. This also occurred on NixOS, and on Fedora 37 (on April 21)
when I tried updating that. It has not happened (yet) on Ubuntu 23.04.
On NixOS I switched to the NVIDIA proprietary driver, which works. On Fedora 38
(where I am presently) I went back to the 6.2.8-200.fc37.x86_64 kernel, which works
and was the last kernel I had used before the failure. From that system I get:
I don't know how to force the display to a specific port. Is there some kernel video=DP?? parameter I can set? I really haven't done any kernel debugging before, and had started taking kernels for granted.
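For reference, the kernel's `video=` parameter (described in the kernel's modedb documentation) takes a connector name followed by an optional mode and an optional enable/disable flag. Here is a minimal Python sketch of how such an argument is put together. The connector names `DP-1` and `DP-2` are assumptions for illustration only; the real names on a given machine are the ones listed under /sys/class/drm/. The helper `video_param` is hypothetical, not part of any tool.

```python
def video_param(connector, width=None, height=None, refresh=None, flag=""):
    """Build a video=<connector>:<mode><flag> kernel command-line argument.

    flag: "e" forces the connector on, "d" disables it,
          "" leaves hotplug detection alone.
    """
    mode = ""
    if width and height:
        mode = f"{width}x{height}"
        if refresh:
            mode += f"@{refresh}"
    return f"video={connector}:{mode}{flag}"


# Pin an assumed connector DP-1 to 4K at 30 Hz.
print(video_param("DP-1", 3840, 2160, 30))
# Disable an assumed connector DP-2 entirely.
print(video_param("DP-2", flag="d"))
```

The resulting string goes on the kernel command line (for example through the bootloader's edit screen for a one-off test), so it can be tried on a single boot without changing the installed configuration.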
My system is a software development platform, so I multiboot with
NixOS, Arch, KDE Neon, Ubuntu and Debian. I spend most of my time
on NixOS Unstable but check out the other systems to make sure my
changes don't break on other systems.
The problem showed up first when updating my NixOS unstable system from
kernel 6.2.10 -> 6.2.11, then on updating Arch, and then on Fedora,
which I hadn't updated in a while and was running 6.2.8. Fedora
continued to work correctly when I booted the older kernel. The problem seems to have been backported to updates of older kernels, as it showed up in NixOS 22.11 stable and later in Debian. It's hard (at least for me) to roll back the Arch kernel, and with NixOS, rolling back just the kernel is doable but involves either rolling back everything or building the kernel from scratch.
Since some time has passed and multiple updates have happened, I started moving to the NVIDIA proprietary drivers. They work OK, but since I'm not doing 3D graphics, they really work worse for me than the simpler Nouveau setup.
It seems easiest to test on Fedora, where I can keep old kernels and switch the proprietary drivers on and off most easily. The logs from all the systems but Debian look about the same, referencing the same line of code.
The Debian log is attached: Debian-failed.log
It might be that with that patch nouveau tries to use a higher resolution/refresh rate or something and that this one was either not supported before or something is messed up now.
Can you check what highest resolutions were supported on 6.2.10 or earlier on both displays? And if those are lower than what the devices actually support?
The Debian 5.10.0-21, Fedora 38 6.2.8, and Ubuntu 6.2.0-20 kernels
show 30 Hz for the 4K monitor. It runs at 60 Hz with the
proprietary drivers (and newer kernels). The 1440p (2560x1440) display
runs on both at 59.95 Hz.
okay, so I suspect this issue is caused by selecting 4K@60 instead of 4K@30, which the commit was addressing. But sadly using 4K@60 seems to not work without issues
Forcing DisplayPort 1.2 on the display might fix it as I think there is something wonky with our DisplayPort 1.4 implementation. Sadly I don't have a display to trigger this issue myself I think, but I'll check if I can somehow figure something out.
with DisplayPort 1.2 you only get 30hz? That's a little surprising. But anyway.. I think I can try to see if I can force using the DisplayPort 1.4 stuff here and see if I can trigger it, just have to get a bit more creative. I'll ping here once I need something.
maybe one thing you could try.. Does it work if you only have the 4K display connected? And also do you get 60hz with DisplayPort 1.2? It's kinda weird that 60hz isn't available...
I thought I had got things working with DisplayPort 1.2, but when I tried Fedora again, with the
1440p display disconnected and drm.debug set, the 4K screen went totally blank and offline,
with no means of seeing even the boot log. I powered it off and have attached the 'journalctl -b -1'
log. The system then booted OK with the nouveau driver blacklisted (using the nvidia drivers).
I upgraded my Debian system to Bookworm, with the 6.1.0-9 kernel, and as expected, it would not show either display. Since this was a system without the nvidia proprietary hacks, I experimented with a variety of setups, rearranging the DisplayPort slots and replacing the DisplayPort cables.
The system would boot fine with just the 2560x1440 59.95hz setup.
With just the 4K monitor, the screen went dark early in the boot sequence, and I could not see the console prompt for my encryption key. I assumed it had switched to the non-existent i915 display and just typed it in. I could hear the disk drive start moving again but the display remained blank. I then went to power off the display,
and evidently just triggered hibernate mode. When I pressed the power button again I saw the gdm greeting screen. I then connected the second monitor, and logged in. Surprisingly the system came up dual screen, and seems to be working OK.
This procedure also worked the second and third time I tried it. I have
included a journal log with drm.debug=0x4:
boot-both-hybernate-restart-ok.log
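As a side note on the `drm.debug=0x4` setting used for that log: the value is a bitmask of DRM debug categories. The sketch below decodes such a mask in Python. The bit assignments are the ones from the kernel's include/drm/drm_print.h as I understand them, so treat them as an assumption and check them against the kernel version in use; `decode_drm_debug` is a hypothetical helper written for this example.

```python
# Bit -> category, following enum drm_debug_category in
# include/drm/drm_print.h (assumed; verify for your kernel version).
DRM_DEBUG_BITS = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x4))    # the value used for the log above
print(decode_drm_debug(0x116))  # a more verbose combination
```

Under those assumptions, 0x4 enables only the modesetting (KMS) messages, while a mask such as 0x116 would additionally include driver, atomic, and DisplayPort messages at the cost of a much larger log.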
interesting.. this would kinda imply there is a problem when nouveau takes over from the firmware framebuffer... do you know if just booting with the 2560x1440 and enabling/connecting the second one later works just as well?
The first time, I booted with just the 1440p display connected. When I got to the GDM greeting page, I plugged in the 4K monitor's DisplayPort connector and its backlight came on. After logging in, both displays were working.
This was not repeatable, and none of the other combinations of initial monitors and triggering hibernate worked. The 4K display just went dark when plugged in, and the 1440p display switched to showing the desktop at a reduced size, with the rest of the screen showing scattered parts of the desktop. The mouse pointer was frozen.
However, I could then go back to the 5.8 kernel and things worked fine. Evidently getting it to work, in spite of happening 3 times in a row, is a low probability event.
As far as seeing the 4K monitor in 120 hz, the monitor is capable of that,
although I don't know about the graphics card. The nvidia drivers seem to limit it to 60 hz. Even at 60 hz the monitor struggles to handle 4k video, with my hardware [A Dell XPS-8930 desktop].
I didn't see any effects of setting nouveau.atomic=1
huh... this is super weird... it kinda feels like the display gets re-probed a lot. @lyudess mind checking out the log in #211 (comment 1958408) and say if you spot anything weird there?
It's also interesting, that there is a 120 hz mode we don't reject, so I'm curious on why Userspace doesn't see it?
FWIW: DisplayPort uses HPD IRQs (e.g. a short pulse on the HPD line) in order to communicate with the host, especially so when MST is involved. And even though those short pulses are explicitly intended not to be used for hotplugs, well, they still get used for hotplugs anyway sometimes so it's not unusual for a driver to respond to one with a probe.
There's definitely a few things going on here though, I can see the core channel failing later which means we're setting something up with the display state incorrectly. I think I'll need a bit more info to come up with anything conclusive though. If you can, enable the following option in your kernel config:
Added to the kernel command line. Then reproduce the problem, and then get me the entire kernel log from that boot (no snipping!).
(Also, something else weird I noticed here - look how many times we're calling nouveau_framebuffer_new() at the end! I'm fairly sure that's not normal lol)
Setting a kernel config parameter seems to mean rebuilding the
kernel? I have not done this in Debian. If this is needed it
will take me a bit to figure out how best to do this.
I ran your test with my normal two screen setup with NixOS and the 6.3.7 kernel. I also disabled the i915 driver so my console would not end up there. It settled to a blank 4K monitor and the 1440p display showing
an initial login screen. The screen was unresponsive, but the keyboard was active and I was able to switch to another console screen. I got a dmesg and rebooted to get a full 'journalctl -b -1' log, which is attached: full-boot.log
Just an FYI - I'm not totally certain yet but I definitely already see some suspicious stuff in your dmesg that points out a couple of things nouveau is doing wrong here. It may take me a little bit to get to but I'll let you know when i've got a fix
I extracted the patch from your copy of the kernel and applied it to the 6.4 kernel I was using on my nixos-unstable system that I used for the previous log.
Things work fine. I tried several restarts and a cold reboot.
I noticed, using Gnome-44.2 (Wayland) display settings, that the 4K monitor came up at 30Hz, but it listed 60Hz and 119.91Hz as available. The NVIDIA proprietary driver limited it to 60Hz, but came up at 60Hz.
I reset the monitor to 60Hz and it accepted it without problems.
Running a 4K 59.94Hz (yuv502p) video using mpv it would drop frames, but not as badly as the NVIDIA proprietary driver.
It showed a large number (100s) of dropped frames with:
Display FPS: 59.996 (specified)
FPS: 59.940 (specified) 59.940 (estimated)
If I rebooted the system, the 4K display reset itself to 30Hz,
but I could then set it back to 60Hz.
did you try the 119.91 Hz as well? If it gets advertised by nouveau it should in theory be within limits. It could be that it uses 6 bpc, so maybe the colors will get a bit "weird"? Would be good to try that out.
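To put rough numbers on "within limits" and on why bits per component (bpc) matters, here is a back-of-the-envelope Python sketch comparing the data rate a mode needs against what a four-lane DisplayPort link can carry. The per-lane rates (HBR 2.7, HBR2 5.4, HBR3 8.1 Gbit/s) and the 8b/10b coding overhead are standard DisplayPort figures. The blanking totals (4000 x 2222 for a 3840x2160 mode) are an assumption based on typical reduced-blanking timings, not values read from this monitor, so the results are only indicative.

```python
# Per-lane link rates in Gbit/s for the DisplayPort link speeds.
LINKS = {"HBR": 2.7, "HBR2": 5.4, "HBR3": 8.1}

# Assumed total pixels per line and lines per frame (active + blanking)
# for a 3840x2160 reduced-blanking mode; real EDID timings may differ.
H_TOTAL, V_TOTAL = 4000, 2222


def link_payload_gbps(lane_rate_gbps, lanes=4):
    """Usable payload of a link: 8b/10b coding carries 8 data bits per 10 sent."""
    return lane_rate_gbps * lanes * 8 / 10


def mode_rate_gbps(h_total, v_total, refresh_hz, bpc):
    """Data rate of an RGB mode: pixels per second times 3 components times bpc."""
    return h_total * v_total * refresh_hz * bpc * 3 / 1e9


for refresh in (30, 60, 120):
    for bpc in (6, 8):
        need = mode_rate_gbps(H_TOTAL, V_TOTAL, refresh, bpc)
        fits = [n for n, r in LINKS.items() if link_payload_gbps(r) >= need]
        print(f"4K@{refresh} {bpc}bpc needs {need:5.2f} Gbit/s, fits: {fits}")
```

With these assumed timings, 4K at 60 Hz needs an HBR2 link or faster, and 4K at 120 Hz only fits on HBR3, with much more headroom at 6 bpc than at 8 bpc. That is consistent with the suspicion that a 120 Hz mode might be driven at reduced color depth, but it does not show what nouveau actually selects.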
I set the display to 119.91Hz, and it came up just fine. The colors seemed OK (at least to my eyes). mpv, glxgears, and es2gears seem to confirm the refresh rate. The 4K Sony test video that I had run before dropped many more frames, making it very annoying to watch.
After setting the display to 119.91Hz and rebooting, it came up
reset to 60Hz. After setting it back to 119.91Hz and rebooting again, it came back at 119.91Hz, so it may be that Gnome is a bit slow updating its preserved setting, or it is something about the switch through GDM to Gnome during a reboot.
But so far the system seems solid. I will keep using it.
interesting. I'm wondering if we indeed select a 8bpc mode or not, but we don't have any debugging on this side I think.... well.. if it works fine, then go for it, although I suspect performance will be "low" or at least not keeping up with 120 Hz :)
But cool to have figured out that bug we had with DP 1.3 workloads. Thanks a lot @lyudess !
No problem! Since it looks like it'll take me just a little bit more work to get the proper fix ready (but not much - just some search/replaces that unfortunately might need to be done semi-by hand) I will submit the workaround patch sometime today so we can get that merged in the meantime
Is there a way to test this patch live? I have multiple GT 710s in X570/B550 chipset machines acting as stripped-down displays for VFIO-enabled hosts. I am on Arch with various kernel versions that exhibit this exact behavior, but none of them on DisplayPort. All hosts use HDMI. Let me know if I should open a new ticket for this. Would love to sort this out.
This turned out to be an EDID issue on my end, as negotiation worked directly from the GPU to a few other monitors I had. Forcing EDID to 1920x1080x60 worked.
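For anyone else who suspects an EDID problem, a quick sanity check on the raw EDID blob can rule out the most obvious corruption before forcing a mode. The Python sketch below checks the fixed 8-byte header, the per-block checksum (each 128-byte block must sum to 0 modulo 256), and the extension count in byte 126, all of which come from the EDID format itself. The sysfs path mentioned in the comment is an assumed example, and `check_edid` is a hypothetical helper, not something from the kernel or this thread.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def check_edid(blob):
    """Return a list of problems found in a raw EDID blob (empty list = looks sane)."""
    if len(blob) == 0 or len(blob) % 128 != 0:
        return [f"unexpected length {len(blob)} (want a multiple of 128)"]
    problems = []
    if blob[:8] != EDID_HEADER:
        problems.append("bad header")
    for i in range(0, len(blob), 128):
        if sum(blob[i:i + 128]) % 256 != 0:
            problems.append(f"block {i // 128}: bad checksum")
    if len(blob) // 128 != 1 + blob[126]:
        problems.append("block count does not match extension count")
    return problems


# Synthetic base block for demonstration: header, zero payload, valid checksum.
demo = bytearray(128)
demo[:8] = EDID_HEADER
demo[127] = (-sum(demo[:127])) % 256
print(check_edid(bytes(demo)))   # no problems expected
print(check_edid(b""))           # an empty blob is reported as a length problem

# On a real system the blob could be read from a connector's sysfs node, e.g.
# open("/sys/class/drm/card0-HDMI-A-1/edid", "rb").read()  (path is an assumption)
```

A blob that passes these checks can still advertise modes the link cannot negotiate, so this only narrows things down; it does not replace testing with a forced mode as described above.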