5.12: daisy-chained displayport monitors show on, but are off (BISECTION to 5.6 commit)
Brief summary of the problem:
On linux 5.12, whenever the mode switches (including at early or late KMS), my 5 monitors slowly cycle between three states: on and properly displayed; on but pure black; and off. They seem to always stabilize with the 3 monitors directly connected to the video card on, and the 2 daisy chained off them off. Those 2 are momentarily on during the mode switching, but all 5 are never on at once. Links to videos are below.
*** sysfs, xrandr, and KDE Plasma Display Configuration show all 5 monitors are properly connected and in use, even when the video card has turned them off! ***
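For anyone wanting to compare what sysfs reports against what the panels are physically doing, here is a minimal Python sketch that dumps the per-connector attributes under /sys/class/drm. The function name `read_connectors` and the choice of attributes (`status`, `enabled`, `dpms`) are mine, and this is not necessarily how the attached logs were gathered.

```python
from pathlib import Path


def read_connectors(drm_root="/sys/class/drm"):
    """Return {connector_name: {attr: value}} for every connector directory.

    Connector directories look like card0-DP-1 or card0-HDMI-A-1.
    Attributes that are missing or unreadable are reported as None.
    """
    result = {}
    for conn in sorted(Path(drm_root).glob("card*-*")):
        info = {}
        for attr in ("status", "enabled", "dpms"):
            try:
                info[attr] = (conn / attr).read_text().strip()
            except OSError:
                info[attr] = None
        result[conn.name] = info
    return result


if __name__ == "__main__":
    for name, info in read_connectors().items():
        print(f"{name}: status={info['status']} "
              f"enabled={info['enabled']} dpms={info['dpms']}")
```

On an affected kernel, I would expect the dark daisy-chained monitors to still be listed as connected and enabled here, which matches what xrandr and KDE show.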
This has been happening since linux 5.6, although along the way other versions would randomly and rarely get all 5 on, and didn't seem to consistently have only the monitors plugged directly into the video card on. So the fact that 5.12 consistently leaves only the daisy chained ones off could be a red herring. I haven't been able to replicate having a lucky boot with all 5 on in 5.12, but maybe I've just been unlucky.
I've been running linux 5.5.13 for about a year because of this and haven't had any of these problems on it, nor on the earlier versions I ran for roughly a year before that. I've tried many versions after 5.5.13, including at least one in each major release.
Hardware description:
- CPU: Dual Intel Xeon E5-2590 (v1) @ 2.90 GHz
- GPU: Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1) - Sapphire 21275-03-20G (Radeon RX Vega 64 8GB)
- System Memory: 64GB (Samsung 8x8GB 1066MHz PC3-8500 ECC Registered)
- Display(s): (5) Acer K272HUL, (1) Emerson LCD TV. During all testing, the LCD TV was off - seen, but off.
- Type of Display Connection: Acer LCD's using Accell DisplayPort 1.2 HBR2 VESA-Certified cable B142C-007B, rated for 4K @ 60Hz. 2 of the Acer LCD's have another one daisy chained off them, and 1 is by itself. Emerson LCD TV using Highwings HDMI 2.0, rated for 4K @ 60Hz.
System information:
- Distro name and Version: Arch Linux
- Kernel version: Bug happens on 5.6 - 5.12, so I run 5.5.13 to avoid it.
- Custom kernel: Various commits, mentioned elsewhere
- AMD package version: No package, only kernel amdgpu
How to reproduce the issue:
Here are videos of 5.12 with the bug, and 5.5.13 without it. In my text below for each video, I refer to the monitors with numbers from left to right. Monitors 1, 3, and 4 are connected to the card. Monitor 2 is daisy chained off 1, and 5 is daisy chained off 4.
A video of 5.12, without early KMS:
- From POST to late KMS, monitors 1 and 2 are mirrored, and 3-5 are off with amber LED's.
- At KMS, monitors 1, 3, and 4 wind up being mirrored, with 1 getting no signal momentarily. It takes 16 seconds, instead of the typical 5 seconds.
- I use startx. The monitors slowly switch between various states (mirrored tty, no signal, on but dark, off with amber LED), and it's 60 seconds before it stabilizes with monitors 1, 3, and 4 being displayed independently, with 2 and 5 off.
- I hit printscreen, which shows the X server thinks it's displaying on all 5 monitors. It's not in the video, but KDE display configuration shows all monitors as enabled, with virtual screen space allocated and usable for them.
- I hit shutdown, and the monitors go through more slow switching between various stages. Often some of them go back to being a mirrored tty, but in this video the tty is never displayed.
A video of 5.5.13, without early KMS:
- From POST to late KMS, monitors 1 and 2 are mirrored, and 3-5 are off with amber LED's.
- At KMS, within 5 seconds all monitors are mirrored.
- I always boot to tty, and use startx. Within 5 seconds all monitors are on and displayed independently.
- I reboot, and back in tty mode all 5 monitors are mirrored.
- Other than KDE plasma never remembering which background goes on which monitor, everything works properly.
- Note that at no point under 5.5.13 does a monitor show the blue "no signal" box.
If AMD tests many monitors with DisplayPort daisy chaining on a Vega 64, I think AMD would easily replicate this issue. That said, let me know how to get anything else you need, and I'll get it. Give me patches to see if they fix it, or even just to give you needed diagnostic data. I really need to upgrade kernels...
Bisection
I'm really hoping we can fix this with the drm debug output I'll be attaching, or with patches to generate more debug output. I've spent several days bisecting. It's very difficult for 2 reasons: knowing how to determine good vs bad, since the behavior changes; and git bisect does not seem to be handling the complicated merge history correctly. Maybe I need to run a lot of the bisection manually, as I see some people have. [1] [2] (especially the answer by Rich, and the comments he refers to from CB Bailey.) Maybe git bisect is supposed to handle whatever tree you throw at it, as some people say. [3] But in that case, I believe I've found a situation it can't handle, which I'll detail in a follow-up if needed.
Because of these problems, I am not sure I'm finding anything useful, so I'm stopping the bisection there and posting what I have.
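To make the good vs bad problem concrete: since affected kernels occasionally get a lucky boot with all 5 monitors on, one clean boot proves nothing, while one failed boot is enough to call a commit bad. Below is a rough Python sketch of that decision rule. The function name `classify` and the threshold of 3 clean boots are assumptions of mine for illustration, not a rule I followed rigorously during the bisection.

```python
def classify(boot_results, min_good_boots=3):
    """Decide what to tell git bisect about one commit.

    boot_results: list of bools, one per boot of that commit,
                  True if all 5 monitors ended up on and displayed.
    min_good_boots: assumed number of clean boots needed before
                    trusting a commit as good (hypothetical threshold).
    """
    # Any failed boot marks the commit bad, because 5.5.13 never fails.
    if any(not ok for ok in boot_results):
        return "bad"
    # Only clean boots so far: good if there are enough of them.
    if len(boot_results) >= min_good_boots:
        return "good"
    # Too few clean boots to rule out a lucky boot; test again.
    return "undecided"


if __name__ == "__main__":
    print(classify([True, True, False]))
    print(classify([True, True, True]))
    print(classify([True]))
```

The result maps directly onto `git bisect bad` and `git bisect good`. An "undecided" commit just needs more boots before marking it.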
Attached files
Various logs to follow momentarily.
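For anyone who wants to capture comparable drm debug output on their own setup, here is a small Python sketch that builds a drm.debug bitmask and writes it to /sys/module/drm/parameters/debug at runtime (needs root). The bit values are the DRM_UT_* categories from the kernel's drm_print.h. The helper names `debug_mask` and `set_drm_debug`, and the choice of categories, are assumptions for illustration and may not match the settings used for the attached logs.

```python
# drm.debug category bits (DRM_UT_* in include/drm/drm_print.h).
DRM_DEBUG_BITS = {
    "core": 0x01,
    "driver": 0x02,
    "kms": 0x04,
    "prime": 0x08,
    "atomic": 0x10,
    "vbl": 0x20,
    "state": 0x40,
    "lease": 0x80,
    "dp": 0x100,
}


def debug_mask(*categories):
    """OR together the bits for the named debug categories."""
    mask = 0
    for name in categories:
        mask |= DRM_DEBUG_BITS[name]
    return mask


def set_drm_debug(mask, path="/sys/module/drm/parameters/debug"):
    """Write the mask to the drm module parameter (requires root)."""
    with open(path, "w") as f:
        f.write(str(mask))


if __name__ == "__main__":
    # driver + kms + dp messages, relevant to DisplayPort mode setting.
    mask = debug_mask("driver", "kms", "dp")
    print(f"drm.debug={mask:#x}")
    set_drm_debug(mask)
```

The same value can be passed on the kernel command line as drm.debug= to catch messages from early boot, which the runtime write cannot do.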