External display on Lenovo USB-C dock no longer initialized
Problem description
Linux kernels 5.7 and up no longer initialize the external display on a Lenovo USB-C dock. This used to work fine: since April 2020 in my current configuration, and before that with a different monitor.
I have connected my laptop to the docking station via USB-C. The laptop's lid is closed and I turn the system on with the button on the dock. The machine starts and the external display comes to life (initialized by UEFI). After GRUB passes, the display loses its signal and no longer displays anything. This happens before Fedora can prompt for the LUKS encryption password, so we're pre-desktop and the only major software components present are the kernel and what's in the initramfs. In short: when the kernel takes over control of the GPU, my display loses signal.
This is a Lenovo Thinkpad T470 connected to a Lenovo USB-C Docking Station. A Dell 32" 1440p monitor is connected to the docking station via a DisplayPort-to-HDMI cable. Trouble started with Linux kernel 5.7(.7), and since then I have been unable to boot properly using the dock. Typing the decryption password blind does let the system continue booting, but the external display is never initialized.
I can work around this by disconnecting the laptop from the dock, entering the password on the laptop itself and continuing to boot to the display manager (GDM). Reconnecting the laptop to the dock at that point sometimes works fine, but sometimes results in a hung system.
Also reported at the Red Hat bug tracker: https://bugzilla.redhat.com/show_bug.cgi?id=1879442
Just today another user reported experiencing the same issue with a different docking station.
Hardware
Laptop (Lenovo Thinkpad T470) and related components:
- Intel i7-7500U CPU
- VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
- USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
- USB controller: Intel Corporation JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016] (rev 01)
- Lenovo USB-C Docking station (model: DK1633, type: 40A9), connected by USB-C cable with the Thinkpad T470
- Dell S3220DGF display (on DisplayPort of Lenovo USB-C dock)
Firmware of the Thinkpad T470 is up to date:
$ sudo fwupdmgr get-updates
• Thunderbolt Controller has no available firmware updates
• INTEL SSDPEKKF256G7L has no available firmware updates
• System Firmware has the latest available firmware version
• UEFI Device Firmware has the latest available firmware version
• VMM3320 has no available firmware updates
Kernel(s)
Distribution kernels tested (Fedora 31):
- kernel-5.6.19-200.fc31.x86_64 (works)
- kernel-5.7.7-100.fc31.x86_64 (fails)
- kernel-5.7.8-100.fc31.x86_64 (fails)
- kernel-5.8.6-101.fc31.x86_64 (fails)
Self-compiled kernels from vanilla sources (git clone from git.kernel.org) tested:
- 5.6 (works)
- 5.7 (fails)
- 5.8 (fails)
Attempted fix
To try and resolve my issue I performed a git bisect between tags v5.6 and v5.7, and later another bisect over a smaller range between those two tags. The commit that causes my issue is 0f8839f5f3:
commit 0f8839f5f323da04a800e6ced1136e4b1e1689a9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Thu Feb 13 16:04:12 2020 +0200
drm/i915: Force state->modeset=true when distrust_bios_wm==true
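For anyone who wants to repeat the bisect, here is a minimal sketch. It assumes the current directory is a clone of the kernel tree; the function name start_bisect and the GIT override variable are illustrative helpers, not part of git itself.

```shell
#!/bin/sh
# GIT can be overridden (for example GIT=echo) to preview the commands
# without touching the repository. This override is a convenience of
# this sketch only.
GIT="${GIT:-git}"

# Start a bisect between a known-good and a known-bad ref.
# Arguments: $1 = good ref, $2 = bad ref.
start_bisect() {
    "$GIT" bisect start &&
    "$GIT" bisect bad "$2" &&
    "$GIT" bisect good "$1"
}

# After building and booting each candidate kernel, mark the result and
# let git check out the next candidate:
#   git bisect good    # external display was initialized
#   git bisect bad     # external display stayed dark
# When git has named the first bad commit:
#   git bisect reset
```

Calling start_bisect v5.6 v5.7 would begin the range described above.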
Building a 5.7 or later kernel with that commit reverted restores working order and I can boot as I used to.
Reverting this commit works cleanly against v5.8. Against v5.9 it does not, but the conflict is easily resolved by hand. I have a patch that does this for master/v5.9 on my system and sent it to the intel-gfx mailing list as well: https://lists.freedesktop.org/archives/intel-gfx/2020-September/249394.html
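The revert can be sketched roughly as follows, assuming a kernel checkout as the working directory. The function name revert_on, the branch name passed to it and the GIT override are hypothetical helpers for illustration; the commit hash is the one identified by the bisect.

```shell
#!/bin/sh
# GIT can be overridden (for example GIT=echo) to preview the commands.
GIT="${GIT:-git}"

# Commit found by the bisect.
BAD_COMMIT=0f8839f5f323da04a800e6ced1136e4b1e1689a9

# Create a test branch from a given ref and revert the commit on it.
# Arguments: $1 = tag or ref to start from, $2 = name for the new branch.
revert_on() {
    "$GIT" checkout -b "$2" "$1" &&
    "$GIT" revert --no-edit "$BAD_COMMIT"
}
```

For example, revert_on v5.8 revert-test should apply without conflicts. On v5.9 the revert stops with conflicts, which have to be resolved by hand before finishing with git revert --continue.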
I quickly received a response from @vsyrjala about this issue. Many thanks for responding that fast! I am sorry I did not reply sooner, but I am here now. Work and life required more of my attention the past two weeks. I was asked:
Can you file an upstream bug at https://gitlab.freedesktop.org/drm/intel/issues/new and attach dmesgs from booting both good and bad kernels with drm.debug=0x1e passed to the kernel cmdline? Bump log_buf_len= if necessary to capture the full log.
I have two kernels built from vanilla sources (git.kernel.org). Both are based on commit 02de58b24d: one as-is, the other with commit 0f8839f5f3 reverted (my revert commit, 3b1af80e8289, exists only on my system). Normally I can boot with the kernel that has the revert commit and not with the stock one.
Adding drm.debug=0x1e to the kernel cmdline, plus an extra log_buf_len=4M, allowed me to capture the extra output in full. After adding those options to the stock kernel I was in for a surprise... I was able to boot normally with the vanilla kernel!
This deepens the mystery...
I get the feeling that exact timing may play a role in this issue?
I realised that at the time I had built those kernels without debug symbols. I don't know if that matters, but two weeks ago I rebuilt both with debug symbols enabled. I will boot my two kernels now, log their dmesg and upload the logs here. Bear with me.
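For anyone wanting to capture the same logs, here is a rough sketch of one way to do it on Fedora. It assumes grubby is used to manage boot entries; the function names and the output file argument are made up for illustration, and nothing runs until a function is called.

```shell
#!/bin/sh
# Parameters used for the debug boots in this report.
DEBUG_ARGS="drm.debug=0x1e log_buf_len=4M"

# 0x1e is the bitwise OR of 0x02, 0x04, 0x08 and 0x10, which in the DRM
# debug mask select driver, KMS, PRIME and atomic messages.

# Add the parameters to the boot entry of one installed kernel.
# Argument: $1 = path to the kernel image under /boot.
add_debug_args() {
    sudo grubby --update-kernel="$1" --args="$DEBUG_ARGS"
}

# After booting that entry, save the kernel log to a file.
# Argument: $1 = output file name.
save_dmesg() {
    sudo dmesg > "$1"
}
```

If the start of the log is missing from the saved file, log_buf_len can be raised further.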