Renoir: plug in displays (via dock), receive drm-killing oops.
Brief summary of the problem:
Attaching a ThinkPad USB-C "dock" (type 40AS) with any DP/HDMI displays to a Renoir (R7 4750U) ThinkPad T14s G1 (machine type 20UH) that was booted without the dock attached produces an Oops that hangs or crashes anything attempting to do GPU work (e.g., Xorg, Chromium GPU, Firefox GPU), while freezing the image on the GPU head.
Detaching the "dock" while in this state appears to trigger a panic, but that's somewhat speculative, as I have no way to tell at that point.
This produces so many different oopses and assertion failures (and, with those enabled, KASAN and UBSAN reports) that I'm struggling to pin it down more specifically, or even flense down appropriate dmesgs.
So far, I've got issues showing up in `dc_link_allocate_mst_payload`, `dc_bandwidth_in_kbps_from_timing`, `dm_dp_mst_get_modes`, `drm_dp_mst_atomic_check_mstb_bw_limit`, `mpc2_assert_mpcc_idle_before_connect`.
While attempting to reproduce this bug, I also reproduced #1418 (closed), #1358 (closed), #1344 (closed), #1337 (closed).
Possibly related:
- `dm_dp_mst_get_modes` → #348 (closed) (fdo-bz#106159); that affects a Polaris11 running 4.16, but looks very similar by symptoms.
- `dc_link_allocate_mst_payload` → #1423, #745 (closed) reference MST issues.
- `dc_link_handle_hpd_rx_irq` → #1495 (closed).
- `dcn10_get_dig_frontend` → referenced in #1306 (closed), but not the main issue there.
- #1360 (closed) has an interesting loss-of-display on very similar hardware.
Hardware description:
- CPU: 'AMD Ryzen 7 PRO 4750U with Radeon Graphics'
- GPU: 'Renoir'
- System Memory: 32G
- Display(s): 1x AUO B140HAN04.0 (via eDP); 2x Dell P2214H (via DP via USB-C); 1x LG xxx (via HDMI via USB-C)
- Type of Display Connection: eDP, DP/HDMI via USB-C "dock" (Lenovo "ThinkPad USB-C Dock Gen 2", m/t 40AS)
System information:
- Distro name and Version: Arch Linux rolling
- Kernel version: ¬5.11-rc1; 5.10.17+fix-1418, 5.10.16+fix-1418, 5.10.16-arch1, 5.10.14-arch1, 5.10.12-arch1, 5.10.7-arch1, ¬5.10.6, 5.10.5-arch1, 5.10.4-arch1, 5.10.3-arch1, 5.10.2-arch1; 5.10.0; 5.9.14-arch1; ¬5.8.17 (as 5.8.0-33.36 from Ubuntu groovy)
- Custom kernel: 5.11-rc1, 5.10.17+fix-1418, 5.10.16+fix-1418, 5.10.6, 5.10.0 from mainline, with much debugging (KASAN, UBSAN) enabled
- AMD package version: ???
- Other pertinent versions: mesa 20.3.1-1 .. 20.3.4-1, amdvlk 2021.Q1.3-1, xf86-video-amdgpu 19.1.0-2, xorg-server 1.20.10-3
How to reproduce the issue:
Plug in dock. Receive oops.
Attached files:
update 2020-12-30: here are some dmesgs collected over 2h of attempts to reproduce the same crash in 5.10.2-arch1.
- dmesg, 5.10.2-arch1 -- dock attach, then dock detach, w/ 2x DP + 1x HDMI: dmesg.lisbon.20201229.215822.txt
  - oops in `dccg2_update_dpp_dto` and in `dcn21_link_encoder_acquire_phy`. (haven't seen this one before!)
- dmesg, 5.10.2-arch1 -- dock attach, w/ 2x DP + 1x HDMI: dmesg.lisbon.20201229.215955.txt
  - oops in `dc_link_allocate_mst_payload` (fairly common)
- dmesg, 5.10.2-arch1 -- hotplugging displays attached to the dock: dmesg.lisbon.20201229.220336.txt
  - oops in `drm_dp_mst_atomic_check_mstb_bw_limit` (3x in quick succession)
- dmesg, 5.10.2-arch1 -- boot, with dock attached: dmesg.lisbon.20201229.221345.txt
  - oops in `mpc2_assert_mpcc_idle_before_connect` (three assertions fail in quick succession)
  - bonus, with `drm.debug=4095`: dmesg.lisbon.20201229.225434.txt
- dmesg, 5.10.2-arch1 -- suspend+resume with dock attached: dmesg.lisbon.20201229.222118.txt
  - oops in `dcn10_get_dig_frontend`. (bonus traceback of #1337 (closed), which I also see)
- dmesg, 5.10.2-arch1 -- dock attach, drm-killing oops: dmesg.lisbon.20201229.222620.txt, dmesg.lisbon.20201229.225109.txt
  - oops in `dm_dp_mst_get_modes`, kernel NULL pointer dereference
  - bonus, with `drm.debug=4095`: dmesg.lisbon.20201229.231450.txt
- dmesg, 5.10.2-arch1 -- dock attach while on vty; then switched to Xorg when apparently stable: dmesg.lisbon.20201229.235125.txt
  - oops in `dm_update_crtc_state`, invalid opcode; following warning in `rcu_note_context_switch`.
Observation from this 2h session of collecting crashes: after a crash of 5.10.2-arch1, I accidentally rebooted into 5.8.0-33.36, then rebooted into 5.10.2-arch1. That session produced neither the at-boot `mpc2_assert_mpcc_idle_before_connect` nor the other oopses on hotplugging, but it also only managed to run 3 of 4 CRTCs ... not sure what that implies.
- dmesg 5.10.0 (with KASAN + UBSAN) -- dock attach, drm-killing oops: dmesg.lisbon.20201230.194240.txt, symbolised
  - oops in `dc_link_allocate_mst_payload` and in `dm_dp_mst_get_modes`
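
Since the captures above fault in so many different places, a rough way to compare them is to tally which function each oops names as the faulting instruction pointer. The sketch below is only an illustration and not something used to produce the lists above: it assumes the usual x86 `RIP: 0010:function+0xOFF/0xSIZE` line format, the helper name `tally_rip_functions` is made up for this example, and it will miss reports that carry no such line (some KASAN/UBSAN reports, for instance).

```python
import re
import sys
from collections import Counter

# Assumed format of an x86 oops/warning line, e.g.
#   "RIP: 0010:some_function+0x1a/0x50 [module]"
# Userspace RIP lines ("RIP: 0033:0x7f...") carry no symbol and are skipped.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def tally_rip_functions(lines):
    """Count how often each function is named as the faulting RIP."""
    counts = Counter()
    for line in lines:
        match = RIP_RE.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


def main(paths):
    # Print one block per dmesg file, most frequent function first.
    for path in paths:
        with open(path, errors="replace") as handle:
            counts = tally_rip_functions(handle)
        print(path)
        for func, n in counts.most_common():
            print(f"  {n:4d}  {func}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under any name and run with the dmesg files as arguments, it prints a per-file count, which may help decide which capture to look at first for a given function.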