Disconnecting a DisplayPort monitor while the system is suspended leads to a system hang (RX 6700XT)
Brief summary of the problem:
The system hangs on resume from suspend if a monitor (DP/MST) was disconnected while the system was suspended. I run into this issue very frequently, because I often switch my display inputs when changing between my desktop PC and my work laptop.
When both of my displays (a Dell U2715H connected over DisplayPort, and an old Samsung connected over HDMI) are connected, the entire system hangs while resuming if the Dell monitor was disconnected during suspend. In this state, the screens are black or don't wake up, SSH does not work, SysRq-REISUB only sometimes works, and there are never any logs in the journal from after waking up the system (the logs end with "suspend entry (deep)").
When only the DP display is connected to the GPU, the hang does not seem to happen, even though some warnings are still logged in dmesg.
When I connect the HDMI display to the Intel integrated graphics instead, the system does not hang immediately. After resuming, I can use the system as long as I don't try to do anything with the AMD GPU. In this situation, the following kernel warning is logged:
helmi 03 23:59:45 kernel: ------------[ cut here ]------------
helmi 03 23:59:45 kernel: WARNING: CPU: 2 PID: 1377 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.c:188 fill_dc_mst_payload_table_from_drm+0x94/0x140 [amdgpu]
helmi 03 23:59:45 kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables br_netfilter bridge cfg80211 8021q garp mrp stp nct6775 llc nct6775_core hwmon_vid overlay intel_rapl_msr intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_avs hid_logitech_hidpp kvm_intel snd_soc_hda_codec snd_hda_ext_core snd_hda_codec_realtek snd_soc_core snd_compress snd_hda_codec_generic ac97_bus kvm snd_hda_codec_hdmi snd_pcm_dmaengine irqbypass joydev mousedev crct10dif_pclmul eeepc_wmi crc32_pclmul snd_hda_intel nls_iso8859_1 asus_wmi polyval_clmulni hid_logitech_dj snd_usb_audio snd_intel_dspcfg ledtrig_audio vfat polyval_generic amdgpu fat sparse_keymap gf128mul platform_profile ghash_clmulni_intel snd_usbmidi_lib snd_intel_sdw_acpi i8042 sha512_ssse3 snd_ump serio sha256_ssse3 i915 iTCO_wdt snd_hda_codec snd_rawmidi mei_pxp sha1_ssse3 intel_pmc_bxt mei_hdcp aesni_intel snd_hda_core iTCO_vendor_support snd_seq_device ee1004 ppdev amdxcp
helmi 03 23:59:45 kernel: drm_exec mc wmi_bmof snd_hwdep crypto_simd rfkill mxm_wmi cryptd snd_pcm gpu_sched drm_suballoc_helper drm_buddy i2c_algo_bit rapl drm_ttm_helper usbhid ttm cp210x snd_timer intel_cstate drm_display_helper snd mei_me intel_uncore cec e1000e soundcore i2c_i801 pcspkr intel_gtt i2c_smbus mei parport_pc parport video wmi acpi_pad mac_hid vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) ip6table_nat ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c i2c_dev sg fuse crypto_user loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme nvme_core crc32c_intel nvme_auth xhci_pci xhci_pci_renesas
helmi 03 23:59:45 kernel: CPU: 2 PID: 1377 Comm: kworker/u8:30 Tainted: G OE 6.7.3-arch1-1 #1 b8291227ebee24f0bec9b3471a94151938512264
helmi 03 23:59:45 kernel: Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
helmi 03 23:59:45 kernel: Workqueue: drm_dp_mst_wq drm_dp_delayed_destroy_work [drm_display_helper]
helmi 03 23:59:45 kernel: RIP: 0010:fill_dc_mst_payload_table_from_drm+0x94/0x140 [amdgpu]
helmi 03 23:59:45 kernel: Code: 09 31 d2 48 89 c1 eb 0b 83 c2 01 48 83 c1 18 39 d6 74 17 40 38 39 75 f0 48 63 ca 31 ff 48 8d 0c 49 66 89 7c cc 28 39 d6 75 22 <0f> 0b eb 1e 0f b7 5a 0c 0f b7 05 5f 7d 7e 00 48 8d 0c 76 8a 42 09
helmi 03 23:59:45 kernel: RSP: 0000:ffffa1b9880274f0 EFLAGS: 00010246
helmi 03 23:59:45 kernel: RAX: ffffa1b988027518 RBX: 0000000000000000 RCX: 0000000000000000
helmi 03 23:59:45 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa1b988027598
helmi 03 23:59:45 kernel: RBP: ffff901a5e88c000 R08: ffffa1b98802769c R09: 0000000000000000
helmi 03 23:59:45 kernel: R10: 00000000000000ff R11: ffff901a5ee59800 R12: ffffa1b98802769c
helmi 03 23:59:45 kernel: R13: ffffa1b9880275c0 R14: ffff901a42afe7e0 R15: ffff901a5e898000
helmi 03 23:59:45 kernel: FS: 0000000000000000(0000) GS:ffff901d76d00000(0000) knlGS:0000000000000000
helmi 03 23:59:45 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
helmi 03 23:59:45 kernel: CR2: 0000000000000000 CR3: 000000035c820001 CR4: 00000000003706f0
helmi 03 23:59:45 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
helmi 03 23:59:45 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
helmi 03 23:59:45 kernel: Call Trace:
helmi 03 23:59:45 kernel: <TASK>
helmi 03 23:59:45 kernel: ? fill_dc_mst_payload_table_from_drm+0x94/0x140 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: ? __warn+0x81/0x130
helmi 03 23:59:45 kernel: ? fill_dc_mst_payload_table_from_drm+0x94/0x140 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: ? report_bug+0x171/0x1a0
helmi 03 23:59:45 kernel: ? handle_bug+0x3c/0x80
helmi 03 23:59:45 kernel: ? exc_invalid_op+0x17/0x70
helmi 03 23:59:45 kernel: ? asm_exc_invalid_op+0x1a/0x20
helmi 03 23:59:45 kernel: ? fill_dc_mst_payload_table_from_drm+0x94/0x140 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: dm_helpers_dp_mst_write_payload_allocation_table+0xc0/0x110 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: link_set_dpms_off+0x730/0x9a0 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: dcn20_reset_hw_ctx_wrap+0x164/0x450 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: dce110_apply_ctx_to_hw+0x6b/0x710 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: ? __free_pages_ok+0x196/0x460
helmi 03 23:59:45 kernel: dc_commit_state_no_check+0x3a5/0xe50 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: dc_commit_streams+0x2a9/0x420 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: amdgpu_dm_atomic_commit_tail+0x39a/0x3a90 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: ? dcn30_validate_bandwidth+0x101/0x2c0 [amdgpu 3f66c94e9c076d4a73edf261cf98db7721227fae]
helmi 03 23:59:45 kernel: ? __kmem_cache_alloc_node+0x1a0/0x2e0
helmi 03 23:59:45 kernel: ? wait_for_completion_timeout+0x13e/0x170
helmi 03 23:59:45 kernel: ? wait_for_completion_interruptible+0x139/0x1e0
helmi 03 23:59:45 kernel: ? drm_dp_mst_atomic_setup_commit+0x90/0x1a0 [drm_display_helper fc4972a3f0e3049564eb2874ef4e540b9c551237]
helmi 03 23:59:45 kernel: commit_tail+0x91/0x130
helmi 03 23:59:45 kernel: drm_atomic_helper_commit+0x11a/0x140
helmi 03 23:59:45 kernel: drm_atomic_commit+0x97/0xd0
helmi 03 23:59:45 kernel: ? __pfx___drm_printfn_info+0x10/0x10
helmi 03 23:59:45 kernel: drm_client_modeset_commit_atomic+0x203/0x250
helmi 03 23:59:45 kernel: drm_client_modeset_commit_locked+0x5a/0x160
helmi 03 23:59:45 kernel: drm_client_modeset_commit+0x25/0x40
helmi 03 23:59:45 kernel: __drm_fb_helper_restore_fbdev_mode_unlocked+0x85/0xd0
helmi 03 23:59:45 kernel: drm_fb_helper_hotplug_event+0xe5/0x100
helmi 03 23:59:45 kernel: drm_client_dev_hotplug+0x9e/0xf0
helmi 03 23:59:45 kernel: process_one_work+0x171/0x340
helmi 03 23:59:45 kernel: worker_thread+0x27b/0x3a0
helmi 03 23:59:45 kernel: ? __pfx_worker_thread+0x10/0x10
helmi 03 23:59:45 kernel: kthread+0xe5/0x120
helmi 03 23:59:45 kernel: ? __pfx_kthread+0x10/0x10
helmi 03 23:59:45 kernel: ret_from_fork+0x31/0x50
helmi 03 23:59:45 kernel: ? __pfx_kthread+0x10/0x10
helmi 03 23:59:45 kernel: ret_from_fork_asm+0x1b/0x30
helmi 03 23:59:45 kernel: </TASK>
helmi 03 23:59:45 kernel: ---[ end trace 0000000000000000 ]---
The same warning appears when only one monitor is connected and that monitor is disconnected while the system is suspended. In that situation, however, the system does not hang, at least not immediately.
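To pull a warning like the one above out of a longer kernel log, something along these lines may help. It is only a sketch: the function name extract_kernel_warnings is made up for illustration, and it assumes the warning is delimited by the "cut here" and "end trace" marker lines seen in the log above.

```shell
# Print each kernel warning block, from its "cut here" marker line
# through its "end trace" marker line. Reads the log from stdin.
# (extract_kernel_warnings is a hypothetical helper name.)
extract_kernel_warnings() {
    awk '/\[ cut here \]/, /\[ end trace /'
}
```

For example, `journalctl -k -b --no-pager | extract_kernel_warnings` or `dmesg | extract_kernel_warnings` should print only the warning blocks from the current boot.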
I've reproduced the hang on kernel versions 6.7.3, 6.6.15, 6.1.77 and 5.15.94, so it's not a particularly new issue.
Hardware description:
- CPU: Intel i5 6600K
- GPU: RX 6700 XT (reference card)
- System Memory: 16GB
- Display(s): Dell U2715H, Samsung LS24E65UDW
- Type of Display Connection: Dell connected via DisplayPort, Samsung via HDMI
System information:
- Distro name and Version: Arch Linux
- Kernel version: 6.7.3-arch1-1, 6.6.15-1-lts
- AMD official driver version: n/a
How to reproduce the issue:
Reproduce the hang
- Connect one monitor (MST capable) with DisplayPort, another with HDMI
- Boot the system
- Close X11 / Wayland (e.g. systemctl stop sddm in my case)
- Both displays show the console now
- Suspend the system with systemctl suspend
- Turn off the DisplayPort monitor connected to the AMD GPU
- Wake the system up
- Observe frozen system
  - HDMI monitor shows either a black or white screen or the console from before suspension. This behaviour differs between kernel versions
  - SSH does not work, there are no logs written to the journal...
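After a forced reboot, the "no logs after suspend" symptom can be checked against the previous boot's kernel log. The following is a rough sketch: lines_after_last_suspend is a made-up helper name, it assumes the journal is stored persistently (otherwise the previous boot is not available), and it only matches on the "suspend entry" text quoted earlier in this report.

```shell
# Print every log line that follows the LAST "suspend entry" line on stdin.
# Empty output means nothing was logged after the system went to sleep.
# (lines_after_last_suspend is a hypothetical helper name.)
lines_after_last_suspend() {
    awk '
        /suspend entry/ { n = 0; seen = 1; next }   # restart the buffer at each suspend
        seen            { buf[++n] = $0 }           # remember lines after it
        END             { for (i = 1; i <= n; i++) print buf[i] }
    '
}
```

Usage would be `journalctl -k -b -1 --no-pager | lines_after_last_suspend`. If the hang described above occurred, I would expect this to print nothing.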
Reproduce issue without immediate hang
- Connect one monitor to the AMD GPU with DisplayPort, another to the Intel iGPU with HDMI
- Boot the system
- Close X11 / Wayland (e.g. systemctl stop sddm in my case)
- In my setup, the display connected to the AMD GPU is now black, while the Linux console is on the HDMI monitor
- Suspend the system with systemctl suspend
- Turn off the DisplayPort monitor connected to the AMD GPU
- Wake the system up
- Observe kernel warning in dmesg
- Switch the DisplayPort display back on
- Try to reset the GPU with cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover and see the command hang
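To avoid losing the terminal to the hanging command in the last step, it can be started in the background and polled. This is only a sketch: run_with_deadline is a made-up helper name, and it merely reports on the command; a process stuck in uninterruptible sleep (state D) cannot be killed and stays around afterwards.

```shell
# Start a command in the background and report whether it finishes
# within the given number of seconds.
# (run_with_deadline is a hypothetical helper name.)
run_with_deadline() {
    deadline=$1; shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    i=0
    while [ "$i" -lt "$deadline" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            # The process is gone; collect and report its exit status.
            wait "$pid"
            echo "finished (exit status $?)"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    # Still alive: show its process state (D = uninterruptible sleep).
    echo "still running after ${deadline}s, state: $(ps -o stat= -p "$pid")"
    return 1
}
```

As root, `run_with_deadline 30 cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover` would then either report completion or show the state of the stuck cat process.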