AMDGPU Crash on Wake
Brief summary of the problem:
When I attempt to wake my system from suspend mode, AMDGPU reports a crash and the system never wakes. The screen stays blank until I hard reboot (hold the power button until the system powers off, wait, then power on normally). This happens every time with no exceptions, and has been a problem on every 5.17 and 5.18 kernel so far. I don't know the exact kernel version in which this issue first appeared.
I have reported this bug on the Red Hat Bugzilla.
The relevant sections of the log are:
Jul 14 20:36:28 kernel: ------------[ cut here ]------------
Jul 14 20:36:28 kernel: WARNING: CPU: 2 PID: 1878 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:6349 dp_set_panel_mode+0x96/0xa0 [amdgpu]
Jul 14 20:36:28 kernel: Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter iptable_filter qrtr bnep sunrpc binfmt_misc vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd iwlmvm kvm mac80211 irqbypass snd_sof_amd_renoir rapl snd_sof_amd_acp snd_sof_pci libarc4 btusb snd_sof snd_hda_codec_realtek think_lmi btrtl pcspkr firmware_attributes_class snd_hda_codec_generic wmi_bmof snd_sof_utils btbcm uvcvideo joydev snd_hda_codec_hdmi ledtrig_audio btintel videobuf2_vmalloc snd_hda_intel videobuf2_memops
Jul 14 20:36:28 kernel: btmtk iwlwifi snd_soc_core snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_compress videobuf2_common snd_hda_codec iwlmei ac97_bus snd_pcm_dmaengine snd_pci_acp6x k10temp videodev ecdh_generic snd_hda_core snd_pci_acp5x snd_hwdep mc cfg80211 snd_rn_pci_acp3x snd_seq snd_acp_config snd_seq_device snd_soc_acpi snd_pcm snd_pci_acp3x snd_timer mei snd ideapad_laptop i2c_piix4 soundcore sparse_keymap platform_profile rfkill acpi_cpufreq zram amdgpu drm_ttm_helper ttm rtsx_pci_sdmmc mmc_core hid_multitouch iommu_v2 gpu_sched nvme crct10dif_pclmul ucsi_acpi crc32_pclmul crc32c_intel ghash_clmulni_intel wdat_wdt serio_raw nvme_core sp5100_tco drm_dp_helper r8169 typec_ucsi ccp rtsx_pci typec wmi video i2c_hid_acpi i2c_hid ip6_tables ip_tables pkcs8_key_parser fuse
Jul 14 20:36:28 kernel: CPU: 2 PID: 1878 Comm: gnome-shell Tainted: G W 5.18.11-200.fc36.x86_64 #1
Jul 14 20:36:28 kernel: Hardware name: LENOVO 20VF/LNVNB161216, BIOS FACN31WW(V1.14) 02/10/2022
Jul 14 20:36:28 kernel: RIP: 0010:dp_set_panel_mode+0x96/0xa0 [amdgpu]
Jul 14 20:36:28 kernel: Code: 44 0f b6 c5 0f b6 8b 66 02 00 00 8b 53 30 bf 04 00 00 00 48 c7 c6 c0 91 ba c0 e8 15 01 f0 cb 48 83 c4 08 5b 5d e9 aa 42 63 cc <0f> 0b eb d2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 8b 6f 38 48 89
Jul 14 20:36:28 kernel: RSP: 0018:ffffab9a4b92f660 EFLAGS: 00010282
Jul 14 20:36:28 kernel: RAX: 00000000ffffffff RBX: ffff9d886ee0dc00 RCX: ffff9d85c9f3d900
Jul 14 20:36:28 kernel: RDX: ffffffffc04af337 RSI: 0000000000000000 RDI: ffff9d85c0eba0d0
Jul 14 20:36:28 kernel: RBP: ffffab9a4b92f601 R08: 000000000000010a R09: ffffffffc04af3ef
Jul 14 20:36:28 kernel: R10: 0000000000000001 R11: ffffffff8df453e8 R12: ffff9d87178001e8
Jul 14 20:36:28 kernel: R13: ffffffffc0ac2c60 R14: ffff9d8717800448 R15: 0000000000000001
Jul 14 20:36:28 kernel: FS: 00007f36a8ce35c0(0000) GS:ffff9d885de80000(0000) knlGS:0000000000000000
Jul 14 20:36:28 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 14 20:36:28 kernel: CR2: 00007f4b2c8dc1a8 CR3: 0000000199366000 CR4: 0000000000350ee0
Jul 14 20:36:28 kernel: Call Trace:
Jul 14 20:36:28 kernel: <TASK>
Jul 14 20:36:28 kernel: perform_link_training_with_retries+0xfd/0x290 [amdgpu]
Jul 14 20:36:28 kernel: enable_link_dp+0x15d/0x260 [amdgpu]
Jul 14 20:36:28 kernel: core_link_enable_stream+0x89b/0xa60 [amdgpu]
Jul 14 20:36:28 kernel: dce110_apply_ctx_to_hw+0x632/0x6e0 [amdgpu]
Jul 14 20:36:28 kernel: ? dcn10_wait_for_mpcc_disconnect+0x39/0x130 [amdgpu]
Jul 14 20:36:28 kernel: dc_commit_state+0x38c/0xac0 [amdgpu]
Jul 14 20:36:28 kernel: amdgpu_dm_atomic_commit_tail+0x56e/0x26f0 [amdgpu]
Jul 14 20:36:28 kernel: ? dcn21_validate_bandwidth_fp+0xee/0x6a0 [amdgpu]
Jul 14 20:36:28 kernel: ? kfree+0xcc/0x2d0
Jul 14 20:36:28 kernel: ? dcn21_validate_bandwidth_fp+0xee/0x6a0 [amdgpu]
Jul 14 20:36:28 kernel: ? kernel_fpu_end+0x1e/0x40
Jul 14 20:36:28 kernel: ? dc_fpu_end+0x49/0xc0 [amdgpu]
Jul 14 20:36:28 kernel: ? dcn21_validate_bandwidth+0x43/0x50 [amdgpu]
Jul 14 20:36:28 kernel: ? dc_validate_global_state+0x30c/0x3e0 [amdgpu]
Jul 14 20:36:28 kernel: ? dcn20_opp_construct+0x30/0x30 [amdgpu]
Jul 14 20:36:28 kernel: ? hubbub2_get_dcc_compression_cap+0x85/0x250 [amdgpu]
Jul 14 20:36:28 kernel: ? fill_plane_buffer_attributes+0x31e/0x510 [amdgpu]
Jul 14 20:36:28 kernel: ? preempt_count_add+0x6a/0xa0
Jul 14 20:36:28 kernel: ? _raw_spin_lock_irq+0x19/0x40
Jul 14 20:36:28 kernel: ? _raw_spin_unlock_irq+0x1b/0x35
Jul 14 20:36:28 kernel: ? __wait_for_common+0x18d/0x1b0
Jul 14 20:36:28 kernel: ? usleep_range_state+0x70/0x70
Jul 14 20:36:28 kernel: commit_tail+0x94/0x130
Jul 14 20:36:28 kernel: drm_atomic_helper_commit+0x112/0x140
Jul 14 20:36:28 kernel: drm_mode_atomic_ioctl+0x936/0xb30
Jul 14 20:36:28 kernel: ? drm_plane_get_damage_clips.cold+0x1c/0x1c
Jul 14 20:36:28 kernel: ? drm_atomic_set_property+0xb20/0xb20
Jul 14 20:36:28 kernel: drm_ioctl_kernel+0xa1/0x150
Jul 14 20:36:28 kernel: ? drm_ioctl_kernel+0xa1/0x150
Jul 14 20:36:28 kernel: drm_ioctl+0x21f/0x420
Jul 14 20:36:28 kernel: ? drm_atomic_set_property+0xb20/0xb20
Jul 14 20:36:28 kernel: ? perf_trace_hrtimer_init+0xbb/0xd0
Jul 14 20:36:28 kernel: amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
Jul 14 20:36:28 kernel: __x64_sys_ioctl+0x90/0xd0
Jul 14 20:36:28 kernel: do_syscall_64+0x5b/0x80
Jul 14 20:36:28 kernel: ? do_syscall_64+0x67/0x80
Jul 14 20:36:28 kernel: ? __irq_exit_rcu+0x3d/0x140
Jul 14 20:36:28 kernel: ? common_interrupt+0x61/0xd0
Jul 14 20:36:28 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 14 20:36:28 kernel: RIP: 0033:0x7f36ae2f676f
Jul 14 20:36:28 kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jul 14 20:36:28 kernel: RSP: 002b:00007ffe537c1d30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 14 20:36:28 kernel: RAX: ffffffffffffffda RBX: 0000555dcbd84b60 RCX: 00007f36ae2f676f
Jul 14 20:36:28 kernel: RDX: 00007ffe537c1dd0 RSI: 00000000c03864bc RDI: 0000000000000009
Jul 14 20:36:28 kernel: RBP: 00007ffe537c1dd0 R08: 000000000000000d R09: 000000000000000d
Jul 14 20:36:28 kernel: R10: 0000555dc7a89010 R11: 0000000000000246 R12: 00000000c03864bc
Jul 14 20:36:28 kernel: R13: 0000000000000009 R14: 0000555dc9b50f80 R15: 0000555dc8d91a50
Jul 14 20:36:28 kernel: </TASK>
Jul 14 20:36:28 kernel: ---[ end trace 0000000000000000 ]---
The trace is followed by many repeated lines like this:
Jul 14 20:36:29 kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
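For anyone triaging a similar log, here is a minimal Python sketch that pulls out the WARNING location and counts the repeated DMUB error lines. The function name `summarize_kernel_log` and both regular expressions are my own assumptions, written against the exact line formats quoted above; other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns written against the log lines quoted in this report (assumption:
# other kernels may word these messages differently).
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")
DMUB_RE = re.compile(r"\*ERROR\* Error waiting for DMUB idle: status=(\d+)")


def summarize_kernel_log(lines):
    """Return (warnings, dmub_counts) for an iterable of kernel log lines.

    warnings    -- list of (source location, symbol+offset) tuples
    dmub_counts -- Counter mapping DMUB status code to number of lines
    """
    warnings = []
    dmub_counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            warnings.append((match.group(1), match.group(2)))
            continue
        match = DMUB_RE.search(line)
        if match:
            dmub_counts[int(match.group(1))] += 1
    return warnings, dmub_counts
```

One way to use it, assuming the journal is persistent so the previous boot is still available: save the output of `journalctl -k -b -1` to a file after the hard reboot, then pass the open file object to `summarize_kernel_log`.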
Hardware description:
Lenovo ThinkBook 14 G2 ARE
- CPU: AMD Ryzen 5 4500U
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c3)
- System Memory: 16GB (2x8GB)
- Display(s): Internal laptop display - 14", 1920x1080
- Type of Display Connection: Internal laptop display
System information:
- Distro name and Version: Fedora 36
- Kernel version: 5.18.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jul 12 22:52:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Boot system
- Close the lid, press the power button, or select "Suspend" from the GNOME power menu
- Wait for the system to suspend
- Press the power button to wake the system
Log files (for system lockups / game freezes / crashes)
Edited by Mike Banducci