Screen hangs on wakeup on linux >= 5.18
I am running Linux >= 5.18 with sway on a single output, without FreeSync, on the following GPU:
0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c4)
After the screen has been put to sleep with DPMS off and has stayed asleep for a few minutes, I get the following:
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[drm] REG_WAIT timeout 1us * 100000 tries - optc1_wait_for_state line:819
The screen briefly wakes up and then goes back to sleep. Only after about 1-2 minutes does it wake up and stay on, so that I can log in again.
The deciding factor in determining whether a commit exhibits the wait issue seems to be the REG_WAIT timeout message: the two perform_link_training_with_retries messages also appear on other commits without the REG_WAIT timeout, and those commits do not exhibit the wait issue.
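The distinction above can be checked mechanically against a saved kernel log. The following is only a minimal sketch: the function names are hypothetical, and the patterns are assumptions derived from the exact log lines quoted in this report, so they may need adjusting for other kernels or line numbers.

```python
import re

# Patterns taken from the log lines quoted in this report (assumption:
# other kernel versions may word these messages differently).
REG_WAIT = re.compile(r"REG_WAIT timeout .* optc1_wait_for_state")
LINK_TRAINING = re.compile(
    r"perform_link_training_with_retries: Link training attempt \d+ of \d+ failed"
)


def shows_wait_issue(log_text: str) -> bool:
    """Return True only if the REG_WAIT timeout message is present.

    Link training failures alone are not treated as the wait issue,
    because they also appear on commits that wake up normally.
    """
    return any(REG_WAIT.search(line) for line in log_text.splitlines())


def count_link_training_failures(log_text: str) -> int:
    """Count link training failure lines, for information only."""
    return sum(1 for line in log_text.splitlines() if LINK_TRAINING.search(line))
```

A saved log (for example the output of `journalctl -k` written to a file) can be read into a string and passed to `shows_wait_issue`.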
Similar or related issues seem to be:
Downstream bug report with Arch Linux: https://bugs.archlinux.org/task/75237
I have tried to bisect this issue between v5.17 (problem does not occur) and v5.18-rc1 (problem occurs), which proved more than difficult because I ran into many other bugs and hard crashes. The bisect points at https://github.com/torvalds/linux/commit/efe17d5a217e6b7dfd16c80dab522abcf2edf1bc as the first bad commit. However, because of the severe hard crashes, it was sometimes not possible to determine whether any of the above messages would have been emitted. At this point I do not know whether any of the built commits actually confirm this issue, so I list my findings below:
- https://github.com/torvalds/linux/commit/b9132c32e01976686efa26252cc246944a0d2cab (bad)
- https://github.com/torvalds/linux/commit/4f0f1b58fbacc3d4f60e0cf17b01a6273df1d415 (bad)
- https://github.com/torvalds/linux/commit/754e0b0e35608ed5206d6a67a791563c631cec07 (good) already shows:
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
- https://github.com/torvalds/linux/commit/7f161df1a513e2961f4e3c96a8355c8ce93ad175 (bad) crashes otherwise:
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
- https://github.com/torvalds/linux/commit/243c719e872a1322b22efccff80776353357b296 (bad) crashes otherwise:
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
------------[ cut here ]------------
WARNING: CPU: 1 PID: 45579 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3299 dc_set_power_state+0x147/0x150 [amdgpu]
Modules linked in: tun snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq cfg80211 nft_reject_inet nf_reject_ipv4 nf_reject_ip
v6 nft_reject nft_limit nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink lm92 ext4 crc16 mbcache jbd2 uvcvideo vid>
c videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mousedev joydev snd_usb_audio snd_usbmidi_lib mc intel_rapl_msr intel_rapl_common snd_hda_codec_r>
a_codec_generic ledtrig_audio snd_fireface snd_hda_codec_hdmi snd_dice snd_hda_intel edac_mce_amd snd_firewire_lib eeepc_wmi snd_intel_dspcfg asus_wmi snd_in>
kvm_amd sparse_keymap nls_iso8859_1 snd_hda_codec snd_rawmidi platform_profile kvm rfkill vfat snd_hda_core irqbypass snd_hwdep snd_seq_device video mxm_wmi >
rapl snd_pcm sp5100_tco pcspkr snd_timer i2c_piix4 k10temp snd soundcore wmi pinctrl_amd mac_hid acpi_cpufreq pkcs8_key_parser dm_multipath ipmi_devintf ipmi>
overlay br_netfilter bridge stp llc sg fuse zram ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm us>
mdgpu crct10dif_pclmul r8169 crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd realtek firewire_ohci ccp cryptd nvme igb mdio_devres xhci_pci firewire>
e nvme_core libphy dca xhci_pci_renesas crc_itu_t drm_ttm_helper ttm gpu_sched btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
CPU: 1 PID: 45579 Comm: kworker/1:0 Not tainted 5.16.0-rc5-1-01505-g243c719e872a-dirty #1 5ab76a27e874a4658c06618b6ba964e556277256
Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3204 01/25/2021
Workqueue: pm pm_runtime_work
RIP: 0010:dc_set_power_state+0x147/0x150 [amdgpu]
Code: 00 00 00 74 1e 48 8b bb 08 fb 00 00 48 8d 93 30 03 00 00 48 89 de 5b 5d 41 5c 41 5d 0f ae e8 ff e0 0f 0b 5b 5d 41 5c 41 5d c3 <0f> 0b e9 e6 fe ff ff c3 90 0f 1f 44 00 00 80 bf a0 03 00 00 00 74
RSP: 0018:ffffbb40c28f7c80 EFLAGS: 00010202
RAX: ffff99f96fae0000 RBX: ffff99f6847d0000 RCX: ffff99f690f55668
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff99f6847d0000
RBP: 0000000000000004 R08: 0000012584c49986 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff99f691000010
R13: ffff99f681bb20d0 R14: 0000000000000000 R15: ffff99f681bb2248
FS: 0000000000000000(0000) GS:ffff9a056ea40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9fc1bce40 CR3: 00000003f0480000 CR4: 0000000000350ee0
Call Trace:
<TASK>
dm_suspend+0x8f/0x230 [amdgpu 70d0d647f8e411651e799b75b7a2da3361bb37c0]
? nv_common_set_clockgating_state+0x9c/0xb0 [amdgpu 70d0d647f8e411651e799b75b7a2da3361bb37c0]
amdgpu_device_ip_suspend_phase1+0x63/0xc0 [amdgpu 70d0d647f8e411651e799b75b7a2da3361bb37c0]
amdgpu_device_suspend+0x63/0xe0 [amdgpu 70d0d647f8e411651e799b75b7a2da3361bb37c0]
amdgpu_pmops_runtime_suspend+0xa7/0x170 [amdgpu 70d0d647f8e411651e799b75b7a2da3361bb37c0]
pci_pm_runtime_suspend+0x5c/0x180
? pci_dev_put+0x20/0x20
__rpm_callback+0x44/0x160
? pci_dev_put+0x20/0x20
rpm_callback+0x59/0x70
? pci_dev_put+0x20/0x20
rpm_suspend+0x11a/0x720
? __schedule+0x350/0x10e0
pm_runtime_work+0x94/0xa0
process_one_work+0x1ca/0x390
worker_thread+0x4d/0x3c0
? process_one_work+0x390/0x390
kthread+0x157/0x180
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
---[ end trace cc005beb0d7f8394 ]---
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
- https://github.com/torvalds/linux/commit/a5e7ffa11974d90d36f818ee34fc170722ec3098 (bad) crashes otherwise:
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
- https://github.com/torvalds/linux/commit/69f91d32c6632e09f0954e690d61ac4921dacbd3 (bad) crashes otherwise:
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
- https://github.com/torvalds/linux/commit/bcf19fdd507fb679bb6e1b8a119961f32b6cbb95 (good) crashes otherwise:
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
- https://github.com/torvalds/linux/commit/69f91d32c6632e09f0954e690d61ac4921dacbd3 (bad) crashes otherwise:
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
but crashes randomly after a while:
[drm] free PSP TMR buffer
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] PSP is resuming...
[drm] reserve 0x900000 from 0x800f400000 for PSP TMR
amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
amdgpu 0000:0d:00.0: amdgpu: Failed to enable requested dpm features!
amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
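
For the bisect between v5.17 and v5.18-rc1 described above, commits that hard-crash before the symptom can be observed could be marked with `git bisect skip` rather than being forced into good or bad. The sketch below maps a saved kernel log to one of the three `git bisect` subcommands. It is an assumption about how the findings could be classified, not how they were classified in the list above; the function name is hypothetical and the marker strings are copied from the logs in this report.

```python
# Marker strings copied from the logs in this report (assumption: they are
# stable enough across the bisected range to be matched literally).
WAIT_MARKER = "REG_WAIT timeout"
CRASH_MARKER = "amdgpu_device_ip_resume failed"


def bisect_verdict(log_text: str) -> str:
    """Suggest which `git bisect` subcommand to run for one tested commit.

    "bad"  : the REG_WAIT timeout was observed, so the wait issue is present.
    "skip" : the SMU resume failure was logged without the REG_WAIT timeout,
             so the commit could not really be tested for the wait issue.
    "good" : neither marker appears (link training failures alone are
             ignored, since they also occur on commits without the issue).
    """
    if WAIT_MARKER in log_text:
        return "bad"
    if CRASH_MARKER in log_text:
        return "skip"
    return "good"
```

The returned word corresponds directly to `git bisect bad`, `git bisect skip`, or `git bisect good` for the commit that produced the log. Skipped commits let git pick a nearby commit instead, at the cost of a possibly wider final range.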