amdgpu fails on suspend/resume
Brief summary of the problem:
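The system fails to resume from suspend (or possibly fails to suspend in the first place). In the log below, xhci_hcd aborts the suspend with error -16, the amdgpu SMU block then fails to resume with error -62, and a later GPU reset triggers a warning in amdgpu_irq_put.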
Hardware description:
- CPU: AMD Ryzen 7 5700X
- GPU: AMD Radeon RX 6600
- System Memory: 48 GB
- Display(s): Lenovo P27h-20 + AOC Agon AG271QX
- Type of Display Connection: DisplayPort (DP)
System information:
- Distro name and Version: Arch Linux (rolling release, up to date)
- Kernel version: 6.7.9
- Custom kernel: No, stock Arch kernel (6.7.9-arch1-1)
- AMD official driver version: N/A
How to reproduce the issue:
Suspend the system (s2idle, per the kernel log), then try to wake it.
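For anyone trying to reproduce this, the following is a minimal Python sketch of one way to trigger a suspend and collect the relevant kernel messages afterwards. It assumes a systemd-based system where `systemctl suspend` and `journalctl -k -b` are available; the helper names `pm_relevant` and `suspend_and_collect` and the keyword list are illustrative choices, not something from the original report.

```python
import subprocess

# Substrings that mark lines of interest for this report (an assumption;
# adjust as needed).
KEYWORDS = ("PM:", "amdgpu", "[drm")


def pm_relevant(lines):
    """Keep only kernel log lines that mention power management or the GPU driver."""
    return [ln for ln in lines if any(k in ln for k in KEYWORDS)]


def suspend_and_collect():
    """Request a suspend, wait for the user to confirm wake-up, then read the kernel log."""
    subprocess.run(["systemctl", "suspend"], check=True)
    input("Press Enter once the machine has been woken up... ")
    out = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pm_relevant(out.splitlines())


if __name__ == "__main__":
    for line in suspend_and_collect():
        print(line)
```

If the display never comes back after the failed resume, the script can be run over SSH from another machine instead.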
Log files (for system lockups / game freezes / crashes):
[91934.095844] xhci_hcd 0000:02:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -16
[91934.095850] xhci_hcd 0000:02:00.0: PM: failed to suspend async: error -16
[91935.260097] PM: Some devices failed to suspend, or early wake event detected
[91935.260995] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[91935.261021] [drm] PSP is resuming...
[91935.262324] serial 00:04: activated
[91935.271136] nvme nvme0: 32/0/0 default/read/poll queues
[91935.294626] [drm] reserve 0xa00000 from 0x81fd000000 for PSP TMR
[91935.419497] amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
[91935.440788] amdgpu 0000:0c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[91935.440791] amdgpu 0000:0c:00.0: amdgpu: SMU is resuming...
[91935.440794] amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3100 (59.49.0)
[91935.440797] amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
[91935.440846] amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
[91935.575545] ata1: SATA link down (SStatus 0 SControl 300)
[91935.575568] ata8: SATA link down (SStatus 0 SControl 300)
[91935.575609] ata7: SATA link down (SStatus 0 SControl 300)
[91935.596483] usb 1-5.4.4: reset high-speed USB device number 5 using xhci_hcd
[91935.732503] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[91935.732522] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[91935.732539] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[91935.732557] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[91935.732609] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[91935.732950] ata5.00: supports DRM functions and may not be fully accessible
[91935.734269] ata4.00: supports DRM functions and may not be fully accessible
[91935.734674] ata6.00: configured for UDMA/133
[91935.734714] ata6.00: Entering active power mode
[91935.735200] ata3.00: configured for UDMA/133
[91935.736930] ata4.00: supports DRM functions and may not be fully accessible
[91935.737389] ata5.00: supports DRM functions and may not be fully accessible
[91935.739308] ata4.00: configured for UDMA/133
[91935.741509] ata5.00: configured for UDMA/133
[91935.741666] ata5.00: Enabling discard_zeroes_data
[91935.798994] ata2.00: configured for UDMA/133
[91940.863320] amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
[91940.863322] amdgpu 0000:0c:00.0: amdgpu: RunDcBtc failed!
[91940.863323] amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
[91940.863324] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[91940.863480] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
[91940.863481] amdgpu 0000:0c:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -62
[91940.863486] amdgpu 0000:0c:00.0: PM: failed to resume async: error -62
[91941.228280] usb 5-3.4: reset high-speed USB device number 14 using xhci_hcd
[91941.358977] Restarting tasks ... done.
[91941.361734] random: crng reseeded on system resumption
[91941.361738] PM: suspend exit
[91941.361780] PM: suspend entry (s2idle)
[91941.372488] snd_hda_intel 0000:0c:00.1: Refused to change power state from D0 to D3hot
[91942.151970] Filesystems sync: 0.790 seconds
[91942.152484] Freezing user space processes
[91951.529345] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=128290, emitted seq=128293
[91951.529665] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[91951.529944] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
[91951.530227] amdgpu 0000:0c:00.0: amdgpu: Failed to disallow df cstate
[91951.530232] amdgpu 0000:0c:00.0: [drm] *ERROR* Error queueing DMUB command: status=4
[91951.571815] ------------[ cut here ]------------
[91951.571817] WARNING: CPU: 6 PID: 614562 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:622 amdgpu_irq_put+0x46/0x70 [amdgpu]
[91951.572068] Modules linked in: uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videobuf2_common xt_CHECKSUM xt_tcpudp xt_comment snd_seq_dummy snd_hrtimer snd_seq xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb nls_utf8 vsock cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 dns_resolver fscache overlay netfs ccm algif_aead crypto_null hwmon_vid des3_ede_x86_64 cbc des_generic libdes algif_skcipher cmac md4 algif_hash af_alg r8153_ecm cdc_ether usbnet mousedev joydev nls_iso8859_1 intel_rapl_msr hid_generic vfat r8152 intel_rapl_common mii fat kvm_amd 8021q garp mrp kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek polyval_clmulni polyval_generic
[91951.572147] snd_hda_codec_hdmi snd_hda_codec_generic gf128mul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 sha1_ssse3 snd_usb_audio snd_intel_sdw_acpi aesni_intel snd_usbmidi_lib snd_hda_codec crypto_simd snd_ump cryptd eeepc_wmi snd_hda_core snd_rawmidi asus_wmi snd_hwdep snd_seq_device ledtrig_audio sparse_keymap snd_pcm platform_profile usbhid razerkbd(OE) i8042 snd_timer rapl igb bridge serio ptp ccp mxm_wmi snd stp pps_core sp5100_tco llc soundcore pcspkr wmi_bmof dca k10temp acpi_cpufreq i2c_piix4 cfg80211 gpio_amdpt rfkill gpio_generic mac_hid v4l2loopback(OE) videodev mc pkcs8_key_parser sg crypto_user fuse dm_mod loop nfnetlink ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq crc32c_intel nvme sha256_ssse3 nvme_core xhci_pci xhci_pci_renesas nvme_auth amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec
[91951.572230] CPU: 6 PID: 614562 Comm: kworker/u64:48 Tainted: G OE 6.7.9-arch1-1 #1 ad54415bbff2f0801422a3b76df850f68e71ecab
[91951.572234] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 6042 04/28/2022
[91951.572236] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[91951.572244] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[91951.572491] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 ea 47 5f c3 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 d9 47 5f c3 b8 ea ff ff ff e9 cf 47 5f c3
[91951.572494] RSP: 0018:ffff9d18cb52bc80 EFLAGS: 00010246
[91951.572496] RAX: ffff8b6706dda960 RBX: ffff8b6712080000 RCX: 0000000000000000
[91951.572498] RDX: 0000000000000000 RSI: ffff8b67120a4410 RDI: ffff8b6712080000
[91951.572500] RBP: ffff8b6712080000 R08: 000000000003a5c0 R09: 0000000000000006
[91951.572501] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000001050
[91951.572503] R13: ffff8b67120ba128 R14: ffff8b6ac886c400 R15: 0000000000000000
[91951.572505] FS: 0000000000000000(0000) GS:ffff8b71feb80000(0000) knlGS:0000000000000000
[91951.572507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[91951.572508] CR2: 000073873d90dc70 CR3: 0000000952a20000 CR4: 0000000000f50ef0
[91951.572510] PKRU: 55555554
[91951.572512] Call Trace:
[91951.572515] <TASK>
[91951.572516] ? amdgpu_irq_put+0x46/0x70 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.572759] ? __warn+0x81/0x130
[91951.572765] ? amdgpu_irq_put+0x46/0x70 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.573008] ? report_bug+0x171/0x1a0
[91951.573014] ? handle_bug+0x3c/0x80
[91951.573018] ? exc_invalid_op+0x17/0x70
[91951.573021] ? asm_exc_invalid_op+0x1a/0x20
[91951.573028] ? amdgpu_irq_put+0x46/0x70 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.573269] gfx_v10_0_hw_fini+0x1b/0xf0 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.573518] amdgpu_device_ip_suspend_phase2+0x105/0x1a0 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.573738] ? amdgpu_device_ip_suspend_phase1+0x6f/0xe0 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.573960] amdgpu_device_ip_suspend+0x40/0x70 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.574181] amdgpu_device_pre_asic_reset+0xd3/0x2a0 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.574404] amdgpu_device_gpu_recover+0x476/0xc90 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.574625] ? ___drm_dbg+0x61/0xd0
[91951.574631] amdgpu_job_timedout+0x186/0x270 [amdgpu 164728a6c26992f7aed1675a5d67c737dcdcdbdf]
[91951.574905] drm_sched_job_timedout+0x85/0x120 [gpu_sched fb54e3185d2218cc261be59aab22418ea255661c]
[91951.574914] process_one_work+0x17b/0x350
[91951.574919] worker_thread+0x30f/0x450
[91951.574923] ? __pfx_worker_thread+0x10/0x10
[91951.574925] kthread+0xe8/0x120
[91951.574930] ? __pfx_kthread+0x10/0x10
[91951.574933] ret_from_fork+0x34/0x50
[91951.574937] ? __pfx_kthread+0x10/0x10
[91951.574941] ret_from_fork_asm+0x1b/0x30
[91951.574947] </TASK>
[91951.574949] ---[ end trace 0000000000000000 ]---
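The negative numbers in the log above are kernel errno values. Below is a minimal Python sketch for decoding them, assuming the standard Linux numbering (16 = EBUSY, 62 = ETIME). The lookup table and the function name `decode` are illustrative and not part of any amdgpu tooling.

```python
import re

# Linux errno numbers relevant here, hardcoded so the result does not depend
# on the platform running the script.
LINUX_ERRNO = {16: "EBUSY", 22: "EINVAL", 62: "ETIME", 110: "ETIMEDOUT"}

# Matches "returns -16", "error -16", "failed -62" and "failed (-62)".
CODE_RE = re.compile(r"(?:returns|error|failed)\s*\(?(-\d+)\)?")


def decode(line):
    """Return (code, name) pairs for every negative return code found in a log line."""
    result = []
    for match in CODE_RE.finditer(line):
        code = int(match.group(1))
        result.append((code, LINUX_ERRNO.get(-code, "unknown")))
    return result


if __name__ == "__main__":
    for sample in (
        "xhci_hcd 0000:02:00.0: PM: failed to suspend async: error -16",
        "amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).",
    ):
        print(decode(sample))
```

Applied to this log, the xhci_hcd suspend failure decodes to EBUSY (device or resource busy) and the amdgpu SMU resume failure to ETIME (timer expired). The latter fits the roughly five-second gap between "SMU is resuming..." and the "I'm not done with your previous command" message.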
Edited by Lewis Diamond