6900 XT fails to resume from suspend (bisected)
Brief summary of the problem:
My 6900 XT began failing to resume correctly from suspend in the 5.14.y series. I didn't get around to tracking it down until recently.
After resuming, the observed symptom is that the displays are unresponsive, usually with the monitors repeatedly entering and exiting power-save mode. Old images of the desktop and console sometimes appear, as well as all-white screens and flickering. Desktop compositor segfaults also occur. The system is otherwise accessible over ssh, but usually with a kworker sitting at 100% of one core. A reboot issued over ssh starts, but the system does not actually reboot; it usually seems to get as far as killing the processes.
Bisection yielded 60b78ed088ebe1a872ee1320b6c5ad6ee2c4bd9a (or 73892cbd7c88b629da1db018e7b3741499ded412 on 5.14.y). Reverting this fixes resume on 5.14.18, 5.15.0, and 5.15.2. Unfortunately I was not able to test 5.16-rc1 because it fails to suspend at all due to a (probably unrelated) DEAD callback error for CPU1.
My observed behavior appears to be the opposite of what the commit claims to achieve: my sensors and resume only work correctly without it. So perhaps the quirk is applied too broadly?
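For anyone who wants to repeat the revert test, here is a minimal sketch of picking the commit to revert for a given tree. The two hashes are the ones named above; the helper name `revert_target` is hypothetical, and treating every tree other than 5.14.y as carrying the mainline hash is an assumption, not something verified beyond 5.15.0 and 5.15.2.

```shell
#!/bin/sh
# Print the commit to revert for a given kernel version string.
# Hashes come from this report; the fallback to the mainline hash for
# anything other than 5.14.y is an assumption.
revert_target() {
    case "$1" in
        5.14|5.14.*) echo 73892cbd7c88b629da1db018e7b3741499ded412 ;;
        *)           echo 60b78ed088ebe1a872ee1320b6c5ad6ee2c4bd9a ;;
    esac
}

revert_target 5.14.18
revert_target 5.15.2
```

Inside a kernel checkout, something like `git revert "$(revert_target "$(make -s kernelversion)")"` should then apply the revert before rebuilding.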
Hardware description:
- CPU: AMD 5950X
- GPU: AMD reference 6900 XT
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
- System Memory: 64 GB ECC
- Display(s): 3x LG 27UD68P-B
- Type of Display Connection: 2x DP, 1x USB-C (DP alternate mode)
System information:
- Distro name and Version: ArchLinux
- Kernel version: Problem observed in 5.14.y, 5.15.0, and 5.15.y
How to reproduce the issue:
Suspend the system and resume it: the problem occurs with 100% repeatability. Logs were taken by invoking systemctl start suspend.target over a serial console, then resuming and running journalctl -k -b.
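The reproduction steps can be sketched as a small script. The function names, the output file name and the settle delay are hypothetical; only `systemctl start suspend.target` and `journalctl -k -b` come from the report. The capture function is defined but not called, since it needs root and really suspends the machine.

```shell
#!/bin/sh
# Keep only kernel log lines that mention amdgpu or drm (reads stdin).
gpu_lines() {
    grep -E 'amdgpu|drm' || true
}

# Suspend, wait for the machine to be woken, then save the filtered
# kernel log of the current boot.
capture_resume_log() {
    out=${1:-resume-kernel.log}   # hypothetical output file name
    systemctl start suspend.target
    sleep "${2:-15}"              # hypothetical settle time after resume
    journalctl -k -b | gpu_lines > "$out"
}
```

Run `capture_resume_log` from a serial console or an ssh session, since the local displays are unusable after a failed resume.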
Attached files:
Log files (for system lockups / game freezes / crashes)
Activity
- Owner
Does setting amdgpu.runpm=0 on the kernel command line in grub fix the issue?
- Author
Setting amdgpu.runpm=0 on the kernel command line does not result in any apparent change in behavior. It still fails to resume on a stock 5.15.0 and works on the reverted 5.15.0-0001.
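When testing a parameter like this, it is worth confirming that it actually reached the kernel. A minimal sketch, assuming the parameter is passed as `amdgpu.runpm=N` on the command line; the helper name `runpm_from_cmdline` is hypothetical.

```shell
#!/bin/sh
# Print the value of amdgpu.runpm found in a kernel command line string,
# or "unset" if the parameter is absent. The last occurrence wins.
runpm_from_cmdline() (
    set -f                      # no glob expansion while word-splitting
    value=unset
    for word in $1; do
        case "$word" in
            amdgpu.runpm=*) value=${word#amdgpu.runpm=} ;;
        esac
    done
    echo "$value"
)

# Check the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    runpm_from_cmdline "$(cat /proc/cmdline)"
fi
```

If the module is loaded, `/sys/module/amdgpu/parameters/runpm` should report the value the driver actually picked up.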
- Mario Limonciello added S3 label
I can't suspend my laptop on kernel 5.16-rc1 either.
Edited by Fusion Future
- Owner
Not likely related. Please file your own bug.
I am bisecting. Will file a new bug after bisecting is done.
- Author
I've done some additional testing that produced more data points and a viable (but very much a hack) workaround. All traces are from 5.15.0.
SysRq backtraces
A SysRq-l (echo l > /proc/sysrq-trigger) while in the "resume failed" state shows several tasks in amdgpu. These look mostly like register polling loops, so I'm not sure they're really that helpful.
Resume failed backtraces
Nov 18 17:05:17 kitsune.inthat.cloud kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 Nov 18 17:05:17 kitsune.inthat.cloud kernel: sysrq: Show backtrace of all active CPUs Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 13 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CPU: 13 PID: 4786 Comm: bash Not tainted 5.15.0 #29 dcd97a1c107264ae0959544624f304c4403ce341 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 4021 08/09/2021 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Call Trace: Nov 18 17:05:17 kitsune.inthat.cloud kernel: dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) Nov 18 17:05:17 kitsune.inthat.cloud kernel: nmi_cpu_backtrace.cold (lib/nmi_backtrace.c:107) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? lapic_can_unplug_cpu (arch/x86/kernel/apic/hw_nmi.c:33) Nov 18 17:05:17 kitsune.inthat.cloud kernel: nmi_trigger_cpumask_backtrace (lib/nmi_backtrace.c:62) Nov 18 17:05:17 kitsune.inthat.cloud kernel: __handle_sysrq.cold (./include/linux/rcupdate.h:719 drivers/tty/sysrq.c:622) Nov 18 17:05:17 kitsune.inthat.cloud kernel: write_sysrq_trigger (drivers/tty/sysrq.c:1161) Nov 18 17:05:17 kitsune.inthat.cloud kernel: proc_reg_write (./arch/x86/include/asm/atomic.h:165 ./arch/x86/include/asm/atomic.h:178 ./include/linux/atomic/atomic-arch-fallback.h:527 ./include/linux/atomic/atomic-instrumented.h:252 fs/proc/inode.c:213 fs/proc/inode.c:348) Nov 18 17:05:17 kitsune.inthat.cloud kernel: vfs_write (fs/read_write.c:592) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ksys_write (fs/read_write.c:647) Nov 18 17:05:17 kitsune.inthat.cloud kernel: do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:289 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:131 kernel/entry/common.c:302) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? __x64_sys_close (fs/open.c:1330 fs/open.c:1325 fs/open.c:1325) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? do_syscall_64 (arch/x86/entry/common.c:87) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? __x64_sys_fcntl (fs/fcntl.c:472 fs/fcntl.c:457 fs/fcntl.c:457) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:289 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:131 kernel/entry/common.c:302) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? do_syscall_64 (arch/x86/entry/common.c:87) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? do_syscall_64 (arch/x86/entry/common.c:87) Nov 18 17:05:17 kitsune.inthat.cloud kernel: entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Nov 18 17:05:17 kitsune.inthat.cloud kernel: RIP: 0033:0x7fc86fc8a907 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 All code ======== 0: 0d 00 f7 d8 64 or $0x64d8f700,%eax 5: 89 02 mov %eax,(%rdx) 7: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax e: eb b7 jmp 0xffffffffffffffc7 10: 0f 1f 00 nopl (%rax) 13: f3 0f 1e fa endbr64 17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax 1e: 00 1f: 85 c0 test %eax,%eax 21: 75 10 jne 0x33 23: b8 01 00 00 00 mov $0x1,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 51 ja 0x83 32: c3 ret 33: 48 83 ec 28 sub $0x28,%rsp 37: 48 89 54 24 18 mov %rdx,0x18(%rsp) 3c: 48 rex.W 3d: 89 .byte 0x89 3e: 74 24 je 0x64 Code starting with the faulting instruction 0: 
48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 51 ja 0x59 8: c3 ret 9: 48 83 ec 28 sub $0x28,%rsp d: 48 89 54 24 18 mov %rdx,0x18(%rsp) 12: 48 rex.W 13: 89 .byte 0x89 14: 74 24 je 0x3a Nov 18 17:05:17 kitsune.inthat.cloud kernel: RSP: 002b:00007ffe17bfe9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc86fc8a907 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RDX: 0000000000000002 RSI: 0000561077cdbef0 RDI: 0000000000000001 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RBP: 0000561077cdbef0 R08: 000000000000000a R09: 00007fc86fd204e0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R10: 00007fc86fd203e0 R11: 0000000000000246 R12: 0000000000000002 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R13: 00007fc86fd5c520 R14: 0000000000000002 R15: 00007fc86fd5c700 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Sending NMI from CPU 13 to CPUs 0-12,14-31: Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 2 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CPU: 2 PID: 5318 Comm: ha-report-healt Not tainted 5.15.0 #29 dcd97a1c107264ae0959544624f304c4403ce341 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 4021 08/09/2021 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RIP: 0010:delay_halt_mwaitx (arch/x86/lib/delay.c:142) Nov 18 17:05:17 kitsune.inthat.cloud kernel: Code: 03 05 7b 45 c7 61 31 d2 48 89 d1 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <5b> 31 c0 89 c2 89 c1 89 c6 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 All code 0: 03 05 7b 45 c7 61 add 0x61c7457b(%rip),%eax # 0x61c74581 6: 31 d2 xor %edx,%edx 8: 48 89 d1 mov %rdx,%rcx b: 0f 01 fa monitorx %rax,%ecx,%edx e: b8 ff ff ff ff mov $0xffffffff,%eax 13: b9 02 00 00 00 mov $0x2,%ecx 18: 48 39 c6 cmp %rax,%rsi 1b: 48 0f 46 c6 cmovbe %rsi,%rax 1f: 48 89 c3 mov %rax,%rbx 22: b8 f0 00 00 00 mov $0xf0,%eax 
27: 0f 01 fb mwaitx %eax,%ecx,%ebx 2a:* 5b pop %rbx <-- trapping instruction 2b: 31 c0 xor %eax,%eax 2d: 89 c2 mov %eax,%edx 2f: 89 c1 mov %eax,%ecx 31: 89 c6 mov %eax,%esi 33: c3 ret 34: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 3b: 00 00 00 00 3f: 66 data16 Code starting with the faulting instruction 0: 5b pop %rbx 1: 31 c0 xor %eax,%eax 3: 89 c2 mov %eax,%edx 5: 89 c1 mov %eax,%ecx 7: 89 c6 mov %eax,%esi 9: c3 ret a: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 11: 00 00 00 00 15: 66 data16 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RSP: 0018:ffffb40780adbb38 EFLAGS: 00000293 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RAX: 00000000000000f0 RBX: 0000000000000d49 RCX: 0000000000000002 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RDX: 0000000000000000 RSI: 0000000000000d49 RDI: 0000005997763bb2 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RBP: 0000000000000d49 R08: 0000000000000000 R09: 0000000000000000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R13: 0000000000000000 R14: ffff9f4e 9f707bf0 R15: 0000000000000005 Nov 18 17:05:17 kitsune.inthat.cloud kernel: FS: 00007f8d44084740(0000) GS:ffff9f5d2ea80000(0000) knlGS:0000000000000000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CR2: 000056505cad98f8 CR3: 0000000157876000 CR4: 0000000000750ee0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: PKRU: 55555554 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Call Trace: Nov 18 17:05:17 kitsune.inthat.cloud kernel: delay_halt (./arch/x86/include/asm/msr.h:234 arch/x86/lib/delay.c:164 arch/x86/lib/delay.c:149) Nov 18 17:05:17 kitsune.inthat.cloud kernel: __smu_cmn_poll_stat.isra.0 (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:121) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: smu_cmn_send_smc_msg_with_param.part.0 
(drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:340) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: smu_cmn_update_table (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:905) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: smu_cmn_get_metrics_table_locked (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:958) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? schedule (./arch/x86/include/asm/preempt.h:85 (discriminator 1) kernel/sched/core.c:6367 (discriminator 1)) Nov 18 17:05:17 kitsune.inthat.cloud kernel: sienna_cichlid_get_smu_metrics_data (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:527) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: sienna_cichlid_read_sensor (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/sienna_cichlid_ppt.c:1693) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: smu_read_sensor (drivers/gpu/drm/amd/amdgpu/../pm/swsmu/amdgpu_smu.c:2446) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: amdgpu_get_gpu_busy_percent (drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_pm.c:1579) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dev_attr_show (drivers/base/core.c:2097) Nov 18 17:05:17 kitsune.inthat.cloud kernel: sysfs_kf_seq_show (fs/sysfs/file.c:62) Nov 18 17:05:17 kitsune.inthat.cloud kernel: seq_read_iter (fs/seq_file.c:230) Nov 18 17:05:17 kitsune.inthat.cloud kernel: new_sync_read (fs/read_write.c:405 (discriminator 1)) Nov 18 17:05:17 kitsune.inthat.cloud kernel: vfs_read (fs/read_write.c:485) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ksys_read (fs/read_write.c:623) Nov 18 17:05:17 kitsune.inthat.cloud kernel: do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? __x64_sys_ioctl (fs/ioctl.c:873 fs/ioctl.c:860 fs/ioctl.c:860) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:289 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:131 kernel/entry/common.c:302) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? do_syscall_64 (arch/x86/entry/common.c:87) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? exc_page_fault (./arch/x86/include/asm/paravirt.h:689 arch/x86/mm/fault.c:1493 arch/x86/mm/fault.c:1541) Nov 18 17:05:17 kitsune.inthat.cloud kernel: entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Nov 18 17:05:17 kitsune.inthat.cloud kernel: RIP: 0033:0x7f8d442e9862 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Code: c0 e9 b2 fe ff ff 50 48 8d 3d 5a 29 0a 00 e8 55 e4 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 All code 0: c0 e9 b2 shr $0xb2,%cl 3: fe (bad) 4: ff (bad) 5: ff 50 48 call 0x48(%rax) 8: 8d 3d 5a 29 0a 00 lea 0xa295a(%rip),%edi # 0xa2968 e: e8 55 e4 01 00 call 0x1e468 13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 18: f3 0f 1e fa endbr64 1c: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax 23: 00 24: 85 c0 test %eax,%eax 26: 75 10 jne 0x38 28: 0f 05 syscall 2a: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 56 ja 0x88 32: c3 ret 33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 38: 48 83 ec 28 sub $0x28,%rsp 3c: 48 rex.W 3d: 89 .byte 0x89 3e: 54 push %rsp 3f: 24 .byte 0x24 Code starting with the faulting instruction 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 56 ja 0x5e 8: c3 ret 9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) e: 48 83 ec 28 sub $0x28,%rsp 12: 48 rex.W 13: 89 .byte 0x89 14: 54 push %rsp 15: 24 .byte 0x24 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RSP: 002b:00007ffc3767dd68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RAX: ffffffffffffffda RBX: 00007f8d440846c8 RCX: 00007f8d442e9862 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RDX: 
0000000000001001 RSI: 000056505cad88f0 RDI: 0000000000000007 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RBP: 0000000000001001 R08: 0000000000000000 R09: 00007f8d443baa60 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R10: 00007f8d44725158 R11: 0000000000000246 R12: 00007f8d423b9940 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R13: 000056505cad88f0 R14: 0000000000000007 R15: 000056505c4694c0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 18 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.324 msecs Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 25 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 9 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 17 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 19 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 1 skipped: idling at acpi_idle_do_entry 
(./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 21 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 22 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 5 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 3 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 7 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 6 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 20 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 
./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 23 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 16 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 0 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 4 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 28 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 27 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 8 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CPU: 8 PID: 1638 Comm: gnome-shell Not tainted 5.15.0 #29 dcd97a1c107264ae0959544624f304c4403ce341 Nov 18 17:05:17 
kitsune.inthat.cloud kernel: Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 4021 08/09/2021 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RIP: 0010:delay_halt_mwaitx (arch/x86/lib/delay.c:142) Nov 18 17:05:17 kitsune.inthat.cloud kernel: Code: 03 05 7b 45 c7 61 31 d2 48 89 d1 0f 01 fa b8 ff ff ff ff b9 02 00 00 00 48 39 c6 48 0f 46 c6 48 89 c3 b8 f0 00 00 00 0f 01 fb <5b> 31 c0 89 c2 89 c1 89 c6 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 All code 0: 03 05 7b 45 c7 61 add 0x61c7457b(%rip),%eax # 0x61c74581 6: 31 d2 xor %edx,%edx 8: 48 89 d1 mov %rdx,%rcx b: 0f 01 fa monitorx %rax,%ecx,%edx e: b8 ff ff ff ff mov $0xffffffff,%eax 13: b9 02 00 00 00 mov $0x2,%ecx 18: 48 39 c6 cmp %rax,%rsi 1b: 48 0f 46 c6 cmovbe %rsi,%rax 1f: 48 89 c3 mov %rax,%rbx 22: b8 f0 00 00 00 mov $0xf0,%eax 27: 0f 01 fb mwaitx %eax,%ecx,%ebx 2a:* 5b pop %rbx <-- trapping instruction 2b: 31 c0 xor %eax,%eax 2d: 89 c2 mov %eax,%edx 2f: 89 c1 mov %eax,%ecx 31: 89 c6 mov %eax,%esi 33: c3 ret 34: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 3b: 00 00 00 00 3f: 66 data16 Code starting with the faulting instruction 0: 5b pop %rbx 1: 31 c0 xor %eax,%eax 3: 89 c2 mov %eax,%edx 5: 89 c1 mov %eax,%ecx 7: 89 c6 mov %eax,%esi 9: c3 ret a: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 11: 00 00 00 00 15: 66 data16 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RSP: 0018:ffffb407839831d8 EFLAGS: 00000293 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RAX: 00000000000000f0 RBX: 0000000000000d49 RCX: 0000000000000002 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RDX: 0000000000000000 RSI: 0000000000000d49 RDI: 0000005997765c5e Nov 18 17:05:17 kitsune.inthat.cloud kernel: RBP: 0000000000000d49 R08: ffff9f4e9f700000 R09: 0000000000000000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000186a0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R13: ffff9f4e47328a00 R14: ffff9f4e546fa700 R15: ffffb4078398330c Nov 18 17:05:17 
kitsune.inthat.cloud kernel: FS: 00007f46b6d96cc0(0000) GS:ffff9f5d2ec00000(0000) knlGS:0000000000000000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 18 17:05:17 kitsune.inthat.cloud kernel: CR2: 00007fc81e576340 CR3: 000000018ac7c000 CR4: 0000000000750ee0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: PKRU: 55555554 Nov 18 17:05:17 kitsune.inthat.cloud kernel: Call Trace: Nov 18 17:05:17 kitsune.inthat.cloud kernel: delay_halt (./arch/x86/include/asm/msr.h:234 arch/x86/lib/delay.c:164 arch/x86/lib/delay.c:149) Nov 18 17:05:17 kitsune.inthat.cloud kernel: dmub_srv_wait_for_idle (drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_srv.c:663 drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_srv.c:655) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dc_dmub_srv_cmd_queue (drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:112 drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:80) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: transmitter_control_dmcub_v1_7 (drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table2.c:331) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: transmitter_control_v1_7 (drivers/gpu/drm/amd/amdgpu/../display/dc/bios/command_table2.c:368) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dcn10_link_encoder_dp_set_lane_settings (drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:1123 (discriminator 2)) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: perform_clock_recovery_sequence (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1110) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
dc_link_dp_perform_link_training (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1709 drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1753) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dc_link_dp_perform_link_training (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1709 drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1753) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? dm_helpers_dp_write_dpcd (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.c:489) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? core_link_write_dpcd (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dpcd.c:62 drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dpcd.c:235) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? dp_enable_link_phy (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_hwss.c:136) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? schedule_timeout (kernel/time/timer.c:1887) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
perform_link_training_with_retries (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1839) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: perform_link_training_with_retries (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1839) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: enable_link_dp (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:1766) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: core_link_enable_stream (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2410 drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:3339) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dce110_apply_ctx_to_hw (drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_hw_sequencer.c:1527 drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_hw_sequencer.c:2240) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: dc_commit_state (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1612 drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:1714) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: amdgpu_dm_atomic_commit_tail (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9054) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? ttm_bo_mem_compat (drivers/gpu/drm/ttm/ttm_bo.c:947) ttm Nov 18 17:05:17 kitsune.inthat.cloud kernel: commit_tail (drivers/gpu/drm/drm_atomic_helper.c:1669) Nov 18 17:05:17 kitsune.inthat.cloud kernel: drm_atomic_helper_commit (drivers/gpu/drm/drm_atomic_helper.c:1884 drivers/gpu/drm/drm_atomic_helper.c:1817) Nov 18 17:05:17 kitsune.inthat.cloud kernel: drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1460) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? drm_plane_get_damage_clips.cold (drivers/gpu/drm/drm_print.c:158) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? drm_atomic_set_property (drivers/gpu/drm/drm_atomic_uapi.c:1313) Nov 18 17:05:17 kitsune.inthat.cloud kernel: drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:801) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
__check_object_size (mm/usercopy.c:269 mm/usercopy.c:256) Nov 18 17:05:17 kitsune.inthat.cloud kernel: drm_ioctl (drivers/gpu/drm/drm_ioctl.c:899) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? drm_atomic_set_property (drivers/gpu/drm/drm_atomic_uapi.c:1313) Nov 18 17:05:17 kitsune.inthat.cloud kernel: amdgpu_drm_ioctl (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1712) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860) Nov 18 17:05:17 kitsune.inthat.cloud kernel: do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? amdgpu_drm_ioctl (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1716) amdgpu Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? __x64_sys_ioctl (fs/ioctl.c:860) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:289 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:131 kernel/entry/common.c:302) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? do_syscall_64 (arch/x86/entry/common.c:87) Nov 18 17:05:17 kitsune.inthat.cloud kernel: ? 
exc_page_fault (./arch/x86/include/asm/paravirt.h:689 arch/x86/mm/fault.c:1493 arch/x86/mm/fault.c:1541) Nov 18 17:05:17 kitsune.inthat.cloud kernel: entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Nov 18 17:05:17 kitsune.inthat.cloud kernel: RIP: 0033:0x7f46bd24a59b Nov 18 17:05:17 kitsune.inthat.cloud kernel: Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48 All code 0: ff (bad) 1: ff (bad) 2: ff 85 c0 79 9b 49 incl 0x499b79c0(%rbp) 8: c7 c4 ff ff ff ff mov $0xffffffff,%esp e: 5b pop %rbx f: 5d pop %rbp 10: 4c 89 e0 mov %r12,%rax 13: 41 5c pop %r12 15: c3 ret 16: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 1d: 00 00 1f: f3 0f 1e fa endbr64 23: b8 10 00 00 00 mov $0x10,%eax 28: 0f 05 syscall 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction 30: 73 01 jae 0x33 32: c3 ret 33: 48 8b 0d a5 a8 0c 00 mov 0xca8a5(%rip),%rcx # 0xca8df 3a: f7 d8 neg %eax 3c: 64 89 01 mov %eax,%fs:(%rcx) 3f: 48 rex.W Code starting with the faulting instruction 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax 6: 73 01 jae 0x9 8: c3 ret 9: 48 8b 0d a5 a8 0c 00 mov 0xca8a5(%rip),%rcx # 0xca8b5 10: f7 d8 neg %eax 12: 64 89 01 mov %eax,%fs:(%rcx) 15: 48 rex.W Nov 18 17:05:17 kitsune.inthat.cloud kernel: RSP: 002b:00007ffe04594568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RAX: ffffffffffffffda RBX: 00007ffe045945b0 RCX: 00007f46bd24a59b Nov 18 17:05:17 kitsune.inthat.cloud kernel: RDX: 00007ffe045945b0 RSI: 00000000c03864bc RDI: 0000000000000009 Nov 18 17:05:17 kitsune.inthat.cloud kernel: RBP: 00000000c03864bc R08: 0000000000000013 R09: 0000000000000013 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R10: 0000562abe338770 R11: 0000000000000246 R12: 0000562abdb6e000 Nov 18 17:05:17 kitsune.inthat.cloud kernel: R13: 0000000000000009 R14: 0000562abf3cc980 R15: 
0000562abd2c2de0 Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 11 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.656 msecs Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 29 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 10 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 31 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 24 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.666 msecs Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 12 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 14 skipped: idling at 
acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 15 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 30 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:17 kitsune.inthat.cloud kernel: NMI backtrace for cpu 26 skipped: idling at acpi_idle_do_entry (./arch/x86/include/asm/bitops.h:207 ./include/asm-generic/bitops/instrumented-non-atomic.h:135 drivers/acpi/processor_idle.c:532 drivers/acpi/processor_idle.c:557) Nov 18 17:05:18 kitsune.inthat.cloud kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command! Nov 18 17:05:18 kitsune.inthat.cloud kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to export SMU metrics table!
Stopping the display manager before suspend
Issuing
systemctl stop gdm.service
prior to the suspend doesn't change the overall problem: it still resumes incorrectly. However, the system was able to actually reboot in this state. Immediately prior to the actual hardware reboot, the kernel generated a warning trace (it looks like an assert was hit):
Assert hit on reboot
[ 237.030985] ------------[ cut here ]------------ [ 237.035735] WARNING: CPU: 28 PID: 1 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3127 dc_set_power_state (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3127 (discriminator 1)) amdgpu [ 237.048427] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core rfcomm snd_seq_dummy snd_hrtimer snd_seq wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic libchacha libblake2s_generic ip6_udp_tunnel udp_tunnel bridge stp llc nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject nft_counter amdgpu nft_limit nft_ct nf_conntrack cmac algif_hash nf_defrag_ipv6 algif_skcipher nf_defrag_ipv4 af_alg bnep nf_tables nfnetlink uvcvideo videobuf2_vmalloc videobuf2_memops iwlmvm snd_usb_audio videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_rawmidi videodev snd_seq_device mac80211 intel_rapl_msr cdc_acm typec_displayport mc snd_hda_codec_hdmi snd_hda_intel intel_rapl_common snd_intel_dspcfg amd64_edac snd_intel_sdw_acpi edac_mce_amd snd_hda_codec libarc4 mousedev joydev kvm_amd iwlwifi snd_hda_core btusb kvm btrtl btbcm snd_hwdep snd_pcm btintel [ 237.048469] gpu_sched snd_timer ucsi_ccg drm_ttm_helper irqbypass cfg80211 bluetooth atlantic rapl snd typec_ucsi ttm mxm_wmi typec wmi_bmof i2c_piix4 k10temp soundcore roles ecdh_generic macsec crc16 rfkill pinctrl_amd mac_hid acpi_cpufreq vfat fat squashfs loop zram nct6775 hwmon_vid jc42 uinput vfio_iommu_type1 vfio sg crypto_user nfsd fuse auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables uas usb_storage btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee ccp usbhid dm_mod tpm_crb crct10dif_pclmul tpm_tis crc32_pclmul tpm_tis_core crc32c_intel ghash_clmulni_intel tpm wmi aesni_intel crypto_simd cryptd chaoskey rng_core xhci_pci xhci_pci_renesas [ 237.203800] CPU: 
28 PID: 1 Comm: shutdown Not tainted 5.15.0 #29 dcd97a1c107264ae0959544624f304c4403ce341 [ 237.213684] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 4021 08/09/2021 [ 237.223376] RIP: 0010:dc_set_power_state (drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3127 (discriminator 1)) amdgpu [ 237.229528] Code: 5d 41 5c 41 5d e9 24 7e 1d c8 0f 0b 5b 5d 41 5c 41 5d 31 c0 89 c2 89 c6 89 c7 41 89 c0 c3 31 c0 89 c2 89 c6 89 c7 41 89 c0 c3 <0f> 0b e9 d2 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f All code ======== 0: 5d pop %rbp 1: 41 5c pop %r12 3: 41 5d pop %r13 5: e9 24 7e 1d c8 jmp 0xffffffffc81d7e2e a: 0f 0b ud2 c: 5b pop %rbx d: 5d pop %rbp e: 41 5c pop %r12 10: 41 5d pop %r13 12: 31 c0 xor %eax,%eax 14: 89 c2 mov %eax,%edx 16: 89 c6 mov %eax,%esi 18: 89 c7 mov %eax,%edi 1a: 41 89 c0 mov %eax,%r8d 1d: c3 ret 1e: 31 c0 xor %eax,%eax 20: 89 c2 mov %eax,%edx 22: 89 c6 mov %eax,%esi 24: 89 c7 mov %eax,%edi 26: 41 89 c0 mov %eax,%r8d 29: c3 ret 2a:* 0f 0b ud2 <-- trapping instruction 2c: e9 d2 fe ff ff jmp 0xffffffffffffff03 31: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) 38: 00 00 00 00 3c: 66 90 xchg %ax,%ax 3e: 0f .byte 0xf 3f: 1f (bad) Code starting with the faulting instruction 0: 0f 0b ud2 2: e9 d2 fe ff ff jmp 0xfffffffffffffed9 7: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1) e: 00 00 00 00 12: 66 90 xchg %ax,%ax 14: 0f .byte 0xf 15: 1f (bad) [ 237.248862] RSP: 0018:ffffb0f780077c48 EFLAGS: 00010206 [ 237.254247] RAX: 0000000000000000 RBX: ffff8a2313195ad0 RCX: 0000000000000000 [ 237.261599] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8a2300270000 [ 237.268900] RBP: ffff8a2300270000 R08: ffff8a228a220000 R09: 0000000000000000 [ 237.276207] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a2313180010 [ 237.283558] R13: ffff8a22824f3150 R14: ffffffff8ac52fe0 R15: 00000000fee1dead [ 237.290954] FS: 00007fbb8dee5a40(0000) GS:ffff8a316f100000(0000) knlGS:0000000000000000 [ 237.299237] CS: 0010 DS: 
0000 ES: 0000 CR0: 0000000080050033 [ 237.305161] CR2: 00007fbb8e7a2310 CR3: 00000001356b8000 CR4: 0000000000750ee0 [ 237.312489] PKRU: 55555554 [ 237.315274] Call Trace: [ 237.317774] dm_suspend (drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2017) amdgpu [ 237.325916] ? smuio_v11_0_6_update_rom_clock_gating (drivers/gpu/drm/amd/amdgpu/smuio_v11_0_6.c:46 drivers/gpu/drm/amd/amdgpu/smuio_v11_0_6.c:38) amdgpu [ 237.336628] ? nv_common_set_clockgating_state (drivers/gpu/drm/amd/amdgpu/nv.c:1412) amdgpu [ 237.346772] amdgpu_device_ip_suspend_phase1 (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2872) amdgpu [ 237.356752] amdgpu_device_ip_suspend (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2979) amdgpu [ 237.366106] amdgpu_pci_shutdown (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1395) amdgpu [ 237.374973] pci_device_shutdown (drivers/pci/pci-driver.c:512) [ 237.379199] device_shutdown (./include/linux/device.h:775 drivers/base/core.c:4528) [ 237.383226] __do_sys_reboot.cold (kernel/reboot.c:248 kernel/reboot.c:348) [ 237.387544] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 237.391240] ? exit_to_user_mode_prepare (./include/linux/sched.h:2197 ./include/linux/tracehook.h:201 kernel/entry/common.c:175 kernel/entry/common.c:207) [ 237.396321] ? syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:289 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:131 kernel/entry/common.c:302) [ 237.401257] ? do_syscall_64 (arch/x86/entry/common.c:87) [ 237.405118] ? 
exc_page_fault (./arch/x86/include/asm/paravirt.h:689 arch/x86/mm/fault.c:1493 arch/x86/mm/fault.c:1541) [ 237.409172] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) [ 237.414385] RIP: 0033:0x7fbb8e846577 [ 237.418047] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 98 0c 00 f7 d8 64 89 02 b8 All code 0: 64 89 01 mov %eax,%fs:(%rcx) 3: 48 83 c8 ff or $0xffffffffffffffff,%rax 7: c3 ret 8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) f: 00 00 00 12: 90 nop 13: f3 0f 1e fa endbr64 17: 89 fa mov %edi,%edx 19: be 69 19 12 28 mov $0x28121969,%esi 1e: bf ad de e1 fe mov $0xfee1dead,%edi 23: b8 a9 00 00 00 mov $0xa9,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 01 ja 0x33 32: c3 ret 33: 48 8b 15 c9 98 0c 00 mov 0xc98c9(%rip),%rdx # 0xc9903 3a: f7 d8 neg %eax 3c: 64 89 02 mov %eax,%fs:(%rdx) 3f: b8 .byte 0xb8 Code starting with the faulting instruction 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 01 ja 0x9 8: c3 ret 9: 48 8b 15 c9 98 0c 00 mov 0xc98c9(%rip),%rdx # 0xc98d9 10: f7 d8 neg %eax 12: 64 89 02 mov %eax,%fs:(%rdx) 15: b8 .byte 0xb8 [ 237.437363] RSP: 002b:00007fff58d04388 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9 [ 237.445114] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbb8e846577 [ 237.452423] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead [ 237.459792] RBP: 00007fbb8dee58e8 R08: 0000000000000000 R09: 00007fff58d03790 [ 237.467137] R10: 00007fbb8dee58e8 R11: 0000000000000206 R12: 0000000000000000 [ 237.474523] R13: 0000000000000000 R14: 0000000000000004 R15: 000055d2afebaf43 [ 237.481841] ---[ end trace 6e3da0ea2d03e826 ]--- [ 242.960408] amdgpu 0000:0c:00.0: amdgpu: Failed to disable smu features. [ 242.967323] amdgpu 0000:0c:00.0: amdgpu: Fail to disable dpm features! 
[ 242.974065] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block failed -62 [ 248.410455] amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command! [ 248.418627] amdgpu 0000:0c:00.0: amdgpu: [PrepareMp1] Failed! [ 248.424511] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR SMC failed to set mp1 state 2, -62 [ 248.818442] reboot: Restarting system
Stopping the display manager and unbinding the GPU
Changing the suspend sequence to something like:
systemctl stop gdm.service
echo '0000:0c:00.0' > /sys/bus/pci/drivers/amdgpu/unbind
systemctl start suspend.target
{Resume System}
echo '0000:0c:00.0' > /sys/bus/pci/drivers/amdgpu/bind
systemctl start gdm.service
makes the resume work correctly.
Unbinding the XHCI function before suspend
Using a suspend sequence like:
echo '0000:0c:00.2' > /sys/bus/pci/drivers/xhci_hcd/unbind
systemctl start suspend.target
{Resume System}
echo '0000:0c:00.2' > /sys/bus/pci/drivers/xhci_hcd/bind
Also resumes correctly. This yields a viable workaround for me: I've just added scripts to unbind the XHCI function before suspend and rebind it after (via systemd dependencies on sleep.target). So far this hasn't caused any problems, but it's clearly a nasty hack.
It also seems to indicate that some part of the power management between the GPU and the XHCI controller isn't playing nice, but I don't know if I can debug that further.
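For anyone wanting to try the same workaround, here is a minimal sketch of the unbind/rebind step. It assumes a hook script dropped into systemd's system-sleep directory (systemd calls such scripts with pre or post as the first argument) rather than the sleep.target dependencies described above. The file name and the XHCI_DEV / XHCI_DRV override variables are made up for illustration, and the PCI address is the one from this report.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/xhci-rebind.sh
# systemd runs it with $1 = "pre" before sleeping and "post" after resume.

# PCI address of the GPU's XHCI function on the reporter's system; adjust.
DEV="${XHCI_DEV:-0000:0c:00.2}"
# Driver directory in sysfs; overridable only to make dry runs possible.
DRV="${XHCI_DRV:-/sys/bus/pci/drivers/xhci_hcd}"

xhci_unbind() {
    # Only unbind if the device is currently bound to the driver.
    if [ -e "$DRV/$DEV" ]; then
        printf '%s' "$DEV" > "$DRV/unbind"
    fi
}

xhci_bind() {
    # Only bind if the device is not already bound.
    if [ ! -e "$DRV/$DEV" ]; then
        printf '%s' "$DEV" > "$DRV/bind"
    fi
}

case "${1:-}" in
    pre)  xhci_unbind ;;
    post) xhci_bind ;;
esac
```

The script needs to be executable and run as root. Anything plugged into the GPU's USB-C port is disconnected across the suspend, which is part of why this stays a hack rather than a fix.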
- Owner
The audio, USB, and UCSI controller endpoints are part of the GPU and they need to be suspended before the GPU because the GPU controls the power to all of them. The GPU also needs to be resumed before the rest of the endpoints. The problematic patch you identified creates that dependency.
I'm seeing something similar to this on an RX 6600 XT after moving from 5.14.x to 5.15.5, though reverting the mentioned commit did not fix the issue. I have a journald log here: https://www.toptal.com/developers/hastebin/ojipopayis.yaml
If this isn't the same problem, I can create a new bug report.
The DEAD Callback on CPU should be fixed: https://github.com/torvalds/linux/commit/4d1cd1443db3d5605ebcde8672869b1944ade92d#diff-4657d16b1f67d71ed32caab4420cd90971296279c049dbb4ab9def55d5d9670e
- Owner
Does this patch help? 0001-drm-amdgpu-handle-BACO-synchronization-with-secondar.patch
- Author
Unfortunately, no: I still observe the problem with that patch applied on both 5.15.0 and 5.16-rc7.
However, I also have an additional data point: the problem only occurs when a monitor is connected to the USB-C port. If I only have monitors connected to the DP ports, it resumes fine even without that patch. Sorry, I didn't think to test this before.
- Owner
Can you try and disable runtime power management on the USB controller that the USB-C port goes through? E.g.,
sudo bash -c "echo on > /sys/bus/pci/devices/0000\:44\:00.2/power/control"
Replace 0000:44:00.2 with the relevant device for your system. Also, if you are using a dGPU, please check the UCSI device as well (it should be something like "Serial bus controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB"). E.g.,
sudo bash -c "echo on > /sys/bus/pci/devices/0000\:44\:00.3/power/control"
- Author
Resume works correctly when I disable runtime power management for USB and UCSI with:
echo on > '/sys/bus/pci/devices/0000:0c:00.2/power/control'
echo on > '/sys/bus/pci/devices/0000:0c:00.3/power/control'
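Since the setting does not survive a reboot, the same writes can be wrapped in a small helper and run at boot. This is only a sketch: the function name and the PCI_DEVICES override variable are made up for illustration, and the device addresses in the usage comment are the ones from this system.

```shell
#!/bin/sh
# Root of the PCI device tree in sysfs; overridable only for dry runs.
PCI_DEVICES="${PCI_DEVICES:-/sys/bus/pci/devices}"

# Write "on" to power/control for each PCI address given as an argument,
# which keeps the device out of runtime suspend.
disable_runtime_pm() {
    for dev in "$@"; do
        ctl="$PCI_DEVICES/$dev/power/control"
        if [ -w "$ctl" ]; then
            echo on > "$ctl"
        else
            echo "skipping $dev: $ctl is not writable" >&2
        fi
    done
}

# Usage (as root), covering both the USB and UCSI functions:
#   disable_runtime_pm 0000:0c:00.2 0000:0c:00.3
```

Writing auto instead of on to the same file hands control back to the kernel's runtime power management.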
EDIT: Disabling it for only UCSI has no effect. Just disabling it for USB leaves the system still usable after resume, but with a large number of kernel errors from UCSI that are emitted constantly until reboot:
UCSI Errors
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -22
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: PM: failed to resume: error -22
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-22)
Dec 31 10:34:01 kitsune.inthat.cloud kernel: ucsi_ccg 0-0008: possible UCSI driver bug 1
Edited by Derek Hageman
- Owner
Another thing to try: does setting amdgpu.ppfeaturemask=0xfff73fff (which disables gfxoff) fix the issue?
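For reference, that value is just a feature mask with the gfxoff bit cleared. A small sketch of the arithmetic, assuming the gfxoff bit (PP_GFXOFF_MASK) is 0x8000 and the starting mask is 0xfff7bfff; both should be checked against the kernel tree in use, and the function name is made up for illustration:

```shell
#!/bin/sh
# Assumed value of PP_GFXOFF_MASK; verify against amd_shared.h in your tree.
GFXOFF_BIT=0x8000

# Print the given feature mask with the gfxoff bit cleared, as hex.
clear_gfxoff() {
    printf '0x%x\n' $(( $1 & ~$GFXOFF_BIT ))
}

# Usage, starting from the mask the loaded module reports (if the
# parameter is exposed on your kernel):
#   clear_gfxoff "$(cat /sys/module/amdgpu/parameters/ppfeaturemask)"
```

The result goes on the kernel command line as amdgpu.ppfeaturemask=<value>.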
- Author
Setting
amdgpu.ppfeaturemask=0xfff73fff
on 5.16-rc7 had no effect: resume still failed. However, 5.16.0 (the actual release) resumes correctly. Bisecting between 5.16-rc7 and 5.16.0 for the commit that changed the behavior yielded daf8de0874ab5b74b38a38726fdd3d07ef98a7ee. Reverting that on top of 5.16.0 makes resume fail again, so that commit fixes or works around the issue.
So I think this is fixed, if unintentionally (?). I can continue to test things if needed, but you can also close the issue if that's a sufficient resolution.
- kasimir mentioned in issue #1963 (closed)
- Developer
This issue hasn't had any activity since 2022-04-03. The AMD driver stack changes rapidly and contains lots of shared code across products so it's possible that it has already been fixed. Please upgrade to a current stable kernel and userspace stack and try again. If you still experience this issue with the latest driver stack, please capture relevant logging and open a new issue referring back to this one.
- Mario Limonciello closed