5.16.20 regression: suspend failure: Failed to enter BACO state!
Brief summary of the problem:
Under kernel 5.16.20, the system fails to suspend properly: it doesn't power down and doesn't respond when I try to wake it up again. Errors show up in the kernel log (visible in "journalctl -b-1" after resetting).
Suspend works consistently in 5.16.19 and fails consistently in 5.16.20.
Apr 18 00:17:54 haswell systemd[1]: Reached target Sleep.
Apr 18 00:17:54 haswell systemd[1]: Starting System Suspend...
Apr 18 00:17:54 haswell systemd-sleep[3676]: Entering sleep state 'suspend'...
Apr 18 00:17:54 haswell kernel: PM: suspend entry (deep)
Apr 18 00:17:54 haswell kernel: Filesystems sync: 0.005 seconds
Apr 18 00:17:57 haswell kernel: Freezing user space processes ... (elapsed 0.003 seconds) done.
Apr 18 00:17:57 haswell kernel: OOM killer disabled.
Apr 18 00:17:57 haswell kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Apr 18 00:17:57 haswell kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Apr 18 00:17:57 haswell kernel: serial 00:01: disabled
Apr 18 00:17:57 haswell kernel: sd 5:0:0:0: [sda] Synchronizing SCSI cache
Apr 18 00:17:57 haswell kernel: sd 6:0:0:0: [sdb] Synchronizing SCSI cache
Apr 18 00:17:57 haswell kernel: sd 5:0:0:0: [sda] Stopping disk
Apr 18 00:17:57 haswell kernel: sd 6:0:0:0: [sdb] Stopping disk
Apr 18 00:17:57 haswell kernel: [drm] free PSP TMR buffer
Apr 18 00:17:57 haswell kernel: amdgpu 0000:03:00.0: amdgpu: BACO reset
Apr 18 00:17:57 haswell kernel: amdgpu 0000:03:00.0: amdgpu: Failed to enter BACO state!
Apr 18 00:17:57 haswell kernel: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -5
Apr 18 00:17:57 haswell kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
Apr 18 00:17:57 haswell kernel: amdgpu 0000:03:00.0: PM: failed to suspend async: error -5
Apr 18 00:17:57 haswell kernel: PM: Some devices failed to suspend, or early wake event detected
Apr 18 00:17:57 haswell kernel: sd 5:0:0:0: [sda] Starting disk
Apr 18 00:17:57 haswell kernel: sd 6:0:0:0: [sdb] Starting disk
Apr 18 00:17:57 haswell kernel: serial 00:01: activated
Apr 18 00:17:57 haswell kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Apr 18 00:17:57 haswell kernel: nvme nvme0: 16/0/0 default/read/poll queues
Apr 18 00:17:57 haswell kernel: nvme nvme1: 16/0/0 default/read/poll queues
Apr 18 00:17:57 haswell kernel: ata8: SATA link down (SStatus 4 SControl 300)
Apr 18 00:17:57 haswell kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 18 00:17:57 haswell kernel: ata5.00: configured for UDMA/100
Apr 18 00:17:57 haswell kernel: usb 1-11: reset high-speed USB device number 4 using xhci_hcd
Apr 18 00:17:57 haswell kernel: usb 1-10.3: reset full-speed USB device number 7 using xhci_hcd
Apr 18 00:17:57 haswell kernel: PM: resume devices took 2.358 seconds
Apr 18 00:17:57 haswell kernel: OOM killer enabled.
Apr 18 00:17:57 haswell kernel: Restarting tasks ... done.
Apr 18 00:17:57 haswell kernel: PM: suspend exit
Apr 18 00:17:57 haswell kernel: PM: suspend entry (s2idle)
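In case it helps with triage, here is a rough Python sketch that pulls the failing PM callbacks out of a log like the one above and maps the return code to its errno name. The regex and the `failed_callbacks` helper are my own illustration, not part of any kernel or systemd tooling, and the pattern assumes the "PM: caller(): callback+0x.../0x... [module] returns -N" layout seen in this log.

```python
import errno
import re

# Assumed layout, taken from the log above, e.g.:
#   "PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -5"
# The "[module]" part is optional (built-in callbacks have none).
CALLBACK_RE = re.compile(
    r"PM: (?P<caller>\w+)\(\): (?P<callback>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])? returns (?P<code>-\d+)"
)


def failed_callbacks(lines):
    """Return (callback, module, code, errno_name) for each failing PM callback."""
    results = []
    for line in lines:
        match = CALLBACK_RE.search(line)
        if match is None:
            continue
        code = int(match.group("code"))
        # Kernel callbacks return negative errno values, so -5 maps to EIO.
        name = errno.errorcode.get(-code, "unknown")
        results.append((match.group("callback"), match.group("module"), code, name))
    return results


# Three lines copied from the log above.
SAMPLE = [
    "Apr 18 00:17:57 haswell kernel: amdgpu 0000:03:00.0: amdgpu: Failed to enter BACO state!",
    "Apr 18 00:17:57 haswell kernel: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -5",
    "Apr 18 00:17:57 haswell kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5",
]

if __name__ == "__main__":
    for callback, module, code, name in failed_callbacks(SAMPLE):
        print(f"{callback} (module: {module}) returned {code} ({name})")
```

On the sample lines this reports amdgpu_pmops_suspend and pci_pm_suspend both returning -5, i.e. EIO, which matches the "error -5" in the "failed to suspend async" line.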
Looking at the changelog for 5.16.20, this seems suspicious:
commit b27cfc34a4e6966b9521c86a7e78cf212861955a
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Fri Mar 25 11:53:39 2022 -0400
drm/amdgpu: don't use BACO for reset in S3
commit ebc002e3ee78409c42156e62e4e27ad1d09c5a75 upstream.
Seems to cause a reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I guess this was supposed to fix something, but it seems to break suspend here, at least in the backport to 5.16?
Hardware description:
- CPU: 12th Gen Intel(R) Core(TM) i5-12600K
- GPU: Radeon RX 5500 XT
- System Memory: 32 GB
- Display(s): Dell U3219Q
- Type of Display Connection: DP
System information:
- Distro name and Version: Fedora 35
- Kernel version: 5.16.20-200.fc35.x86_64
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Boot with this kernel
- Suspend the machine
Attached files:
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log): journal.txt