Failure to resume from suspend (s2idle) on Ryzen 7 4700U (Renoir), bisected regression in 5.16
Brief summary of the problem:
Resume from suspend (s2idle) fails with every v5.16 release candidate, from -rc1 through -rc8. When resume fails, the keyboard backlight comes on but the system is unresponsive. I can usually recover the system journal on the next boot after issuing a SysRq emergency sync on the hung system.
Hardware description:
- CPU: AMD Ryzen 7 4700U with Radeon Graphics (Renoir)
- GPU: 04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c2)
- System Memory: 16GiB (2 x 8GiB SODIMM DDR4 Synchronous Unbuffered 2667 MHz, A-DATA Technology)
- Display(s): built-in laptop display
- During all tests, nothing was plugged into the system except the AC adapter.
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.16-rc8
- Custom kernel: mainline kernel.org
- AMD official driver version: kernel amdgpu
- Linux-firmware: linux-firmware-git 20211229.57d6b95-1
How to reproduce the issue:
Build any mainline kernel from v5.16-rc1 through v5.16-rc8. Suspend and resume repeatedly until resume fails, which usually happens within the first 1-3 attempts.
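The suspend/resume loop can be scripted. The following is only a sketch, not the procedure used for this report: it assumes util-linux `rtcwake` is installed, that it is run as root, and that `-m freeze` (suspend-to-idle) with an RTC alarm is an acceptable stand-in for suspending by hand. The function names and the 30-second sleep are illustrative choices.

```python
import subprocess


def rtcwake_command(seconds=30):
    """Build an rtcwake call that enters s2idle ("freeze") and wakes on an RTC alarm."""
    return ["rtcwake", "-m", "freeze", "-s", str(seconds)]


def run_cycles(cycles=12, seconds=30, runner=subprocess.run):
    """Attempt up to `cycles` suspend/resume cycles and return how many completed.

    `runner` is injectable so the loop can be exercised without suspending.
    If the machine hangs on resume, this function may never return; the
    journal from the next boot is then the only evidence.
    """
    completed = 0
    for _ in range(cycles):
        result = runner(rtcwake_command(seconds), check=False)
        if result.returncode != 0:
            break  # rtcwake itself failed (e.g. not root, no RTC)
        completed += 1
    return completed
```

Calling `run_cycles()` as root would attempt 12 cycles, matching the upper end of the "good" criterion used for the bisect below.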
Resume failure example from system journal:
Jan 04 07:15:47 kernel: PM: suspend entry (s2idle)
Jan 04 07:15:57 kernel: Filesystems sync: 0.019 seconds
Jan 04 07:15:57 kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
Jan 04 07:15:57 kernel: OOM killer disabled.
Jan 04 07:15:57 kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Jan 04 07:15:57 kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jan 04 07:15:57 kernel: ACPI: EC: interrupt blocked
Jan 04 07:15:57 kernel: ACPI: EC: interrupt unblocked
Jan 04 07:15:57 kernel: pci 0000:00:00.2: can't derive routing for PCI INT A
Jan 04 07:15:57 kernel: pci 0000:00:00.2: PCI INT A: no GSI
Jan 04 07:15:57 kernel: [drm] PCIE GART of 1024M enabled.
Jan 04 07:15:57 kernel: [drm] PTB located at 0x000000F400900000
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: amdgpu: dpm has been disabled
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
Jan 04 07:15:57 kernel: [drm] DMUB hardware initialized: version=0x0101001C
Jan 04 07:15:57 kernel: nvme nvme0: 8/0/0 default/read/poll queues
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Jan 04 07:15:57 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
Jan 04 07:15:57 kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110
Jan 04 07:15:57 kernel: amdgpu 0000:04:00.0: PM: failed to resume async: error -110
Jan 04 07:15:57 kernel: OOM killer enabled.
Jan 04 07:15:57 kernel: Restarting tasks ... done.
Jan 04 07:15:57 kernel: PM: suspend exit
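To spot this failure in a saved journal without reading it by eye, a small scanner along these lines may help. It is a sketch that assumes the journal text has been saved to a file in the short format shown above (for example with `journalctl -k -b -1`); the function name and the one-entry errno table are illustrative. On Linux, error 110 is ETIMEDOUT, which fits the sdma0 ring test timing out.

```python
import re

# Matches e.g. "kernel: amdgpu 0000:04:00.0: PM: failed to resume async: error -110"
RESUME_ERROR = re.compile(
    r"kernel: (?P<dev>.+?): PM: failed to resume(?: async)?: error (?P<err>-?\d+)"
)

# Linux errno values seen in this report; extend as needed.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}


def find_resume_failures(journal_lines):
    """Return a (device, error_code, errno_name) tuple per failed-resume line."""
    failures = []
    for line in journal_lines:
        match = RESUME_ERROR.search(line)
        if match is None:
            continue
        code = int(match.group("err"))
        name = LINUX_ERRNO_NAMES.get(abs(code), "unknown")
        failures.append((match.group("dev"), code, name))
    return failures
```

Passing the lines of the saved journal to `find_resume_failures` yields one tuple per device that failed to resume. An empty list means no such line was logged, which does not by itself prove the resume was healthy.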
Bisect:
I bisected between v5.15 (good) and v5.16-rc1 (bad). A build was marked good after 10-12 successful resumes, and bad if resume failed before that, usually within attempts 1-3. Result:
652de07addd2c40684fbf3a91c5b335709a585ca is the first bad commit
commit 652de07addd2c40684fbf3a91c5b335709a585ca
Author: Roman Li <Roman.Li@amd.com>
Date: Fri Oct 15 13:16:31 2021 -0400
drm/amd/display: Fully switch to dmub for all dcn21 asics
[Why]
On renoir usb-c port stops functioning on resume after f/w update.
New dmub firmware caused regression due to conflict with dmcu.
With new dmub f/w dmcu is superseded and should be disabled.
[How]
- Disable dmcu for all dcn21.
Check dmesg for dmub f/w version.
The old firmware (before regression):
[drm] DMUB hardware initialized: version=0x00000001
All other versions require this patch for renoir.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1735
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
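The commit message above distinguishes the old DMUB firmware (version=0x00000001) from all later versions. A quick way to check which case a given boot falls into is to parse the version from the dmesg line. This is a sketch with illustrative names; the only facts it relies on are the log format and the version value quoted in the commit message.

```python
import re

DMUB_VERSION = re.compile(r"DMUB hardware initialized: version=(0x[0-9A-Fa-f]+)")

# Per the commit message: the old firmware, before the regression it describes.
OLD_DMUB_VERSION = 0x00000001


def dmub_version(dmesg_line):
    """Return the DMUB firmware version as an int, or None if the line has none."""
    match = DMUB_VERSION.search(dmesg_line)
    return int(match.group(1), 16) if match else None


def is_newer_dmub(version):
    """True for any version other than the old 0x00000001 firmware."""
    return version != OLD_DMUB_VERSION
```

For the journal in this report, `dmub_version("[drm] DMUB hardware initialized: version=0x0101001C")` gives `0x0101001C`, so this system is running the newer firmware that the commit targets.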
Attached files:
lshw and journal: lshw_journal.tgz