S3-Suspend support lost with 5.11
Brief summary of the problem:
With 5.10 it was possible to properly use the deep suspend-to-RAM mode (S3). Commit 628c36d7b238e2d72158e8aba229ec79c69c157e, "drm/amdgpu: update amdgpu device suspend/resume sequence for s0i3 support" (found by bisection), broke that by preferring the s0i3 suspend mode, which consumes more power. I don't really like that mode as it drains the battery way too fast.
That commit makes heavy use of amdgpu_acpi_is_s0ix_supported(), which only checks whether the ACPI table advertises support and completely ignores what the system is actually set to use (like 'deep' in mem_sleep).
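To illustrate what "actually set to use" means here: /sys/power/mem_sleep lists the available modes and puts the selected one in square brackets (for example "s2idle [deep]"). Below is a minimal userspace sketch of extracting that selected mode from such a line. The helper name active_mem_sleep_mode() is made up for this example and is not a kernel or libc function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): copy the bracketed entry of a
 * mem_sleep-style line such as "s2idle [deep]" into out.
 * Returns 0 on success, -1 if no well-formed bracketed entry is found
 * or if out is too small to hold it.
 */
int active_mem_sleep_mode(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end = start ? strchr(start, ']') : NULL;
	size_t len;

	if (!start || !end)
		return -1;

	/* Number of characters strictly between '[' and ']'. */
	len = (size_t)(end - start - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

In practice one would read a single line from /sys/power/mem_sleep (e.g. with fopen()/fgets()) and pass it to this helper; a result of "deep" means the user asked for S3 rather than s2idle.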
Hardware description:
- Acer Swift 3 SF314-42 Laptop
- CPU: AMD Ryzen 7 4700U with Radeon Graphics
- GPU: Included APU (Advanced Micro Devices, Inc. [AMD/ATI] Renoir (rev c2) (prog-if 00 [VGA controller]))
- System Memory: 16GB LPDDR4
- Display(s): Full-HD Laptop display
- Type of Display Connection: eDP
System information:
- Distro name and Version: Arch Linux
- Kernel version: Standard Arch (or even plain mainline/stable) 5.11+ (any version post 628c36d7b238e2d72158e8aba229ec79c69c157e)
- AMD package version: "No package"
How to reproduce the issue:
Use S3 suspend support aka deep mem sleep:
cat /sys/power/mem_sleep
s2idle [deep]
Then put the system to sleep. It fails to resume, with display distortion or a frozen display, and the driver tries to reset the GPU due to
Mar 28 15:37:15 archlinux kernel: WARNING: CPU: 1 PID: 578 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7768 amdgpu_dm_atomic_commit_tail+0x2793/0x2cd0 [amdgpu]
Mar 28 15:37:15 archlinux kernel: CPU: 1 PID: 578 Comm: kworker/u32:13 Not tainted 5.10.0-rc3-1 #19
Mar 28 15:37:15 archlinux kernel: Hardware name: Acer Swift SF314-42/Kona_RN, BIOS V1.09 11/18/2020
Mar 28 15:37:15 archlinux kernel: Workqueue: events_unbound async_run_entry_fn
Mar 28 15:37:15 archlinux kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2793/0x2cd0 [amdgpu]
Mar 28 15:37:15 archlinux kernel: commit_tail+0xc5/0x160 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: drm_atomic_helper_commit+0x132/0x160 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: drm_client_modeset_commit_atomic+0x249/0x290 [drm]
Mar 28 15:37:15 archlinux kernel: drm_client_modeset_commit_locked+0x5b/0x190 [drm]
Mar 28 15:37:15 archlinux kernel: drm_client_modeset_commit+0x24/0x40 [drm]
Mar 28 15:37:15 archlinux kernel: __drm_fb_helper_restore_fbdev_mode_unlocked+0xc2/0xf0 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: drm_fb_helper_set_par+0x38/0x70 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: drm_fb_helper_hotplug_event.part.0+0xc5/0xe0 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: drm_kms_helper_hotplug_event+0x26/0x30 [drm_kms_helper]
Mar 28 15:37:15 archlinux kernel: amdgpu_device_resume+0x1e6/0x4b0 [amdgpu]
and some other lines probably worth noting
kernel: amdgpu: cp queue preemption time out.
[..]
kernel: pci 0000:00:00.2: can't derive routing for PCI INT A
kernel: pci 0000:00:00.2: PCI INT A: no GSI
kernel: [drm] Wait for DMUB auto-load failed: 3
Not quite sure if S3 is intentionally never meant to be used on those APUs, but that would be a pity, because technically it works; it's just the usual software/firmware limitations.
If one wants to allow for S3 support, one needs to extend the checks in amdgpu_acpi_is_s0ix_supported() with something along the lines of
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 2e9b16fb3fcd..152f70da0bb7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -906,7 +906,7 @@ bool amdgpu_acpi_is_s0ix_supported(struct amdgpu_device *adev)
 #if defined(CONFIG_AMD_PMC) || defined(CONFIG_AMD_PMC_MODULE)
 	if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0) {
 		if (adev->flags & AMD_IS_APU)
-			return true;
+			return pm_suspend_default_s2idle();
 	}
 #endif
 	return false;
which worked for my machine, but probably needs some proper ifdefing.
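To make the intended behaviour change explicit, here is a small userspace model of the decision before and after that one-line change. It is only a sketch: both function names and all three boolean parameters are hypothetical stand-ins for the real kernel state, and the CONFIG_AMD_PMC guard is left out.

```c
#include <stdbool.h>

/*
 * Userspace model only. The parameters are stand-ins (assumptions):
 *   fadt_low_power_s0 ~ acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0
 *   is_apu            ~ adev->flags & AMD_IS_APU
 *   default_s2idle    ~ result of pm_suspend_default_s2idle()
 */

/* Current behaviour: firmware capability alone selects s0ix. */
bool model_use_s0ix_before(bool fadt_low_power_s0, bool is_apu)
{
	return fadt_low_power_s0 && is_apu;
}

/* Proposed behaviour: additionally require that s2idle is the default. */
bool model_use_s0ix_after(bool fadt_low_power_s0, bool is_apu,
			  bool default_s2idle)
{
	if (fadt_low_power_s0 && is_apu)
		return default_s2idle;
	return false;
}
```

On the machine described here (FADT advertises low-power S0, the GPU is an APU, mem_sleep is set to 'deep'), the first variant returns true and the second returns false, so the driver would fall back to the regular S3 path.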
Btw while bisecting with a more debuggish kernel I stumbled over this one
Mar 28 15:36:09 archlinux kernel: UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:1140:32
Mar 28 15:36:09 archlinux kernel: shift exponent 64 is too large for 64-bit type 'long long unsigned int'
though not sure if it's a false positive.
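For context on that message: in C, shifting a 64-bit value by 64 or more is undefined behaviour, so an expression that builds a bitmask from a count that can reach 0 or 64 needs a guard. The sketch below is not the code from kfd_device_queue_manager.c. low_bits_mask() is a hypothetical helper that only shows one way to keep the shift exponent in range.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return a mask with the lowest n bits set.
 * The edge cases are handled explicitly so that the shift exponent
 * below is always in the range 1..63.
 */
uint64_t low_bits_mask(unsigned int n)
{
	if (n == 0)
		return 0;
	if (n >= 64)
		return UINT64_MAX;
	return UINT64_MAX >> (64 - n);
}
```

Without the n == 0 check, the last line would shift by exactly 64, which is the kind of expression UBSAN reports with the message above.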
Attached files
dmesg_resume_fail.log
5.11-amdgpu-reinstate-s3-usability.patch