The system doesn't power off or reboot properly with 6.14 kernels
Brief summary of the problem:
I booted the Fedora Rawhide KDE live image Fedora-KDE-Desktop-Live-Rawhide-20250201.n.0.x86_64.iso, which had kernel 6.14.0-0.rc0.20250130git72deda0abee6.11.fc42, on bare metal from a USB flash drive and used the live session. The machine is an HP laptop with an AMD A10-9620P CPU and an integrated Radeon R5 GPU. I selected Shut Down from the Application Launcher menu. The system showed the Plymouth spinner, and I pressed Esc to show the shutdown messages. The normal shutdown messages appeared, the last one being something like "Unmounting /oldroot". Then the screen turned off, but the laptop stayed powered on instead of powering off, and the fan became progressively louder over the next few minutes. Alt+SysRq+B didn't reboot the system. I had to hold down the power button for 5 seconds to turn the system off.
I reproduced the problem several times with Fedora-KDE-Desktop-Live-Rawhide-20250204.n.0.x86_64.iso, which had 6.14.0-0.rc1.15.fc42, and with Fedora-KDE-Desktop-Live-Rawhide-20250208.n.0.x86_64.iso, which had 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43. The problem also happened when rebooting. I then installed 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 in my F41 KDE installation and shut down. The hang occurred after systemd-journald had stopped, so the kernel log doesn't show what went wrong. I'm attaching the kernel log from a boot with debug added to the kernel command line.
The problem doesn't happen with 6.13.1 or earlier. I bisected it, and the first bad commit is in the amdgpu power management (pm) code:
ff69bba05f085cd6d4277c27ac7600160167b384 is the first bad commit
commit ff69bba05f085cd6d4277c27ac7600160167b384 (HEAD)
Author: Boyuan Zhang <boyuan.zhang@amd.com>
Date: Wed Oct 2 23:52:01 2024 -0400
drm/amd/pm: add inst to dpm_set_powergating_by_smu
Add an instance parameter to amdgpu_dpm_set_powergating_by_smu() function,
and use the instance to call set_powergating_by_smu().
v2: remove duplicated functions.
remove for-loop in amdgpu_dpm_set_powergating_by_smu(), and temporarily
move it to amdgpu_dpm_enable_vcn(), in order to keep the exact same logic
as before, until further separation in next patch.
v3: drop SI logic in amdgpu_dpm_enable_vcn().
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 14 +++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 4 ++--
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 4 ++--
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 37 ++++++++++++++++++++++++++-----------
drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 3 ++-
16 files changed, 59 insertions(+), 43 deletions(-)
My GPU is a gfx_v8 part, and that commit changes gfx_v8_0.c. When I booted 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 with nomodeset on the kernel command line, so that the simpledrm driver was used instead of amdgpu, the system shut down normally. I also reported this problem at https://bugzilla.redhat.com/show_bug.cgi?id=2344500 and https://bugzilla.kernel.org/show_bug.cgi?id=219763
Hardware description:
- CPU: AMD A10-9620P
- GPU: Radeon R5 (lspci: 00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] [1002:9874] (rev ca))
- System Memory: 8 GB
- Display(s): Internal Elan touchscreen
- Type of Display Connection: eDP
System information:
- Distro name and Version: Fedora Rawhide
- Kernel version: 6.14.0-0.rc0.20250130git72deda0abee6.11.fc42 through 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- In a Fedora 41 KDE installation, download Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2653640
- Start Fedora Media Writer
- Write Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso to a USB flash drive
- Reboot
- Boot Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso from the USB flash drive on bare metal on a system with an affected AMD GPU
- Select Shut Down from the Application Launcher menu in Plasma. The screen turns off, but the system stays powered on.
Attached files:
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log): 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43-did-not-shut-off-2.txt, the 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 kernel log from a boot where the system didn't shut off properly