Crash when starting Steam on RX 6600M with linux-next-20240130
When I start Steam on my MSI Alpha 15 laptop, the system crashes and reboots. The laptop runs Debian stable (bookworm), except for Mesa, which is 23.2.1, with linux-next-20240130. The first reboot after the crash crashes again, and only the second reboot boots normally.
Hardware:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/Renegade NVMe SSD (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
First error message:
[ 808.537318][ T98] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 808.537330][ T98] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 808.739038][ T98] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 808.739047][ T98] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 809.149601][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 809.149609][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 809.351352][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 809.351366][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 809.553112][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 809.553118][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 809.754859][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 809.754864][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 809.956606][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 809.956610][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 810.165595][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 810.165618][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 810.367363][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 810.367369][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 810.569110][ T3990] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 810.569116][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 810.770910][ T3990] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 818.713654][ T101] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=131, emitted seq=133
[ 818.715144][ T101] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 818.716592][ T101] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 818.887429][ T101] __smu_cmn_reg_print_error: 1 callbacks suppressed
[ 818.887441][ T101] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 818.887467][ T101] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Second error message:
[ 12.958277][ T99] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 12.958287][ T99] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 13.160062][ T99] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 13.160068][ T99] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 13.765447][ T1177] NFSD: Unable to initialize client recovery tracking! (-110)
[ 13.765454][ T1177] NFSD: starting 90-second grace period (net f0000000)
[ 14.169120][ T99] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 14.169154][ T99] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 14.976219][ T100] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 14.976233][ T100] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
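To get a quick overview of which SMU messages fail in logs like the two above, something along these lines could be used. This is only a sketch: the function name count_smu_failures is made up for illustration, and the pattern assumes the "SMU: response:0xFFFFFFFF ... message:NAME?" line format shown in the logs.

```shell
# Tally failed SMU messages in a kernel log read from stdin.
# Assumption: failing lines look like
#   "... SMU: response:0xFFFFFFFF for index:N param:0x... message:NAME?"
# count_smu_failures is a hypothetical helper name, not part of any tool.
count_smu_failures() {
    grep 'SMU: response:0xFFFFFFFF' \
        | sed -n 's/.*message:\([A-Za-z0-9_]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage (reading the kernel log may require root):
#   sudo dmesg | count_smu_failures
```

The output is one line per message name, prefixed with the number of times it failed, most frequent first.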
I'm currently bisecting this (linux-6.8-rc2 is okay).
Edited by Bert Karwatzki