navi10_i2c_write- error, suspend crash and lots of log messages, includes bisect
Brief summary of the problem:
On the latest amd-staging-drm-next (6fe4a372821db85834f5522d94d25139ff17e414), dmesg is filled with amdgpu errors and the system locks up when trying to suspend.
Hardware description:
- CPU: AMD Ryzen 9 3900X
- GPU: Navi 14 RX 5500, ASRock Radeon RX 5500 XT Phantom Gaming D 8G OC
- System Memory: 64GB DDR4 3600
- Display(s): LG 27UK850-W 4k@60Hz, Dell P2415Q 4k@60Hz
- Type of Display Connection: Both DP
System information:
- Distro name and Version: Ubuntu 20.04.1
- Kernel version: 5.9.0-rc2 amd-staging-drm-next 6fe4a372821db85834f5522d94d25139ff17e414
- Custom kernel: Kernel from amd-staging-drm-next, commit: "gpu/drm/radeon: fix spelling typo in comments"
- AMD package version: No package
How to reproduce the issue:
- Boot system and check dmesg for amdgpu errors:
[ 25.828810] amdgpu 0000:2f:00.0: amdgpu: failed send message: TransferTableDram2Smu (19) param: 0x0000000a response 0xfffffffb
[ 25.828814] amdgpu 0000:2f:00.0: amdgpu: navi10_i2c_write- error occurred :fffffffb
[ 25.828839] amdgpu 0000:2f:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[ 25.828841] amdgpu 0000:2f:00.0: amdgpu: navi10_i2c_write- error occurred :fffffffb
- Attempt to suspend the system: the screen blanks, the system locks up, and it cannot be woken. My system normally blinks the power LED while suspended; here it never starts blinking, so the failure is in suspending itself, not in waking.
I'm unsure of what else from my setup is relevant. My memory passes memtest86, but I also tried with just one stick and in different slots and with XMP disabled. I've tried the GPU in a different PCIe slot.
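For anyone comparing logs, the dmesg check in the first reproduction step can be scripted. The following is a minimal sketch in Python, not part of any amdgpu tooling: the function name `count_amdgpu_errors` and the category labels are made up for illustration, and the patterns are taken only from the four log lines quoted above, so other error variants will not be counted.

```python
import re
from collections import Counter

# Patterns copied from the dmesg lines quoted in this report.
# The dictionary keys are hypothetical labels chosen for this sketch;
# the driver does not print them.
PATTERNS = {
    "smu_message_failed": re.compile(r"amdgpu: failed send message: \w+"),
    "i2c_write_error": re.compile(r"amdgpu: navi10_i2c_write- error occurred"),
    "smu_precheck_failed": re.compile(r"amdgpu: Msg issuing pre-check failed"),
}


def count_amdgpu_errors(lines):
    """Count how many lines match each of the known error patterns."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # each line is counted under at most one label
    return counts
```

The function accepts any iterable of lines, so it can be fed an open log file (for example the attached i2c.log) or the captured output of `dmesg`. A kernel with the problem should report non-zero counts; with the workaround applied, all counts should be zero.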
Bisect
- Bisect led me to:
1bc734759f284eb531dd474c72ce59874649a254 is the first bad commit
commit 1bc734759f284eb531dd474c72ce59874649a254
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Sun Jul 19 13:22:05 2020 -0400
drm/amdgpu/navi1x: add SMU i2c support (v2)
Enable SMU i2c bus access for navi1x asics.
v2: add missing implementation
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 239 +++++++++++++++++++++++++++++
1 file changed, 239 insertions(+)
It wouldn't revert cleanly on amd-staging-drm-next, so I just commented out the .i2c_init and .i2c_fini lines.
Afterwards, the messages were gone and system suspend works.
I'm happy to provide additional information or test patches.
Attached files:
- Dmesg log i2c.log