[amdgpu] *ERROR* ring sdma0 timeout on boot
Brief summary of the problem:
GPU resets during boot lead to delayed startup times. The logs show *ERROR* ring sdma0 timeout.
Hardware description:
- CPU: AMD Ryzen 7 3700X (16) @ 3.600GHz
- GPU: AMD ATI Radeon RX 5700 XT
- System Memory: 32GB
- Display(s): LG 27GL83A-B
- Type of Display Connection: DP
System information:
- Distro name and Version: EndeavourOS
- Kernel version: 5.7.12-arch1-1
- AMD package version: mesa 20.1.4-3
How to reproduce the issue:
Issue only occurs at system startup.
One of three things happens when I boot my system:
1) Blank screen/no signal after seeing the bootloader.
2) Blank screen with signal, then the login screen appears after about 10 seconds.
3) Login screen appears immediately after the bootloader.
I don't know how to get a log from 1). When 2) occurs, I see amdgpu ring errors followed by successful GPU resets, accompanied by snd_hda_intel errors and subsequent pulseaudio server startup delays. 3) seems, for all intents and purposes, to be a successful boot.
Further context: 1) only seems to occur on a cold boot (PC has been powered off for more than a couple of hours) and requires a hard reset. After the hard reset, there are a number of instances of 2) (anywhere from 0 to 4, usually one or two). These boots can be restarted gracefully after logging in. Eventually, an instance of 3) occurs and I can go about my business.
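A possible way to get a log from 1), as a rough sketch: if systemd-journald keeps a persistent journal (an assumption; it requires /var/log/journal to exist), the kernel messages of the failed boot should still be readable from the next successful boot with `journalctl -k -b -1`. The helper below, `filter_ring_timeouts`, is a hypothetical name; it only keeps lines matching the "ring ... timeout" wording from this report and will miss errors phrased differently.

```shell
#!/bin/sh
# Keep only lines that look like "ring <name> timeout" from a kernel log on stdin.
# The pattern is based on the error text in this report; other wordings are not matched.
filter_ring_timeouts() {
    grep -E 'ring [[:alnum:]_.]+ timeout'
}

# Example usage (not run here):
#   Kernel log of the previous boot, assuming a persistent journal:
#     journalctl -k -b -1 --no-pager | filter_ring_timeouts
#   One of the attached dmesg logs:
#     filter_ring_timeouts < dmesg_example_of_2_delayed_boot_5.7.12-arch1-1.log
```

If the previous boot's journal is empty or missing, the journal was probably not persistent, or the hang happened before anything was flushed to disk.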
Attached files:
- Dmesg logs
dmesg_example_of_2_delayed_boot_5.7.12-arch1-1.log
dmesg_example_of_3_successful_boot_5.7.12-arch1-1.log
- Journalctl logs
journalctl_example_of_2_delayed_boot_5.7.12-arch1-1.log
journalctl_example_of_3_successful_boot_5.7.12-arch1-1.log
Other:
I don't have any experience with bisecting to determine where a problem was first introduced, but if I install the linux-lts 5.4.55-1 Arch packages, I don't have these issues (there are other powerplay errors in the logs, however).
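For reference, a bisect between a working and a failing kernel could be started roughly as follows. This is a sketch under assumptions: it is run inside a git checkout of the kernel source, `start_bisect` is a hypothetical helper name, and the mainline tags `v5.4` and `v5.7` are only stand-ins for the good and bad versions, since the packages tested here were 5.4.55 and 5.7.12.

```shell
#!/bin/sh
# Start a bisect in the current git tree between a known-good and a known-bad revision.
# Usage: start_bisect GOOD_REV BAD_REV
start_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Example usage (not run here), assuming a kernel source checkout:
#   start_bisect v5.4 v5.7
```

After each step, the checked-out revision would be built and booted, then marked with `git bisect good` or `git bisect bad` until git reports the first bad commit; `git bisect reset` returns the tree to its original state. Because the failure here is intermittent, each revision would probably need several cold boots before it can be marked good.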