6900 XT fails to resume from suspend (bisected)
Brief summary of the problem:
My 6900 XT began failing to resume correctly from suspend in the 5.14.y series. I didn't get around to tracking it down until recently.
After resuming, the displays are unresponsive, usually with the monitors repeatedly entering and exiting power save mode. Old images of the desktop and console sometimes appear, as well as all-white screens and flickering. The desktop compositor has also been observed to segfault. The system is otherwise accessible over ssh, but usually with a kworker sitting at 100% of one core. A reboot issued over ssh starts but never completes: shutdown usually appears to get as far as killing processes, yet the machine does not actually reboot.
Bisection yielded 60b78ed088ebe1a872ee1320b6c5ad6ee2c4bd9a (or 73892cbd7c88b629da1db018e7b3741499ded412 on 5.14.y). Reverting this fixes resume on 5.14.18, 5.15.0, and 5.15.2. Unfortunately I was not able to test 5.16-rc1 because it fails to suspend at all due to a (probably unrelated) DEAD callback error for CPU1.
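For anyone wanting to repeat the revert test, the step can be sketched roughly as below. This is only an illustration, not the exact procedure used for this report: it assumes a kernel git checkout in the current directory with the relevant tag already fetched, and the function name and branch name are made up. Building and installing the resulting kernel is distro-specific and left out.

```shell
# Sketch only: check out a release tag on a scratch branch and revert the
# suspect commit on top of it. Function and branch names are hypothetical.
revert_suspect_commit() {
    tag="$1"      # release tag to test, e.g. v5.15.2
    commit="$2"   # commit to revert, e.g. the hash found by the bisection
    git checkout -b "revert-test-${tag}" "$tag" &&
    git revert --no-edit "$commit"
}
```

Example call: revert_suspect_commit v5.15.2 60b78ed088ebe1a872ee1320b6c5ad6ee2c4bd9a (on a 5.14.y tree, the stable hash 73892cbd7c88b629da1db018e7b3741499ded412 would be used instead).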
The behavior I observe appears to be the opposite of what the commit claims to achieve: my sensors and resume only work correctly without it. So perhaps the quirk is applied too broadly?
Hardware description:
- CPU: AMD 5950X
- GPU: AMD reference 6900 XT
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
- System Memory: 64 GB ECC
- Display(s): 3x LG 27UD68P-B
- Type of Display Connection: 2x DP, 1x USB-C (DP alternate mode)
System information:
- Distro name and Version: ArchLinux
- Kernel version: Problem observed in 5.14.y, 5.15.0, and 5.15.y
How to reproduce the issue:
Suspend the system and resume it: the problem occurs with 100% repeatability. Logs were taken by invoking systemctl start suspend.target over a serial console, then resuming and running journalctl -k -b.
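The reproduction steps above can be wrapped up roughly as follows. This is a sketch under assumptions: the function name and the default output file name are made up, it needs sufficient privileges to suspend, and it assumes the systemctl call does not return until the machine has been woken again. If it returns earlier on your setup, run the journalctl step by hand after resuming.

```shell
# Sketch only: suspend, then save the kernel log for the current boot.
# Function name and default file name are hypothetical.
capture_resume_log() {
    out="${1:-resume-kernel.log}"
    # Suspend the machine; wake it manually (keyboard or power button).
    systemctl start suspend.target || return 1
    # Kernel messages for the current boot, including the resume attempt.
    journalctl -k -b > "$out"
}
```

Example call from the serial console: capture_resume_log resume-5.15.2.log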