GPU crash when running with an external display at 4K
Brief summary of the problem:
At random times, roughly once a week, my system locks up: the screen turns blank or drops to text mode and does not recover without rebooting the system.
Hardware description:
Lenovo Thinkpad T14 Gen 1
- CPU: AMD Ryzen 7 PRO 4750U with Radeon Graphics (cpu family: 23, model: 96, stepping: 1)
- GPU: 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev d1)
- System Memory: 32GiB
- Display(s): Lenovo LEN P27u-10
- Type of Display Connection: DP-1 (USB-C alt mode)
System information:
- Distro name and Version: Arch Linux
- Kernel version: Linux machine 6.3.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 11 May 2023 16:40:42 +0000 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
There are no exact steps to reproduce. The issue manifests while running on an external display, mostly at 4K resolution (at FullHD it happened much less often), and with two different user sessions running Wayland (often one at 4K and one at FullHD, switching between them a few times a day). Roughly once a week the system crashes: the screen turns blank or shows a text-mode cursor, and the only remedy is to reboot the machine. After the reboot, the system logs show kernel errors/warnings from the time of the crash.
I could not find any correlation with switching between the two user sessions or with running a heavier graphics load -- it happens even while editing a text file. I noticed a few other issues that might or might not be related:
- When switching user sessions (Ctrl-Alt-Fx), I sometimes get a blank screen (the monitor goes into power-saving mode); switching back and forth between the sessions helps.
- When running multiple browser (Google Chrome) windows, one of the windows starts locking up quite heavily (freezing for many seconds when scrolling or even entering text). The problem disappears once only one browser window is running again.
- There's a kernel warning in amdgpu_irq.c when resuming the computer from sleep, but it does not seem to correlate with a subsequent crash.
If there are any suggestions, I'll be happy to experiment.
Attached files:
Log files (for system lockups / game freezes / crashes)
- journalctl -xb | grep kernel: dmesg.txt -- this is a log from a few suspend/resume cycles until I had to reboot the system due to a lock-up (at the end)
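If a narrower log is more useful, the kernel messages of the boot that locked up could probably be extracted separately. The following is only a sketch: the helper name `filter_gpu_lines` and the output file name are made up for illustration, and it assumes the journal is persistent so that the previous boot (`-b -1`) is still available.

```shell
# Hypothetical helper (not part of the attached log): keep only the
# log lines that mention amdgpu or drm, case-insensitively.
filter_gpu_lines() {
    grep -E -i 'amdgpu|drm'
}

# Assumed usage: kernel messages (-k) of the previous boot (-b -1),
# i.e. the boot that ended in the lock-up, reduced to GPU-related lines.
# journalctl -k -b -1 --no-pager | filter_gpu_lines > gpu-errors.txt
```

The filter is deliberately broad, so it will also pick up harmless drm messages from normal operation; the full dmesg.txt above remains the complete record.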