RX 6700 XT screen freezes, followed eventually by a soft recovery
Brief summary of the problem:
I was running a Wayfire session on my RX 6700 XT and not doing anything particularly GPU-intensive, just running Vesktop (an Electron-based, GPU-accelerated Discord web client), when the desktop froze for a minute. I was able to log in via SSH and see that a page fault had occurred. The GPU eventually soft recovered after a core dump was collected.
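For anyone checking for the same symptom over SSH, here is a minimal sketch of how the relevant kernel messages could be pulled out of the log. The function names and the keyword list are my own assumptions for illustration, not anything amdgpu defines, and the filter is deliberately loose because the exact message wording varies between kernel versions.

```python
import re
import subprocess

# Assumed keywords of interest; this list is illustrative, not exhaustive.
KEYWORDS = re.compile(r"page fault|timeout|reset|coredump", re.IGNORECASE)


def amdgpu_events(log_text):
    """Return the log lines that mention amdgpu and one of the keywords."""
    return [
        line
        for line in log_text.splitlines()
        if "amdgpu" in line.lower() and KEYWORDS.search(line)
    ]


def read_kernel_log():
    """Read the current boot's kernel log via journalctl (systemd systems)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `amdgpu_events(read_kernel_log())` returns only the matching lines. Reading the kernel log may require root or membership in a group such as `systemd-journal`, depending on the system.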
Hardware description:
- CPU: Ryzen 7 7700X
- GPU: 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] [1002:73df] (rev c5)
- System Memory: 64 GiB
- Display(s): ASUS VG247Q1A, Dell P2414H
- Type of Display Connection: DP, DP
System information:
- Distro name and Version: Arch Linux
- Kernel version: Linux copycat 6.12.0-rc3-1-cachyos-rc #1 SMP PREEMPT_DYNAMIC Tue, 15 Oct 2024 00:39:53 +0000 x86_64 GNU/Linux
- Custom kernel: linux-cachyos-rc built with PKGBUILD with default settings
- AMD official driver version: N/A
How to reproduce the issue:
I can't reproduce it because it happened almost completely at random. It may have something to do with my Cooler Master PCIe 4.0 riser; if it keeps happening, I may remove the riser to see whether it's the cause. I'm receiving a 7700 XT in less than 24 hours, so I probably won't be testing much more on the 6700 XT after that. It may happen again, depending.
Log files (for system lockups / game freezes / crashes)
- Dmesg log (full log)
- Xorg log: N/A
- Any other log: core dump captured from devcoredump, which dumped before the reset
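For reference, a minimal sketch of how a pending device core dump can be saved before it disappears. It assumes the kernel's devcoredump sysfs interface, where each pending dump shows up as `/sys/class/devcoredump/devcdN/data`; the function name and the `.bin` suffix are hypothetical choices of mine. Reading the `data` file normally requires root, and the kernel discards a dump on its own after a timeout, so it has to be copied promptly.

```python
import shutil
from pathlib import Path


def save_devcoredumps(dest_dir, sysfs_root="/sys/class/devcoredump"):
    """Copy every pending device core dump into dest_dir and return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    # Each pending dump is exposed as a devcdN directory holding a "data" file.
    for entry in sorted(Path(sysfs_root).glob("devcd*")):
        data = entry / "data"
        if not data.is_file():
            continue
        target = dest / f"{entry.name}.bin"
        # Stream the copy so a large dump is not loaded into memory at once.
        with data.open("rb") as src, target.open("wb") as out:
            shutil.copyfileobj(src, out)
        saved.append(target)
    return saved
```

For example, `save_devcoredumps("/root/gpu-dumps")` run as root over SSH while the desktop is frozen would leave one file per pending dump in that directory; if no dump is pending, it returns an empty list.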
Additional information
I'm not sure if this is a DRM bug or a Mesa bug. The ring that crashed this time was the compute ring, and I don't think I was doing any compute work at the time. It may, however, have been interacting with the Discord video call I was running, which was using hardware H.264 encode for the outgoing portions of the call (camera and desktop capture).
Edit: wayfire-plugins-extra supplies Scott Moreau's pixdecor decorator plugin for server-side decoration, which he told me uses compute for something. It may be wise to bring him into this topic to explain it better.
Edit 2: The compute usage seems to have been tied to Mesa git, which I was running at commit 67aadd4f0b8. After switching back to Mesa stable 24.2.5, there is more GFX usage but no compute usage.