7900xtx amdgpu: [gfxhub] page fault - GCVM_L2_PROTECTION_FAULT_STATUS - complete system freezes
Brief summary of the problem:
I've been having this issue for months now. The crashes started in mid-September and have now occurred across three different AMD 7000-series GPUs: two 7900 XTs and now a 7900 XTX. From July to early September I had no issues with my original 7900 XT. Once the crashes began, I was unsure whether it was a hardware fault, so I replaced the card with another, new 7900 XT. Same issue. Frustrated, I bought a 7900 XTX two weeks ago and, unfortunately, got the same issue: AMDGPU crashes that lead to complete system freezes. During this process I have not only gone through three different GPUs but also pretty much rebuilt the computer. I changed the CPU, motherboard, RAM, power supply, and even sound cards. I reinstalled Arch too many times to count. I also switched from GNOME on Wayland to KDE on X11. I really do not believe this is a hardware issue, considering that in my testing on Windows 11 everything has worked great.
Hardware description:
- CPU: 13th Gen Intel(R) Core(TM) i9-13900K
- GPU: 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
- System Memory: 64GB DDR5 6000MHz
- Display(s): Samsung Electric Company LS32CG51x
- Type of Display Connection: DP
System information:
- Distro name and Version: Arch Linux
- Kernel version: Linux linux64 6.6.7-arch1-1
- AMD official driver version: N/A
How to reproduce the issue:
Simply playing a game, no particular one. The games that have caused these crashes are Warcraft 3 Reforged, World of Warcraft Classic, Starfield, Cyberpunk 2077, Satisfactory, and Atomic Heart, which are the only games I have played. The crashes happen anywhere from within 30 minutes of play time to hours later. The most recent crash, today, took 7 hours of nonstop World of Warcraft Classic to trigger.
The crash comes in two forms, which I call "hard crash" and "soft crash." The hard crash is instant: the entire screen freezes. Typically, but not always, the screen flashes black for a split second and comes back to a frozen image; other times it is just a solid black screen. The mouse and keyboard become unresponsive, and unplugging and replugging them does not help. Pressing Caps Lock does not even toggle the LED on the keyboard. This time I had audio playing in the background (a YouTube video), and it kept playing after the crash. This was the first crash that happened while audio was playing in the background. So it is not a complete system freeze, but it is severe enough that I lose the mouse, the keyboard, and the display.
The soft crash is when the game freezes but I can still Alt+Tab out and force-close it. After a while, though, the system freezes completely, just like in the hard crash. I have linked this delay to interface clicks: closing a window, opening another window, or opening the power menu and choosing shutdown. After 2 to 5 such clicks, the entire computer freezes. Soft crashes are less common than hard crashes.
Log files (for system lockups / game freezes / crashes):
Journalctl log of the latest crash: https://pastebin.com/ijS2UNm8
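For anyone triaging a log like this one, here is a minimal sketch of a filter that keeps only the lines mentioning the fault named in the title. The marker strings are copied from the title of this report; the script name `amdgpu_faults.py` and the function name `filter_fault_lines` are made up for illustration, and the filter assumes the journal lines contain those substrings verbatim.

```python
import sys
from typing import Iterable, List, Sequence

# Substrings taken from the title of this report. Any other amdgpu
# message of interest would have to be added here by hand.
MARKERS = ("[gfxhub] page fault", "GCVM_L2_PROTECTION_FAULT_STATUS")


def filter_fault_lines(lines: Iterable[str],
                       markers: Sequence[str] = MARKERS) -> List[str]:
    """Return the lines that contain at least one marker substring."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in markers)
    ]


if __name__ == "__main__" and not sys.stdin.isatty():
    # Only runs when log text is piped in on stdin.
    for hit in filter_fault_lines(sys.stdin):
        print(hit)
```

Since the freeze forces a reboot, the relevant messages are in the previous boot's kernel log, so one possible invocation is `journalctl -k -b -1 --no-pager | python3 amdgpu_faults.py`. This only narrows the log down; it does not interpret the fault status bits.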