AMDGPU crashes randomly/under load
Description
Under load, my AMD Radeon RX 6800 GPU occasionally crashes. First the screen freezes, then all of my displays go black. During this time I can hear the game audio repeating over and over. After a moment the displays turn back on, but each one is stuck showing its last frame. I can move my cursor, but I cannot interact with anything using the keyboard or mouse. By switching to another TTY, I can kill Xorg and restart it. This makes the system usable again, but anything graphically intensive performs poorly and stutters until I reboot.
This happens at least once a day. Today it happened three times, twice in quick succession: after one crash I rebooted, started my game again, and within 10 minutes it happened again.
Unfortunately, I have not found a way to reproduce the issue consistently. I ran a GPU stress test, and when that did not trigger the crash, I ran it alongside CPU and memory stress tests, which also did not trigger it. Given that, it does not appear to be a hardware or cooling issue, and I suspect a crash in DXVK/Mesa or in the amdgpu kernel module.
Log files (for system lockups / game freezes / crashes)
Xorg log: https://pastebin.com/Ks9pYA95
Dmesg log: https://pastebin.com/CiBe9MsP
Steps to reproduce
This happens in most games and in many situations. For example: playing Rocket League for an hour or two, playing Call of Duty: Black Ops 3 Zombies (seemingly at random), and, recently, playing Squad. I can't find a way to trigger it on demand. Occasionally I can go more than a day without encountering the issue, but this is rare.
System information
System:
Host: Gentoo Kernel: 5.15.52-gentoo x86_64 bits: 64 compiler: gcc
v: 11.3.0 Desktop: dwm 6.2 dm: startx
Distro: Gentoo Base System release 2.8
CPU:
Info: 12-core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 2
rev: 0 cache: L1: 768 KiB L2: 6 MiB L3: 64 MiB
Speed (MHz): avg: 3917 high: 4250 min/max: 2200/4672 boost: enabled
cores: 1: 3714 2: 3961 3: 3663 4: 4218 5: 3664 6: 3676 7: 4112 8: 4244
9: 3716 10: 4245 11: 3895 12: 3801 13: 3664 14: 3492 15: 3942 16: 4247
17: 4069 18: 4222 19: 3939 20: 4250 21: 3407 22: 3934 23: 4249
24: 3688 bogomips: 182412
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] driver: amdgpu
v: kernel bus-ID: 0b:00.0 chip-ID: 1002:73bf
Device-2: Logitech HD Pro Webcam C920 type: USB
driver: snd-usb-audio,uvcvideo bus-ID: 9-2:3 chip-ID: 046d:082d
Display: server: X.Org 21.1.4 compositor: picom driver: loaded: amdgpu
resolution: 1: 2560x1440 2: 2560x1080 3: 1920x1080~60Hz s-dpi: 96
OpenGL: renderer: AMD Radeon RX 6800 (sienna_cichlid LLVM 14.0.4 DRM
3.42 5.15.52-gentoo)
v: 4.6 Mesa 22.0.5 direct render: Yes
If applicable
X.Org X Server 1.21.1.4
X Protocol Version 11, Revision 0
Current Operating System: Linux Gentoo 5.15.52-gentoo #7 SMP PREEMPT Tue Jul 26 13:05:38 PDT 2022 x86_64
Kernel command line: root=PARTUUID=cc980e8f-e93c-a748-9417-6b63f93e90a8 rootflags=subvol=/gentoo iommu=pt amd_iommu=on amdgpu.noretry=0 nvme_core.default_ps_max_latency_us=1000 amdgpu.ppfeaturemask=0xffffffff
Current version of pixman: 0.40.0
- Wine/Proton version: Glorious Eggroll Proton 7-21
Regression
I don't know of any Mesa or kernel version where this hasn't occurred. I've been experiencing it since I got the GPU about a year ago.
If there is anything I can provide, please let me know.