Artifacting and eventual reboot due to MCE error with VRR enabled at lower/sporadic frame rates on 7900 XT
Brief summary of the problem:
I recently got this card, and every few hours, when VRR is enabled and the displayed content has a low frame rate or strongly varying frame times, horizontal white line artifacts such as this one appear:
When the screen they appear on is connected to, e.g., DP-2, the stripe continues onto the screen on DP-1, even though that monitor is not VRR-enabled.
Over time, the artifacts become more frequent and eventually trigger a hard system reboot; the following MCE errors are printed on the next boot:
[ 1.542146] [Hardware Error]: System Fatal error.
[ 1.542152] [Hardware Error]: CPU:6 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[ 1.542163] [Hardware Error]: Error Addr: 0x0000760ad7e8174d
[ 1.542167] [Hardware Error]: IPID: 0x0005006000000000, Syndrome: 0x0000000041000000
[ 1.542173] [Hardware Error]: Execution Unit Ext. Error Code: 0, Watchdog Timeout error.
[ 1.542178] [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
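To make the status value in the log easier to read, here is a minimal Python sketch that splits a 64-bit MCA_STATUS value into the flags and error-code fields the kernel prints. The bit positions and message tables are my reading of the Linux EDAC decoder (drivers/edac/mce_amd.c) and should be treated as an assumption to double-check; the function name `decode_status` is hypothetical and only covers the fields that appear in the log above.

```python
# Sketch: decode an AMD (SMCA) MCA_STATUS value. Bit positions and message
# tables follow my reading of the Linux EDAC decoder (drivers/edac/mce_amd.c);
# double-check them against your kernel before relying on the result.

STATUS_FLAGS = [
    (62, "Over"),      # overflow
    (61, "UE"),        # uncorrected error
    (59, "MiscV"),     # MCA_MISC register holds valid data
    (58, "AddrV"),     # MCA_ADDR register holds valid data
    (57, "PCC"),       # processor context corrupt
    (55, "TCC"),       # task context corrupt
    (53, "SyndV"),     # MCA_SYND register holds valid data
    (44, "Deferred"),
    (43, "Poison"),
]

TT_MSGS = ["INSN", "DATA", "GEN", "RESV"]   # transaction type (TT)
LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]    # cache level (LL)
RRRR_MSGS = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status: int) -> dict:
    """Split a 64-bit MCA_STATUS value into the fields shown in the log."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    ec = status & 0xFFFF            # error code, bits 15:0
    xec = (status >> 16) & 0x3F     # extended error code, bits 21:16
    result = {"valid": bool((status >> 63) & 1), "flags": flags, "xec": xec, "ec": ec}
    # Memory error codes have the form 0000 0001 RRRR TTLL.
    if (ec & 0xFF00) == 0x0100:
        rrrr = (ec >> 4) & 0xF
        result["cache_level"] = LL_MSGS[ec & 0x3]
        result["tx"] = TT_MSGS[(ec >> 2) & 0x3]
        result["mem_tx"] = RRRR_MSGS[rrrr] if rrrr < len(RRRR_MSGS) else "RESV"
    return result


print(decode_status(0xBEA0000001000108))
```

For the status value from the log, this yields the flags UE, MiscV, AddrV, PCC, TCC and SyndV, extended error code 0, and cache level RESV / tx GEN / mem-tx GEN, which agrees with the kernel's own decoding.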
The issue can be mitigated by setting `pp_power_profile_mode` to `3D_FULL_SCREEN`. It seems to happen most often with the `VIDEO` profile, but it has also happened, and caused a crash, during a cutscene in Yakuza 0 with `BOOTUP_DEFAULT`. Switching between these profiles barely changes the clock speeds and voltages, and the issue appears at various clock speeds. Locking the memory clock to its maximum, e.g. by setting `power_dpm_force_performance_level` to `high`, does not mitigate the issue. Various stress tests and benchmarks (FurMark, MSI Kombustor, Unigine Superposition) do not cause artifacting with any profile.
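The profile workaround can be applied through amdgpu's sysfs files. Below is a minimal Python sketch, assuming the GPU is `card0` and that the profile indices mirror the driver's `PP_SMC_POWER_PROFILE` enum as I understand it; both are assumptions, so confirm the indices with `cat pp_power_profile_mode` first. The `PROFILE_INDEX` table and `set_power_profile` helper are hypothetical names. Per the kernel's amdgpu documentation, `power_dpm_force_performance_level` is switched to `manual` before a profile number is written.

```python
from pathlib import Path

# Hypothetical mapping mirroring the driver's PP_SMC_POWER_PROFILE enum as I
# understand it; confirm the indices with `cat pp_power_profile_mode`.
PROFILE_INDEX = {
    "BOOTUP_DEFAULT": 0,
    "3D_FULL_SCREEN": 1,
    "POWER_SAVING": 2,
    "VIDEO": 3,
    "VR": 4,
    "COMPUTE": 5,
    "CUSTOM": 6,
    "WINDOW_3D": 7,
}

# Assumed device directory; the card number may differ on your system.
DEFAULT_DEVICE = Path("/sys/class/drm/card0/device")


def set_power_profile(profile: str, device: Path = DEFAULT_DEVICE) -> None:
    """Select a predefined amdgpu power profile (writing to sysfs needs root)."""
    index = PROFILE_INDEX[profile]
    # Switch to manual performance level first, then write the profile number.
    (device / "power_dpm_force_performance_level").write_text("manual\n")
    (device / "pp_power_profile_mode").write_text(f"{index}\n")
```

Run as root, `set_power_profile("3D_FULL_SCREEN")` would apply the workaround described above; the `device` parameter exists so the function can target a different card directory.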
Disabling VRR or rebooting immediately mitigates the issue.
Things I've tried:
- Memtest86 (5 passes with no errors)
- Trying all the workarounds for Ryzen MCE errors on https://wiki.archlinux.org/title/Ryzen#Random_reboots and https://wiki.gentoo.org/wiki/Ryzen#Random_reboots_with_mce_events (no effect)
- Upgrading the motherboard from MSI B450 Tomahawk (PCIe 3.0) to MSI MPG B550 Gaming Plus (PCIe 4.0) (interestingly, this has increased the number of artifacts that can occur before a reboot)
My previous card, RX Vega 64, is unaffected by this on the same software and hardware.
Hardware description:
- CPU: AMD Ryzen 9 5950X (32) @ 5.0GHz
- GPU: 2d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M] [1002:744c] (rev cc) / PowerColor RX 7900 XT Hellhound
- System Memory: 4x Corsair Vengeance 8 GB DDR4-3000 CMK16GX4M2D3000C16
- Display(s): LG UltraGear 27GN800-B 2560x1440@144Hz (VRR-enabled), Dell S2716DG 2560x1440@144Hz (non-VRR-enabled)
- Type of Display Connection: DP, DP
System information:
- Distro name and Version: Arch Linux
- Kernel version: 6.8.1
- Custom kernel: N/A
- AMD official driver version: N/A
How to reproduce the issue:
- Turn on Adaptive Sync in KDE Plasma settings (either to Auto or Always)
- Watch a 30/60 fps YouTube video in fullscreen in Chromium on the 144 Hz screen
- The monitor's OSD shows a highly fluctuating, usually low refresh rate
- Artifacting may occur; if it does, it gets gradually worse and eventually causes a reboot
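To record the clock levels while reproducing, a small logger can poll the driver's `pp_dpm_sclk` and `pp_dpm_mclk` tables, where the active level is marked with `*`. This is a minimal Python sketch assuming `card0` is the 7900 XT; the helper names are hypothetical, and the marked entry reflects the DPM level the driver reports, not necessarily the instantaneous clock.

```python
import time
from pathlib import Path
from typing import Optional

DEVICE = Path("/sys/class/drm/card0/device")  # assumed card path


def active_level(table: str) -> Optional[str]:
    """Return the entry marked with '*' in a pp_dpm_* table, e.g. '2400Mhz'."""
    for line in table.splitlines():
        if line.rstrip().endswith("*"):
            _, _, value = line.partition(":")
            return value.replace("*", "").strip()
    return None  # no level marked as active


def log_clocks(device: Path = DEVICE, interval: float = 0.5, samples: int = 20) -> None:
    """Print the active shader and memory clock levels every `interval` seconds."""
    for _ in range(samples):
        sclk = active_level((device / "pp_dpm_sclk").read_text())
        mclk = active_level((device / "pp_dpm_mclk").read_text())
        print(f"{time.strftime('%H:%M:%S')} sclk={sclk} mclk={mclk}", flush=True)
        time.sleep(interval)
```

Running `log_clocks(samples=600)` in a terminal while the video plays would give a timestamped trace to line up against the moment artifacts appear.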
Attached files:
Screenshots/video files
Picture of artifact:
Video of the artifacts and refresh rate on OSD:
GPU stats during artifacts:
Log files (for system lockups / game freezes / crashes)
TODO: Will edit this in, but usually no new logs seem to have been saved from the crash.
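One way to collect whatever the previous boot did manage to log is to filter its kernel messages for GPU and MCE lines. This is a minimal Python sketch, assuming systemd-journald with persistent storage; `journalctl -k -b -1` selects kernel messages from the previous boot, while the keyword list and function names are hypothetical choices. If the hard reboot happens before the journal is flushed to disk, nothing may be captured.

```python
from __future__ import annotations

import subprocess

KEYWORDS = ("amdgpu", "[Hardware Error]", "mce")  # hypothetical filter terms


def relevant_lines(log: str, keywords=KEYWORDS) -> list[str]:
    """Keep only lines that mention one of the keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in log.splitlines()
            if any(k in line.lower() for k in lowered)]


def previous_boot_kernel_log() -> str:
    """Return kernel messages from the previous boot via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage: `print("\n".join(relevant_lines(previous_boot_kernel_log())))` after the reboot.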