7900 XTX flickers on kernels >= 6.5
Brief summary of the problem:
I was happily using kernel 6.4.12 on Arch when 6.5.3 hit the Arch repository. Almost immediately (5-10 minutes) after upgrading and rebooting, I started noticing flickers on my second monitor (the smaller one, 1440p). Unfortunately I can't provide an image or a video, because I can't reproduce the problem on demand and each occurrence is visible only for a fraction of a second. When it does happen, numerous horizontal lines (from start to finish) on my right monitor turn white-ish (I think?), as if the pixels for those lines were missing. The problem happens every now and then, with no obvious usage pattern. Most of the time the secondary monitor is idle and the primary monitor is used for tasks such as:
- scrolling a wiki page or a gitlab issue
- writing code in a dark-themed IDE; I was able to trigger the problem a few times by switching to a white page (e.g. a GitLab issue), but it's not a reliable reproducer
- watching a YouTube video / moving in VLC
- playing a game (which is GPU-intensive)
- having a terminal open on the right monitor, compiling code & watching/waiting for compilation to finish
Each occurrence goes away immediately. Sometimes it appears several times (2-5) within a short period (e.g. 10 seconds); at other times it happens only once in 10-15 minutes.
I saw #2735 (closed) and assumed that the problem would go away once the fixes landed upstream, but unfortunately it is still there, even with 6.5.5. I'm not entirely sure that this is the same problem, though; it is possible that I'm mixing things up.
I tried to bisect the problem and it appears that the problematic commit is this:
commit 38d88d5e97c9032ebeca092b9372209f2ca92cdf
Merge: 864e029fea2b e701156ccc6c
Author: Dave Airlie <airlied@redhat.com>
Date: Fri Jul 14 11:44:54 2023 +1000

    Merge tag 'amd-drm-fixes-6.5-2023-07-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

    amd-drm-fixes-6.5-2023-07-12:

    amdgpu:
    - SMU i2c locking fix
    - Fix a possible deadlock in process restoration for ROCm apps
    - Disable PCIe lane/speed switching on Intel platforms (the platforms don't support it)

    Signed-off-by: Dave Airlie <airlied@redhat.com>
    From: Alex Deucher <alexander.deucher@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230712184009.7740-1-alexander.deucher@amd.com
Looking at the history of drivers/gpu/drm/amd in the 6.5-arch1 tag, the previous commit is e701156ccc6c7 "drm/amd: Align SMU11 SMU_MSG_OverridePcieParameters implementation with SMU13", but I haven't tested that changeset on its own: the kernel version at that point was 6.4.0-rc7, whereas 6.4.12 was the latest stable one for me (I've confirmed that downgrading the kernel is a workaround). In any case, I'm willing to experiment with patches if needed.
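Since the merge above touches PCIe lane/speed switching, one thing that could be logged while waiting for a flicker is the GPU's negotiated PCIe link state. The following is only a minimal sketch, not something from the bisect itself. It assumes the card is at /sys/bus/pci/devices/0000:03:00.0 (the 0000 domain is a guess based on the 03:00.0 address in the hardware description) and that the standard sysfs attributes current_link_speed, current_link_width, max_link_speed and max_link_width are present. The helper names read_link_state and poll_link_changes are made up for illustration. Whether link changes correlate with the flicker at all is untested, and the link that actually renegotiates may belong to an upstream bridge rather than this device.

```python
import time
from pathlib import Path

# Standard PCI sysfs attributes for the negotiated and maximum link.
LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_link_state(device_dir):
    """Return {attribute: value} for the link attributes in device_dir.

    Attributes that are missing or unreadable are reported as None.
    """
    state = {}
    for name in LINK_ATTRS:
        try:
            state[name] = (Path(device_dir) / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


def poll_link_changes(device_dir, samples, interval=1.0):
    """Sample the link state `samples` times, `interval` seconds apart.

    Returns a list of (unix_timestamp, state) tuples containing the first
    sample and every later sample that differs from the one before it.
    """
    changes = []
    previous = None
    for i in range(samples):
        state = read_link_state(device_dir)
        if state != previous:
            changes.append((time.time(), state))
            previous = state
        if i + 1 < samples:
            time.sleep(interval)
    return changes
```

For example, poll_link_changes("/sys/bus/pci/devices/0000:03:00.0", samples=600, interval=1.0) would watch the assumed device path for about ten minutes; the timestamps could then be compared against the moments a flicker was noticed.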
Hardware description:
- CPU: AMD Ryzen 9 7950X3D
- GPU: 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
- System Memory: 64GB
- Display(s): 1x FreeSync 2160p 120 Hz over HDMI, 1x non-FreeSync 1440p 60 Hz over DP
System information:
- Distro name and Version: Arch Linux
- Kernel version: 6.5.5