[bisected][RDNA2] graphics corruption, rendered text disappearing, sluggish desktop (Radeon RX 6700 XT, kernel 6.10.10)
With kernel v6.10.9 my system (Ryzen 9 5950X, Radeon RX 6700 XT) runs fine, but with kernel v6.10.10 X becomes sluggish as soon as lightdm starts. Once I log into XFCE, I get graphics corruption (already rendered text disappears at random, along with other artifacts) and the desktop is very sluggish as well.
dmesg shows no warnings or anything unusual, but when I try to restart lightdm, it fails with this output:
[...]
[ 226.640] (II) UnloadModule: "amdgpu"
[ 226.640] (II) UnloadModule: "modesetting"
[ 226.640] (II) Unloading modesetting
[ 226.640] (II) UnloadModule: "fbdev"
[ 226.640] (II) Unloading fbdev
[ 226.640] (II) UnloadSubModule: "fbdevhw"
[ 226.640] (II) Unloading fbdevhw
[ 226.640] Failed to allocate cursor buffer memory
[ 226.641] (EE) AMDGPU(0): amdgpu_setup_kernel_mem failed
[ 226.641] (EE)
Fatal server error:
[ 226.641] (EE) AddScreen/ScreenInit failed for driver 0
[ 226.641] (EE)
[ 226.641] (EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
[ 226.641] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 226.641] (EE)
[ 226.650] (EE) Server terminated with error (1). Closing log file.
Since kernel v6.10.9 was fine and v6.10.10 wasn't, I bisected the issue and got this result:
git bisect start
# status: waiting for both good and bad commits
# good: [1611860f184a2c9e74ed593948d43657734a7098] Linux 6.10.9
git bisect good 1611860f184a2c9e74ed593948d43657734a7098
# status: waiting for bad commit, 1 good commit known
# bad: [049be94099ea5ba8338526c5a4f4f404b9dcaf54] Linux 6.10.10
git bisect bad 049be94099ea5ba8338526c5a4f4f404b9dcaf54
# bad: [e454476c44524024fd98e3fea1a488c8fca45bea] bpf, net: Fix a potential race in do_sock_getsockopt()
git bisect bad e454476c44524024fd98e3fea1a488c8fca45bea
# bad: [3e0a295002822e35c2b45d0a139ceb70d5a5d0a0] wifi: ath12k: fix uninitialize symbol error on ath12k_peer_assoc_h_he()
git bisect bad 3e0a295002822e35c2b45d0a139ceb70d5a5d0a0
# good: [d81ef42faf969e7d401a914e01dfeafb583f6077] clk: starfive: jh7110-sys: Add notifier for PLL0 clock
git bisect good d81ef42faf969e7d401a914e01dfeafb583f6077
# good: [ddee07e8ad906068c29932bae0a5fe045c71b8c7] net: mctp-serial: Fix missing escapes on transmit
git bisect good ddee07e8ad906068c29932bae0a5fe045c71b8c7
# good: [c3ae6e7b970d7445a277fd7ffff6e12c18f725d0] btrfs: qgroup: don't use extent changeset when not needed
git bisect good c3ae6e7b970d7445a277fd7ffff6e12c18f725d0
# bad: [675d6d34fc1c36a7cee0d10e06985fb1e6bc7746] drm/amdgpu: always allocate cleared VRAM for GEM allocations
git bisect bad 675d6d34fc1c36a7cee0d10e06985fb1e6bc7746
# good: [a6b268fce7bf5c6b8cb3942f315483c628a1635d] drm/panthor: flush FW AS caches in slow reset path
git bisect good a6b268fce7bf5c6b8cb3942f315483c628a1635d
# good: [1cc695be8920df234f83270d789078cb2d3bc564] drm/imagination: Free pvr_vm_gpuva after unlink
git bisect good 1cc695be8920df234f83270d789078cb2d3bc564
# first bad commit: [675d6d34fc1c36a7cee0d10e06985fb1e6bc7746] drm/amdgpu: always allocate cleared VRAM for GEM allocations
drm/amdgpu: always allocate cleared VRAM for GEM allocations
This adds allocation latency, but aligns better with user
expectations. The latency should improve with the drm buddy
clearing patches that Arun has been working on.
In addition this fixes the high CPU spikes seen when doing
wipe on release.
v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528
Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
(cherry picked from commit 6c0a7c3c693ac84f8b50269a9088af8f37446863)
Cc: stable@vger.kernel.org # 6.10.x
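For anyone repeating a bisect like the one above, `git bisect run` can mark each step automatically from a test command's exit status (0 = good, 1 = bad). The sketch below is a self-contained toy, not the kernel workflow: it builds a throwaway repository in which the hypothetical "regression" is the line `cleared` appearing in a hypothetical file `gem.txt`, and the `grep` test stands in for booting a kernel and checking the desktop. On the real tree every step needs a build and a test boot, so marking commits by hand, as in the log above, is usually the practical route.

```shell
#!/bin/sh
# Toy sketch (hypothetical repo, file and commit names): automate a bisect
# with `git bisect run` instead of marking good/bad by hand.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false

# Known-good baseline, playing the role of "Linux 6.10.9".
echo uncleared > gem.txt
git add gem.txt
git commit -qm "baseline"
git tag good-release

# Unrelated commits, then the regression, then one more unrelated commit.
for i in 1 2 3; do
  echo "change $i" > "other$i.txt"
  git add "other$i.txt"
  git commit -qm "unrelated change $i"
done
echo cleared > gem.txt
git commit -qam "always allocate cleared memory"
echo "change 4" > other4.txt
git add other4.txt
git commit -qm "unrelated change 4"
git tag bad-release   # plays the role of "Linux 6.10.10"

# Test command: grep finds the line only on bad commits; `!` turns that
# into exit 1 (bad) and its absence into exit 0 (good).
git bisect start bad-release good-release > /dev/null
git bisect run sh -c '! grep -qx cleared gem.txt' > /dev/null

# After a successful run, refs/bisect/bad points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad_subject"
```

The `-x` flag makes `grep` match whole lines only, so the baseline content `uncleared` is not mistaken for the regression.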
Reverting 675d6d34fc1c36a7cee0d10e06985fb1e6bc7746 on top of v6.10.10 does make the issue disappear!
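A toy version of this confirmation step (branch off the bad release, revert only the suspect commit, retest) is sketched below. It uses a throwaway repository with hypothetical file, tag and branch names rather than the kernel tree; the comment shows roughly the analogous commands for this report, assuming a checkout of the stable tree that contains the v6.10.10 tag.

```shell
#!/bin/sh
# Toy sketch (hypothetical names): confirm a bisect result by reverting the
# suspect commit on top of the bad release. On the kernel tree this would be
# roughly `git checkout -b <branch> v6.10.10` followed by
# `git revert --no-edit 675d6d34fc1c36a7cee0d10e06985fb1e6bc7746`.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false

echo uncleared > gem.txt
git add gem.txt
git commit -qm "baseline"
echo cleared > gem.txt
git commit -qam "always allocate cleared memory"
suspect=$(git rev-parse HEAD)
echo later > other.txt
git add other.txt
git commit -qm "unrelated later change"
git tag bad-release

# Branch off the bad release and revert only the suspect commit.
git checkout -q -b test-revert bad-release
git revert --no-edit "$suspect" > /dev/null

# The later unrelated change survives; only the suspect change is undone.
content=$(cat gem.txt)
echo "gem.txt after revert: $content"
```

Reverting on top of the bad release, rather than checking out the commit before the suspect one, keeps every other change in that release, so a clean result points at the reverted commit alone.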
Some data about the system:
# inxi -bz
System:
Kernel: 6.10.9-gentoo-Zen3 arch: x86_64 bits: 64
Desktop: Xfce v: 4.18.1 Distro: Gentoo Base System release 2.15
Machine:
Type: Desktop Mobo: ASRock model: B550M Pro4 serial: <superuser required>
UEFI: American Megatrends LLC. v: P3.40 date: 01/18/2024
CPU:
Info: 16-core AMD Ryzen 9 5950X [MT MCP] speed (MHz): avg: 550
min/max: 550/5084
Graphics:
Device-1: AMD RV370 [Radeon X300/X550/X1050 Series] driver: radeon v: kernel
Device-2: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
driver: amdgpu v: kernel
Display: x11 server: X.org v: 1.21.1.13 driver: X: loaded: radeon
unloaded: amdgpu,fbdev,modesetting dri: radeonsi,r300 gpu: amdgpu,radeon
resolution: <missing: xdpyinfo/xrandr> resolution: 1: 3840x2160
2: 1920x1080
API: OpenGL v: 4.6 compat-v: 2.1 vendor: amd mesa v: 24.2.2 renderer: AMD
Radeon RX 6700 XT (radeonsi navi22 LLVM 18.1.8 DRM 3.57
6.10.9-gentoo-Zen3)
Network:
Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
driver: r8169
# lspci -v -s 08:00.0
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] (rev c5) (prog-if 00 [VGA controller])
Subsystem: Tul Corporation / PowerColor Device 2310
Flags: bus master, fast devsel, latency 0, IRQ 81, IOMMU group 2
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=2M]
I/O ports at f000 [size=256]
Memory at fc900000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, IntMsgNum 0
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [240] Power Budgeting <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [440] Lane Margining at the Receiver
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Please find the Xorg.0.log files (for 6.10.9 and 6.10.10), the kernel .config and bisect.log attached: config_61010_zen3-clang18, bisect.log, Xorg.0.log_6.10.9, Xorg.0.log_6.10.10.