Regression (bisected): RX580 GPU crash with chipset overheating (kernel 6.8 and later)
Brief summary of the problem:
It is impossible to use the discrete GPU as the primary renderer with recent kernels on a decade-old, all-AMD computer. Trying to do so makes the system unresponsive, followed by a black screen and, after a few seconds, a corrupted image on the screen. The problem occurs with the Manjaro kernel as well as with kernels compiled from upstream sources.
All tested kernels newer than 6.6 are affected (6.13.9, 6.14 and the LTS kernel 6.12).
None of the older LTS kernels are affected (6.6, 6.1, 5.15, 5.10 and 5.4); with them the machine runs smoothly and shows no issue at all.
The tip of Linus's mainline tree is also affected by the issue.
Using only the integrated GPU (AMD RS880 [Radeon HD 4250], integrated in the AMD 880G chipset, alongside a Phenom II X6 1090T processor) results in no issue with any of the kernels compiled from the versions currently supported on kernel.org (latest and stable). Switching to the discrete GPU as the primary renderer in the BIOS (the system is a decade old and still uses a legacy BIOS, not UEFI) results in a black screen after the login screen on KDE Plasma.
Keeping the integrated GPU as primary in the BIOS and offloading graphical applications to the discrete GPU (RX580) with DRI_PRIME=1 results in the application not appearing on the screen. After some minutes the screen starts alternating between a black screen and a flickering, corrupted image, and the system becomes unresponsive. From the moment an application is offloaded to the discrete GPU, the KDE launcher (menu) can no longer be opened. Both Wayland and X11 show the same behavior.
Trying Steam games with Steam Proton results in the same behavior: the game does not show on the screen and the system becomes unresponsive. When the game starts, the Xwayland crash dialog (when using Wayland) or the X crash dialog (when using X11) sometimes appears. Running glxinfo or eglinfo with the command DRI_PRIME=1 glxinfo -B or DRI_PRIME=1 eglinfo -B results in the same crash behavior after some minutes.
The crashes have been occurring for about a year, but only this month, after replacing the power supply and moving to a case with easy access, could I verify that when the discrete GPU is in use (whether selected in the BIOS or through DRI_PRIME) the heatsink over the chipset (AMD 880G on an Asus M5A88-M motherboard) starts to get hot. When it gets too hot to touch with a bare hand, the system starts to crash, showing the black screen and the corrupted image. The motherboard and CPU sensors always report temperatures below 43 °C. The heatsink does not heat up like this with the 6.6 LTS kernel or older ones.
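For anyone trying to confirm the temperature observation, a minimal sketch for logging what the kernel's hwmon sensors report while the discrete GPU is in use might look like the following. It assumes the standard sysfs hwmon layout (temp*_input files holding millidegrees Celsius); the function names are made up for illustration. Nothing in this report establishes that the 880G chipset itself exposes a sensor, so the hot heatsink may not show up in this output at all.

```shell
# Convert a hwmon reading (millidegrees Celsius) to degrees with one decimal.
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Print one line per readable sensor: "<chip name> <sensor file> <temp in C>".
# The optional argument overrides the hwmon root (assumed default below).
dump_hwmon_temps() {
    root="${1:-/sys/class/hwmon}"
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        chip="$(cat "$(dirname "$f")/name" 2>/dev/null || echo unknown)"
        raw="$(cat "$f" 2>/dev/null)" || continue
        printf '%s %s %s\n' "$chip" "$(basename "$f")" "$(millideg_to_c "$raw")"
    done
}
```

Calling it in a loop such as `while sleep 5; do dump_hwmon_temps; done` from a second terminal or an SSH session would give a rough timeline to compare between an affected and an unaffected kernel.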
Using the upstream git kernel I could bisect the problem to the commit “[466a7d115326ece682c2b60d1c77d1d0b9010b4f] drm/amd: Use the first non-dGPU PCI device for BW limits”
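The bisect can be sketched roughly as below, for anyone who wants to repeat it. This is not the exact procedure used here (the attached BISECT_LOG is the authoritative record); it assumes the v6.6 and v6.12 tags as the good and bad endpoints, and the `run`, `bisect_begin` and `bisect_mark` helpers are hypothetical names. By default the script only prints the git commands; setting RUN=1 executes them inside a kernel checkout.

```shell
# Endpoints taken from the report: 6.6 LTS works, 6.12 LTS crashes.
GOOD_TAG="v6.6"
BAD_TAG="v6.12"

# Print the command for review unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Start the bisect between the known-good and known-bad tags.
bisect_begin() {
    run git bisect start
    run git bisect bad "$BAD_TAG"
    run git bisect good "$GOOD_TAG"
}

# After building and booting the candidate kernel and exercising the
# discrete GPU, record the verdict for the current commit.
bisect_mark() {
    case "$1" in
        good|bad|skip) run git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 1 ;;
    esac
}
```

Each step still needs a manual build, reboot and test with the discrete GPU before calling `bisect_mark`, so the process cannot be fully automated on a machine that hangs when the bug triggers.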
Possibly related to issue #3761: same behavior (the crash does not show up with the 6.7 release and starts to appear with the 6.8 release, with the same screen problems).
Searching the open bugs for “RX580” shows many reports from the last year (after the 6.8 kernel release) with the same screen problems. Since none of them mentions chipset overheating (which I had not noticed either before getting easy access to the motherboard this month), I opened a separate issue so it can be confirmed whether this is really the same problem.
I can confirm it doesn’t affect a modern all-AMD computer (Ryzen 7000 series and Radeon RX 6750 XT), nor Intel + Nvidia hybrid laptops.
Hardware description:
- CPU: AMD Phenom(tm) II X6 1090T Processor
- Integrated GPU: AMD RS880
- Discrete GPU: AMD Radeon RX 580 Series
- System Memory: total: 16 GiB available: 15.37 GiB used: 4.18 GiB (27.2%)
- Display(s): AOC D26W931 res: 1360x768@60hz, 100% DPI scaling
- Type of Display Connection: HDMI
System information:
- Distro name and Version: Manjaro
- Kernel version: 6.14.0-1
- Custom kernel: Kernel compiled from git+https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux for bisecting
- AMD official driver version: N/A, only Linux kernel drivers
How to reproduce the issue:
On an Asus M5A88-M motherboard, either set the discrete GPU (RX 580) as the primary renderer in the BIOS, or boot with the integrated GPU as primary and then start an application or a Steam Proton game on the discrete GPU with DRI_PRIME=1.
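The DRI_PRIME variant of the reproduction can be sketched as a small script. It assumes glxinfo is installed; `renderer_of` and `compare_renderers` are hypothetical helper names. On an affected kernel the second glxinfo call is expected to trigger the hang described above rather than return cleanly, so unsaved work should be closed first.

```shell
# Extract the renderer name from `glxinfo -B` output read on stdin.
renderer_of() {
    sed -n 's/^ *OpenGL renderer string: *//p' | head -n 1
}

# Print the renderer used by default and the one used when offloading.
# With the integrated GPU as primary, the second line should name the RX 580.
compare_renderers() {
    printf 'default: %s\n' "$(glxinfo -B | renderer_of)"
    printf 'DRI_PRIME=1: %s\n' "$(DRI_PRIME=1 glxinfo -B | renderer_of)"
}
```

If the machine has to be reset afterwards, `journalctl -k -b -1` on the next boot shows whatever kernel messages from the crashed session reached the journal.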
Attached files:
build_config: .config file used to compile the git linux kernel.
inxi.txt: complete hardware and system information.
BISECT_LOG: log of the bisect process starting with LTS kernels 6.6 and 6.12.