AMD FirePro W5130M hangs with a "ring gfx timeout" or "ring 0 stalled" when launching GPU-intensive applications
System information
inxi -GSC -xx output (with DRI_PRIME=1 set):
System: Host: localhost.localdomain Kernel: 5.13.12-1-default x86_64 bits: 64 compiler: gcc v: 11.1.1
Desktop: KDE Plasma 5.22.4 tk: Qt 5.15.2 wm: kwin_x11 dm: SDDM Distro: openSUSE Tumbleweed 20210820
CPU: Info: Quad Core model: Intel Core i7-6820HQ bits: 64 type: MT MCP arch: Skylake-S rev: 3 cache: L2: 8 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 43198
Speed: 800 MHz min/max: 800/3600 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800
8: 800
Graphics: Device-1: Intel HD Graphics 530 vendor: Dell driver: i915 v: kernel bus-ID: 00:02.0 chip-ID: 8086:191b
Device-2: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo bus-ID: 1-11:3 chip-ID: 0bda:5686
Display: x11 server: X.org 1.20.13 compositor: kwin_x11 driver: loaded: amdgpu,modesetting
unloaded: fbdev,vesa alternate: ati,intel resolution: <missing: xdpyinfo>
OpenGL: renderer: AMD Radeon R9 M360 (VERDE DRM 3.41.0 5.13.12-1-default LLVM 12.0.1) v: 4.6 Mesa 21.2.0
direct render: Yes
lspci -k output for more details on the dGPU:
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO / Venus LE / Tropo PRO-L [Radeon HD 8830M / R7 250 / R7 M465X] (rev 87)
Subsystem: Dell Device 06e0
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu
The actual dGPU model name is FirePro W5130M.
Description
Running any moderately GPU-intensive application on the dGPU, such as SuperTuxKart at its highest settings, glmark2, or vkmark, instantly freezes the application and crashes the GPU driver (see both dmesg outputs below). With the radeon driver, it is possible to force-close the application and continue using the system normally after the GPU is reset. With amdgpu, however, the application will not close, and continuing to use the system eventually results in a complete freeze (typically after launching any program), from which the only way to recover is a forced shutdown.
Log files
dmesg_amdgpu.txt dmesg_radeon.txt
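For anyone skimming the attached logs, a minimal sketch of how the relevant lines could be pulled out. It assumes the hang is logged with one of the two phrases quoted in the title ("ring gfx timeout" or "ring 0 stalled"). The function name find_gpu_hangs is hypothetical, and other kernel versions may word these messages differently.

```python
import re

# Hang signatures quoted in this report's title (assumption: the attached
# logs use the same wording; other kernels may phrase them differently).
HANG_PATTERN = re.compile(r"ring gfx timeout|ring 0 stalled")


def find_gpu_hangs(log_text):
    """Return every log line that contains one of the hang signatures."""
    return [line for line in log_text.splitlines() if HANG_PATTERN.search(line)]
```

It can be run on either attachment, for example find_gpu_hangs(open("dmesg_amdgpu.txt").read()).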
Extra information
I've already tried virtually every kernel parameter for both amdgpu and radeon: the only one that prevents the dGPU from crashing is amdgpu.dpm=0 (or radeon.dpm=0), but this merely forces the GPU into its lowest power state, which greatly reduces performance.
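To confirm which dpm setting a given boot actually used, the kernel command line can be checked. This is a rough sketch: the helper name dpm_settings is hypothetical, and it only recognizes the amdgpu.dpm and radeon.dpm parameters mentioned above when they are passed on the command line (not when they are set through modprobe configuration).

```python
import re


def dpm_settings(cmdline):
    """Map 'amdgpu' / 'radeon' to the dpm value found on a kernel command line."""
    settings = {}
    for token in cmdline.split():
        # Only the two parameters discussed in this report are recognized.
        match = re.fullmatch(r"(amdgpu|radeon)\.dpm=(-?\d+)", token)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings
```

On a running system the input would typically come from /proc/cmdline, for example dpm_settings(open("/proc/cmdline").read()).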
As an ugly workaround, patching the kernel to force a lower max_sclk in drivers/gpu/drm/amd/pm/powerplay/si_dpm.c and/or drivers/gpu/drm/radeon/si_dpm.c makes the GPU perfectly stable under full utilization. Values of 77500 for amdgpu and 80000 for radeon work (the default maximum is 92500, as reported by dmesg).
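The effect of the cap can be modelled in a few lines. This is only an illustrative sketch of the clamping idea, not the code in si_dpm.c. The function names are hypothetical, and the percentages are computed purely from the three values quoted above.

```python
DEFAULT_MAX_SCLK = 92500  # default maximum reported by dmesg in this report


def cap_sclk(levels, max_sclk):
    """Clamp each performance level's engine clock to max_sclk (0 means no cap)."""
    if not max_sclk:
        return list(levels)
    return [min(sclk, max_sclk) for sclk in levels]


def reduction_percent(cap, default=DEFAULT_MAX_SCLK):
    """Percentage by which a cap lowers the default maximum engine clock."""
    return round(100 * (default - cap) / default, 1)
```

With the values above, reduction_percent(77500) gives 16.2 and reduction_percent(80000) gives 13.5, so the stable caps sit roughly 13 to 16 percent below the default maximum.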
To rule out a hardware fault, I confirmed that the GPU works perfectly fine under Windows with the latest Radeon Pro drivers.