R290X stuck at 100% GPU load / full core clock on non-x86 machines
Submitted by Timothy Pearson
Assigned to Default DRI bug account
Description
Our twin Radeon 290X cards are stuck at 100% GPU load (according to radeontop and Gallium) and full core clock (according to radeon_pm_info) on non-x86 machines such as our POWER8 compute server. The identical card does not show this behaviour on a test x86 machine.
Forcibly crashing the GPU (causing a soft reset) fixes the issue. The pastebin at https://bugzilla.kernel.org/show_bug.cgi?id=70651 shows the relevant dmesg output, starting at line 4. It is unknown whether triggering a soft reset without crashing the GPU would also resolve the issue.
I suspect this is related to the atombios x86-specific oprom code only executing on x86 machines, and related setup therefore not being finalized by the radeon driver itself on non-x86 machines. However, this is just an educated guess.
radeontop output of stuck card:
gpu 100.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%
radeontop output of "fixed" card after GPU crash / reset, running 3D app:
gpu 4.17%, ee 0.00%, vgt 0.00%, ta 3.33%, sx 3.33%, sh 0.00%, spi 3.33%, sc 3.33%, pa 0.00%, db 3.33%, cb 3.33%, vram 11.72% 479.87mb
Despite the "100% GPU load" indication, there is no sign of actual load being placed on the GPU. 3D-intensive applications function correctly with no apparent performance degradation, so it seems the reading is (a) spurious and (b) causing the core clock to throttle up needlessly.
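For anyone trying to reproduce this, the "spurious" condition can be checked mechanically from radeontop output lines like the two quoted above. The following is only a rough sketch: the helper names `parse_radeontop_line` and `looks_spurious` and the 99% threshold are my own assumptions, not part of radeontop or the radeon driver. It flags a sample where the overall gpu figure is pegged while every individual block reads 0%.

```python
import re

def parse_radeontop_line(line):
    """Return {field_name: percent} for every 'name NN.NN%' field in the line."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)\s+(\d+(?:\.\d+)?)%", line)}

def looks_spurious(line, threshold=99.0):
    """True if 'gpu' is at or above threshold while all other blocks read 0%.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    fields = parse_radeontop_line(line)
    gpu = fields.pop("gpu", None)
    fields.pop("vram", None)  # memory usage is not a load indicator
    if gpu is None or not fields:
        return False
    return gpu >= threshold and all(v == 0.0 for v in fields.values())

if __name__ == "__main__":
    stuck = ("gpu 100.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, "
             "spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%")
    print(looks_spurious(stuck))
```

Feeding it the "stuck" sample should flag the card, while the post-reset sample (gpu 4.17% with several blocks at 3.33%) should not.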