Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • A amd
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 1,476
    • Issues 1,476
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 0
    • Merge requests 0
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Container Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • drm
  • amd
  • Issues
  • #727
Closed
Open
Issue created Jul 17, 2016 by Bugzilla Migration User@bugzilla-migration

R290X stuck at 100% GPU load / full core clock on non-x86 machines

Submitted by Timothy Pearson

Assigned to Default DRI bug account

Link to original bug (#96964)

Description

Our twin Radeon 290X cards are stuck at 100% GPU load (according to radeontop and Gallium) and full core clock (according to radeon_pm_info) on non-x86 machines such as our POWER8 compute server. The identical card does not show this behaviour on a test x86 machine.

Forcibly crashing the GPU (causing a soft reset) fixes the issue. Relevant dmesg output starts at line 4 in this pastebin: https://bugzilla.kernel.org/show_bug.cgi?id=70651 It is unknown if simply triggering a soft reset without the GPU crash would also resolve the issue.

I suspect this is related to the atombios x86-specific oprom code only executing on x86 machines, and related setup therefore not being finalized by the radeon driver itself on non-x86 machines. However, this is just an educated guess.

radeontop output of stuck card:
gpu 100.00%, ee 0.00%, vgt 0.00%, ta 0.00%, sx 0.00%, sh 0.00%, spi 0.00%, sc 0.00%, pa 0.00%, db 0.00%, cb 0.00%

radeontop output of "fixed" card after GPU crash / reset, running 3D app:
gpu 4.17%, ee 0.00%, vgt 0.00%, ta 3.33%, sx 3.33%, sh 0.00%, spi 3.33%, sc 3.33%, pa 0.00%, db 3.33%, cb 3.33%, vram 11.72% 479.87mb

Despite the "100% GPU load" indication, there is no sign of actual load being placed on the GPU. 3D-intensive applications function 100% correctly with no apparent performance degradation, so it seems the reading is a.) spurious and b.) causing the core clock to throttle up needlessly.

Assignee
Assign to
Time tracking