Machine freezes when clocks are set to defaults
Submitted by Maxime Daniel
Assigned to Default DRI bug account
Description
My laptop contains these cards:
----
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330] (rev 81)
----
Under Gentoo, I enabled radeon and radeonsi as video cards, and everything looks fine. According to xrandr, I have these providers:
----
Provider 0: id: 0x76 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 4 outputs: 3 associated providers: 0 name:Intel
Provider 1: id: 0x4f cap: 0xd, Source Output, Source Offload, Sink Offload crtcs: 0 outputs: 0 associated providers: 0 name:HAINAN @ pci:0000:01:00.0
----
I read online that DPM can cause issues, so I tried these settings: radeon.runpm=0 radeon.dpm=0 (see below)
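To confirm that these options actually reached the kernel, the command line can be inspected. The sketch below assumes the options were passed on the kernel command line; the helper name radeon_cmdline_opts is hypothetical, and the optional file argument exists only so the function can be tried on a saved copy of /proc/cmdline.

```shell
# radeon_cmdline_opts [CMDLINE_FILE]
# Print every radeon.* option found on the kernel command line, one per line.
# CMDLINE_FILE defaults to /proc/cmdline.
radeon_cmdline_opts() {
    # Split the single command line into one token per line, then keep radeon.* tokens.
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^radeon\.'
}
```

If radeon is loaded as a module with options from a modprobe configuration file, they will not show up here; in that case the values are normally visible under /sys/module/radeon/parameters/ instead.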
With these settings, I don't get any freeze when I use the Radeon card (using DRI_PRIME=1), but everything is slow. I checked and saw:
---
cat /sys/class/drm/card1/device/power_method
profile
cat /sys/class/drm/card1/device/power_profile
default
cat /sys/kernel/debug/dri/65/radeon_pm_info
default engine clock: 1070000 kHz
current engine clock: 299990 kHz
default memory clock: 900000 kHz
current memory clock: 298990 kHz
voltage: 1150 mV
PCIE lanes: 4
---
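To make the gap between default and current clocks easier to read, the radeon_pm_info text can be summarized as percentages. This is only a sketch: the function name pm_clock_summary is hypothetical, and it assumes the line layout shown above ("default engine clock: N kHz"). For the values above it reports the engine at about 28% and the memory at about 33% of their default clocks.

```shell
# pm_clock_summary
# Read radeon_pm_info text on stdin and print each current clock
# as a percentage of its default clock.
pm_clock_summary() {
    awk '
        # Match lines such as "default engine clock: 1070000 kHz".
        /^(default|current) (engine|memory) clock:/ {
            clk[$1, $2] = $4      # key: (default|current, engine|memory)
        }
        END {
            n = split("engine memory", domains, " ")
            for (i = 1; i <= n; i++) {
                d = domains[i]
                if (clk["default", d] > 0)
                    printf "%s: %d of %d kHz (%d%%)\n", d,
                        clk["current", d], clk["default", d],
                        int(100 * clk["current", d] / clk["default", d])
            }
        }
    '
}
```

Usage (as root, with debugfs mounted): cat /sys/kernel/debug/dri/65/radeon_pm_info | pm_clock_summary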
The card runs at low clocks by default, and I don't know why. Setting power_profile to high, mid or low doesn't change anything, but if I then set power_profile back to default, the clocks go to full speed.
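The profile switching described here can be reproduced with a small helper. This is a sketch under assumptions: the function name set_power_profile is hypothetical, the device directory defaults to the card1 path used above, and writing to these sysfs files requires root.

```shell
# set_power_profile PROFILE [DEVICE_DIR]
# Select a radeon power profile through sysfs and print the value read back.
# DEVICE_DIR defaults to /sys/class/drm/card1/device.
set_power_profile() {
    profile=$1
    dev=${2:-/sys/class/drm/card1/device}
    # Accept only the profile names the radeon driver documents.
    case $profile in
        default|auto|low|mid|high) ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    # power_profile is only used when power_method is "profile".
    echo profile > "$dev/power_method" || return 1
    echo "$profile" > "$dev/power_profile" || return 1
    cat "$dev/power_profile"
}
```

For example, running set_power_profile high and then set_power_profile default follows the sequence described above; the resulting clocks can be checked in radeon_pm_info after each step.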
When the clocks are at full speed, my system freezes if I try to run any 3D application (glxgears or a game using wine). Here is the dmesg log:
---
radeon 0000:01:00.0: ring 0 stalled for more than 10436msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000014f4 last fence id 0x00000000000014f6 on ring 0)
radeon 0000:01:00.0: Saved 49 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000049
radeon 0000:01:00.0: GRBM_STATUS = 0xE5D04028
radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xEE400000
radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00018000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x80030243
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
[drm] PCIE gen 3 link speeds already enabled
[drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000c00 and cpu addr 0xffff88046c66dc00
radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000c04 and cpu addr 0xffff88046c66dc04
radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000c08 and cpu addr 0xffff88046c66dc08
radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000c0c and cpu addr 0xffff88046c66dc0c
radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000c10 and cpu addr 0xffff88046c66dc10
[drm:r600_ring_test [radeon]] ERROR radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[drm:si_resume [radeon]] ERROR si startup failed on resume
---
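Most of the log above is register dumps; the lines that mark the hang and the failed recovery can be pulled out with a filter. A minimal sketch, with a hypothetical function name and patterns taken only from the messages in this log:

```shell
# gpu_hang_lines
# Filter kernel log text on stdin down to the radeon lockup and
# failed-resume messages seen in this report.
gpu_hang_lines() {
    grep -E 'ring [0-9]+ stalled|GPU lockup|ring [0-9]+ test failed|startup failed on resume'
}
```

Usage: dmesg | gpu_hang_lines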
With these steps I found that, if I don't set dpm=0, I hit exactly the same issue. My guess is that with DPM the clocks are raised to high when the card is used, and it crashes. When the card is stuck in that loop, I need to reset the machine (but the network still works).
I'm using mesa-13.0.2 with a 4.7.6 kernel; I have the same issue with mesa 12.0.1.