[RS690] GPU Lockup CP Stall and Resulting Kernel Oops (Kernel 3.2.0)
Submitted by rei..@..il.com
Assigned to Default DRI bug account
Description
[Problem]
Since upgrading from kernel 2.6.35 to kernel 3.2.0 (Ubuntu 12.04), we have experienced numerous kernel freezes that can only be resolved by switching the power off (hard reset). During a freeze there is no keyboard or mouse response, no kernel logging, no reaction to the Num Lock key, no serial console output, the magic (SysRq) keys do not work, and server applications such as dhcpd, postfix and bind9 stop.
There is no reliable way to reproduce this bug. The likelihood of a freeze increases when mail GUIs such as Evolution or Thunderbird, or Firefox, are open and we switch between their windows; in that case the kernel freezes within 6 h at the latest. If only the desktop and an xterm are running, the system seems to be stable (48 h and more). The bug occurs with lightdm/unity as the desktop as well as with mdm/cinnamon.
Log information can only be retrieved using netconsole. The logs show a GPU lockup and CP stall from which the radeon driver cannot recover.
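Because netconsole is the only channel that still delivers messages before the freeze, a receiver on a second machine is needed. The report does not say how the receiver was set up, so the following is only a minimal sketch of one possible UDP receiver. The port number 6666, the function names `format_record` and `receive`, and the file name `netconsole.log` are assumptions chosen for illustration; the port must match whatever target port netconsole is configured with on the affected machine.

```python
import socket
import time

# Assumption: 6666 is used as the listening port here; it must match the
# target port configured for netconsole on the machine that freezes.
LISTEN_PORT = 6666


def format_record(payload: bytes, received_at: float) -> str:
    """Prefix one netconsole datagram with the receiver's local time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(received_at))
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"{stamp} {text}"


def receive(log_path: str, port: int = LISTEN_PORT) -> None:
    """Append every received datagram to log_path until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        # Line-buffered so each message is handed to the OS as soon as
        # it arrives, rather than waiting in a Python-side buffer.
        with open(log_path, "a", buffering=1, encoding="utf-8") as log:
            while True:
                payload, _sender = sock.recvfrom(65535)
                log.write(format_record(payload, time.time()) + "\n")
```

Calling `receive("netconsole.log")` on the receiving machine would then collect the messages. The receiver-side timestamp is added because the kernel timestamps stop with the freeze, so the wall-clock time of the last message has to come from the receiving side.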
[Configuration Specifics]
We run two X Servers: one controlled by the display manager (on vt7 or vt8) and one controlled by xinit (on vt9). See attached process list.
We use the DVI port of the integrated Radeon X1200 (RS690) display controller on an ASUS MSA mainboard, with one monitor (Samsung) connected. The radeon driver is used with KMS enabled.
[Netconsole Output]
The last kernel log messages that reach the netconsole receiver vary:
a) The shortest log
[212173.596044] radeon 0000:01:05.0: GPU lockup CP stall for more than 10000msec
[212175.370234] radeon 0000:01:05.0: failed to reset GPU
[212175.406899] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(3).
[212175.406912] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
b) A longer one
[286295.708052] radeon 0000:01:05.0: GPU lockup CP stall for more than 10020msec
[286297.455900] radeon 0000:01:05.0: failed to reset GPU
[286297.929150] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(14).
[286297.929174] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
[286297.937321] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(15).
[286297.937349] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
[286297.943050] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(0).
[286297.943074] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
[286297.947188] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(1).
[286297.947213] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
[286297.949490] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(2).
[286297.949509] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
c) GPU reset attempt
[179005.128038] radeon 0000:01:05.0: GPU lockup CP stall for more than 10032msec
[179005.128068] GPU lockup (waiting for 0x0004E1FF last fence id 0x0004E1F4)
[179005.268649] radeon: wait for empty RBBM fifo failed ! Bad things might happen.
[179005.409044] Failed to wait GUI idle while programming pipes. Bad things might happen.
[179005.410064] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0x9401C100
[179005.908155] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x9401C100
[179006.405224] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x9400C100
[179006.902280] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x9400C100
[179006.902315] radeon 0000:01:05.0: restoring config space at offset 0x1 (was 0x100403, writing 0x100407)
[179006.902346] radeon 0000:01:05.0: failed to reset GPU
[179006.903346] radeon 0000:01:05.0: GPU reset failed
d) Successful GPU reset but inaccessible CP
[ 1775.356043] radeon 0000:01:05.0: GPU lockup CP stall for more than 10008msec
[ 1775.356067] GPU lockup (waiting for 0x000124ED last fence id 0x000124EA)
[ 1775.919383] radeon: wait for empty RBBM fifo failed ! Bad things might happen.
[ 1776.059845] Failed to wait GUI idle while programming pipes. Bad things might happen.
[ 1776.060872] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0xB001C100
[ 1776.559021] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x90010140
[ 1777.056092] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x10000140
[ 1777.553160] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x10000140
[ 1777.553197] radeon 0000:01:05.0: restoring config space at offset 0x1 (was 0x100403, writing 0x100407)
[ 1777.553232] radeon 0000:01:05.0: GPU reset succeed
[ 1777.554232] radeon 0000:01:05.0: GPU reset succeed
[ 1777.554263] sched: RT throttling activated
[ 1777.749090] [drm] radeon: 1 quad pipes, 1 z pipes initialized.
[ 1777.754590] [drm] PCIE GART of 512M enabled (table at 0x0000000036700000).
[ 1777.754958] radeon 0000:01:05.0: WB enabled
[ 1777.754991] [drm] radeon: ring at 0x0000000080001000
[ 1777.892767] [drm:r100_ring_test] ERROR radeon: ring test failed (scratch(0x15E4)=0xCAFEDEAD)
[ 1777.892774] [drm:r100_cp_init] ERROR radeon: cp isn't working (-22).
[ 1777.892783] radeon 0000:01:05.0: failed initializing CP (-22).
[ 1786.390793] [drm:radeon_ib_schedule] ERROR radeon: couldn't schedule IB(11).
[ 1786.390818] [drm:radeon_cs_ioctl] ERROR Failed to schedule IB !
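To compare the two reset attempts in logs c) and d) more easily, the following sketch extracts the RBBM_STATUS values and the fence ids from the quoted messages and shows which status bits changed between consecutive reads. It is only a reading aid under stated assumptions: the regular expressions are written against the message formats quoted above, the helper names (`status_steps`, `changed_bits`, `fence_gap`) are made up for this example, and the code does not interpret what the individual status bits mean.

```python
import re

# Patterns written against the message formats quoted in this report.
STATUS_RE = re.compile(r"\(rs600_asic_reset:(\d+)\) RBBM_STATUS=0x([0-9A-Fa-f]+)")
FENCE_RE = re.compile(r"waiting for 0x([0-9A-Fa-f]+) last fence id 0x([0-9A-Fa-f]+)")


def status_steps(log):
    """Return [(source_line, status)] for each RBBM_STATUS message, in order."""
    return [(int(m.group(1)), int(m.group(2), 16)) for m in STATUS_RE.finditer(log)]


def changed_bits(steps):
    """XOR consecutive status values to show which bits flipped between reads."""
    values = [status for _, status in steps]
    return [a ^ b for a, b in zip(values, values[1:])]


def fence_gap(log):
    """Awaited fence id minus the last fence id given in the message."""
    m = FENCE_RE.search(log)
    if m is None:
        return None
    return int(m.group(1), 16) - int(m.group(2), 16)


# Excerpts copied from logs c) and d) of this report.
LOG_C = """\
[179005.128068] GPU lockup (waiting for 0x0004E1FF last fence id 0x0004E1F4)
[179005.410064] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0x9401C100
[179005.908155] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x9401C100
[179006.405224] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x9400C100
[179006.902280] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x9400C100
"""
LOG_D = """\
[ 1775.356067] GPU lockup (waiting for 0x000124ED last fence id 0x000124EA)
[ 1776.060872] radeon 0000:01:05.0: (rs600_asic_reset:348) RBBM_STATUS=0xB001C100
[ 1776.559021] radeon 0000:01:05.0: (rs600_asic_reset:367) RBBM_STATUS=0x90010140
[ 1777.056092] radeon 0000:01:05.0: (rs600_asic_reset:375) RBBM_STATUS=0x10000140
[ 1777.553160] radeon 0000:01:05.0: (rs600_asic_reset:383) RBBM_STATUS=0x10000140
"""

for name, log in (("c", LOG_C), ("d", LOG_D)):
    steps = status_steps(log)
    final = steps[-1][1]
    print(name, "fence gap:", fence_gap(log))
    print(name, "changed bits:", [f"0x{x:08X}" for x in changed_bits(steps)])
    print(name, "final status:", f"0x{final:08X}", "bit 31 set:", bool((final >> 31) & 1))
```

For the excerpts above, this reports a fence gap of 11 for log c) and 3 for log d). In log c) only one bit (0x00010000) changes during the reset sequence and bit 31 of the final status is still set; in log d) bit 31 is clear in the final status, which is the case where the driver reports the reset as succeeded but the CP ring test fails afterwards.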
A very verbose log with drm.debug set to 0xf has been attached, along with the usually required information.