Semi-random GPU lockups on radeonsi with a RadeonHD 7770 (when playing videos, running OpenGL games, WebGL apps, or after extended periods of time)
Submitted by Jean-François Fortin Tam
Assigned to Default DRI bug account
Description
Fedora 23, xorg-x11-drv-ati, on a Dell Precision T3500 (latest BIOS, A17) with a RadeonHD 7770 GPU. All packages are stock Fedora and fully up to date.
If I start a game like Xonotic (from the Fedora repos) or Unvanquished (latest alpha binary build downloaded from their GitHub repo), after a minute or two of just looking around as a spectator, my monitor suddenly turns off. Sound continues to play for a while, then it may stop or loop. After a few seconds, the kernel is locked up and the CapsLock LED no longer responds.
This also happened to me once simply by watching a video fullscreen in Totem (I'm running GNOME Shell, FWIW), but this is a much rarer occurrence.
Unfortunately I don't have knowledge of debugging such things, and ABRT somehow thinks my kernel is tainted with the "I" status (meaning it's "working around a severe firmware bug"), which I suppose might be the radeon microcode, so I can't get ABRT to create a nice automated retrace/full debug thing for me. But at least it still has stuff stored on disk, if there's anything in there you'd need:
ls -lh /var/spool/abrt/oops-2015-12-10-21:50:22-777-1/
-rw-r----- 1 root abrt 5 10 déc 21:50 abrt_version
-rw-r----- 1 root abrt 9 10 déc 21:50 analyzer
-rw-r----- 1 root abrt 6 10 déc 21:50 architecture
-rw-r----- 1 root abrt 3,7K 10 déc 21:50 backtrace
-rw-r----- 1 root abrt 124 10 déc 21:50 cmdline
-rw-r----- 1 root abrt 16 10 déc 21:50 component
-rw-r----- 1 root abrt 1 10 déc 21:50 count
-rw-r----- 1 root abrt 71K 10 déc 21:50 dmesg
-rw-r----- 1 root abrt 40 10 déc 21:50 duphash
-rw-r----- 1 root abrt 23 10 déc 21:50 extra-cc
-rw-r----- 1 root abrt 8 10 déc 21:50 hostname
-rw-r----- 1 root abrt 21 10 déc 21:50 kernel
-rw-r----- 1 root abrt 25 10 déc 21:50 kernel_tainted_long
-rw-r----- 1 root abrt 3 10 déc 21:50 kernel_tainted_short
-rw-r----- 1 root abrt 10 10 déc 21:50 last_occurrence
-rw-r----- 1 root abrt 173 10 déc 21:50 not-reportable
-rw-r----- 1 root abrt 518 10 déc 21:50 os_info
-rw-r----- 1 root abrt 32 10 déc 21:50 os_release
-rw-r----- 1 root abrt 6 10 déc 21:50 package
-rw-r----- 1 root abrt 7 10 déc 21:50 pkg_arch
-rw-r----- 1 root abrt 2 10 déc 21:50 pkg_epoch
-rw-r----- 1 root abrt 12 10 déc 21:50 pkg_name
-rw-r----- 1 root abrt 9 10 déc 21:50 pkg_release
-rw-r----- 1 root abrt 6 10 déc 21:50 pkg_version
-rw-r----- 1 root abrt 4,4K 10 déc 21:50 proc_modules
-rw-r----- 1 root abrt 37 10 déc 21:50 reason
-rw-r----- 1 root abrt 8 10 déc 21:50 runlevel
-rw-r----- 1 root abrt 269 10 déc 21:50 suspend_stats
-rw-r----- 1 root abrt 10 10 déc 21:50 time
-rw-r----- 1 root abrt 10 10 déc 21:50 type
-rw-r----- 1 root abrt 40 10 déc 21:50 uuid
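For anyone wanting to double-check the "I" taint status mentioned above, here is a minimal sketch that decodes the kernel taint bitmask from /proc/sys/kernel/tainted. The bit-to-letter table follows the kernel's tainted-kernels documentation as far as I know it and is deliberately partial (newer kernels define more bits); the function name decode_taint is just an illustrative helper, not part of ABRT or any other tool.

```python
# Partial table of kernel taint bits (bit position -> letter, meaning).
# Assumption: taken from the kernel's tainted-kernels documentation;
# kernels newer than 4.2 define additional bits not listed here.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "out-of-specification system (older kernels: unsafe SMP)"),
    3: ("R", "module was force unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for a platform firmware bug applied"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(mask):
    """Return (letter, meaning) pairs for every known bit set in mask."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if mask & (1 << bit)]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/tainted") as fh:
            mask = int(fh.read().strip())
    except (OSError, ValueError):
        # Fallback for systems without the file: only the "I" bit set,
        # matching the first oops in this report.
        mask = 1 << 11
    for letter, meaning in decode_taint(mask):
        print(letter, "-", meaning)
```

On this machine the first oops shows only "I" set; the second one additionally shows "D", which is simply the result of the first oops.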
This is what I get in journalctl/dmesg:
-- Logs begin at lun 2015-11-30 21:48:19 EST, end at jeu 2015-12-10 23:48:33 EST. --
déc 10 21:49:00 the_PC kernel: radeon 0000:02:00.0: ring 3 stalled for more than 10115msec
déc 10 21:49:00 the_PC kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000000a5fe last fence id 0x000000000000a600 on ring 3)
déc 10 21:49:01 the_PC kernel: BUG: unable to handle kernel paging request at ffffc90404239ffc
déc 10 21:49:01 the_PC kernel: IP: [<ffffffffa00f850a>] radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: PGD 6068a8067 PUD 0
déc 10 21:49:01 the_PC kernel: Oops: 0000 [#1] SMP
déc 10 21:49:01 the_PC kernel: Modules linked in: fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable
déc 10 21:49:01 the_PC kernel: radeon i2c_algo_bit drm_kms_helper ttm drm serio_raw
déc 10 21:49:01 the_PC kernel: CPU: 3 PID: 153 Comm: kworker/u64:7 Tainted: G I 4.2.6-301.fc23.x86_64 #1
déc 10 21:49:01 the_PC kernel: Hardware name: Dell Inc. Precision WorkStation T3500 /0K095G, BIOS A17 05/28/2013
déc 10 21:49:01 the_PC kernel: Workqueue: radeon-crtc radeon_flip_work_func [radeon]
déc 10 21:49:01 the_PC kernel: task: ffff88060299b880 ti: ffff8805ff5c0000 task.ti: ffff8805ff5c0000
déc 10 21:49:01 the_PC kernel: RIP: 0010:[<ffffffffa00f850a>] [<ffffffffa00f850a>] radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: RSP: 0018:ffff8805ff5c3c98 EFLAGS: 00010206
déc 10 21:49:01 the_PC kernel: RAX: ffffc9000fe50000 RBX: 00000000ffffffff RCX: 0000000000000000
déc 10 21:49:01 the_PC kernel: RDX: 0000000000000000 RSI: ffffc90404239ffc RDI: 0000000000080500
déc 10 21:49:01 the_PC kernel: RBP: ffff8805ff5c3cd8 R08: ffff8805771f8cc0 R09: 0000000000082000
déc 10 21:49:01 the_PC kernel: R10: 8000000000000163 R11: ffffffff81a609e9 R12: ffff880036a654d8
déc 10 21:49:01 the_PC kernel: R13: ffff880036a654b0 R14: 0000000000020141 R15: ffff8805ff5c3d30
déc 10 21:49:01 the_PC kernel: FS: 0000000000000000(0000) GS:ffff880606ec0000(0000) knlGS:0000000000000000
déc 10 21:49:01 the_PC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
déc 10 21:49:01 the_PC kernel: CR2: ffffc90404239ffc CR3: 0000000001c0b000 CR4: 00000000000006e0
déc 10 21:49:01 the_PC kernel: Stack:
déc 10 21:49:01 the_PC kernel: ffff8805ff5c3cc8 ffffffffa00f9413 ffff880036a64000 ffff880036a64000
déc 10 21:49:01 the_PC kernel: ffff880036a654d8 ffff8805ff5c3d30 ffff880036a654d8 0000000000000000
déc 10 21:49:01 the_PC kernel: ffff8805ff5c3da8 ffffffffa00c6c80 ffffffff810df990 ffff880036a64738
déc 10 21:49:01 the_PC kernel: Call Trace:
déc 10 21:49:01 the_PC kernel: [<ffffffffa00f9413>] ? radeon_irq_kms_disable_hpd+0x73/0x80 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00c6c80>] radeon_gpu_reset+0xd0/0x330 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffff810df990>] ? wake_atomic_t_function+0x70/0x70
déc 10 21:49:01 the_PC kernel: [<ffffffffa00e058f>] ? radeon_fence_wait+0x9f/0xe0 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00ed960>] radeon_flip_work_func+0x130/0x170 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffff810b650e>] process_one_work+0x19e/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810b67ae>] worker_thread+0x4e/0x450
déc 10 21:49:01 the_PC kernel: [<ffffffff810b6760>] ? process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810b6760>] ? process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc8b8>] kthread+0xd8/0xf0
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc7e0>] ? kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: [<ffffffff817797df>] ret_from_fork+0x3f/0x70
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc7e0>] ? kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: Code: 10 e1 48 85 c0 49 89 07 74 6c 41 8d 7e ff 31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 ef e8
déc 10 21:49:01 the_PC kernel: RIP [<ffffffffa00f850a>] radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: RSP <ffff8805ff5c3c98>
déc 10 21:49:01 the_PC kernel: CR2: ffffc90404239ffc
déc 10 21:49:01 the_PC kernel: ---[ end trace 37e2470f6b251992 ]---
déc 10 21:49:01 the_PC kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8
déc 10 21:49:01 the_PC kernel: IP: [<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel: PGD 1c0e067 PUD 1c10067 PMD 0
déc 10 21:49:01 the_PC kernel: Oops: 0000 [#2] SMP
déc 10 21:49:01 the_PC kernel: Modules linked in: fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable
déc 10 21:49:01 the_PC kernel: radeon i2c_algo_bit drm_kms_helper ttm drm serio_raw
déc 10 21:49:01 the_PC kernel: CPU: 3 PID: 153 Comm: kworker/u64:7 Tainted: G D I 4.2.6-301.fc23.x86_64 #1
déc 10 21:49:01 the_PC kernel: Hardware name: Dell Inc. Precision WorkStation T3500 /0K095G, BIOS A17 05/28/2013
déc 10 21:49:01 the_PC kernel: task: ffff88060299b880 ti: ffff8805ff5c0000 task.ti: ffff8805ff5c0000
déc 10 21:49:01 the_PC kernel: RIP: 0010:[<ffffffff810bcd40>] [<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel: RSP: 0018:ffff8805ff5c3918 EFLAGS: 00010096
déc 10 21:49:01 the_PC kernel: RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000005
déc 10 21:49:01 the_PC kernel: RDX: 0000000000000005 RSI: 0000000000000003 RDI: ffff88060299b880
déc 10 21:49:01 the_PC kernel: RBP: ffff8805ff5c3918 R08: ffff88060299b910 R09: 0000000000000000
déc 10 21:49:01 the_PC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000167c0
déc 10 21:49:01 the_PC kernel: R13: ffff88060299b880 R14: ffff880606ed67c0 R15: 0000000000000003
déc 10 21:49:01 the_PC kernel: FS: 0000000000000000(0000) GS:ffff880606ec0000(0000) knlGS:0000000000000000
déc 10 21:49:01 the_PC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
déc 10 21:49:01 the_PC kernel: CR2: 0000000000000028 CR3: 0000000001c0b000 CR4: 00000000000006e0
déc 10 21:49:01 the_PC kernel: Stack:
déc 10 21:49:01 the_PC kernel: ffff8805ff5c3938 ffffffff810b7385 ffff8805ff5c3938 ffff880606ed67c0
déc 10 21:49:01 the_PC kernel: ffff8805ff5c3988 ffffffff81774fc0 ffff880500000000 ffff88060299b880
déc 10 21:49:01 the_PC kernel: ffff8805ff5c3988 ffff8805ff5c4000 ffff8805ff5c39f0 ffff8805ff5c39f0
déc 10 21:49:01 the_PC kernel: Call Trace:
déc 10 21:49:01 the_PC kernel: [<ffffffff810b7385>] wq_worker_sleeping+0x15/0xa0
déc 10 21:49:01 the_PC kernel: [<ffffffff81774fc0>] __schedule+0x620/0x950
déc 10 21:49:01 the_PC kernel: [<ffffffff81775327>] schedule+0x37/0x80
déc 10 21:49:01 the_PC kernel: [<ffffffff810a103a>] do_exit+0x80a/0xae0
déc 10 21:49:01 the_PC kernel: [<ffffffff810180fe>] oops_end+0x9e/0xd0
déc 10 21:49:01 the_PC kernel: [<ffffffff81064c25>] no_context+0x135/0x380
déc 10 21:49:01 the_PC kernel: [<ffffffff81064ef0>] __bad_area_nosemaphore+0x80/0x1f0
déc 10 21:49:01 the_PC kernel: [<ffffffff81065073>] bad_area_nosemaphore+0x13/0x20
déc 10 21:49:01 the_PC kernel: [<ffffffff81065357>] __do_page_fault+0xb7/0x400
déc 10 21:49:01 the_PC kernel: [<ffffffff810656cf>] do_page_fault+0x2f/0x80
déc 10 21:49:01 the_PC kernel: [<ffffffff8177b378>] page_fault+0x28/0x30
déc 10 21:49:01 the_PC kernel: [<ffffffffa00f850a>] ? radeon_ring_backup+0xda/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00f85b0>] ? radeon_ring_backup+0x180/0x190 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00f9413>] ? radeon_irq_kms_disable_hpd+0x73/0x80 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00c6c80>] radeon_gpu_reset+0xd0/0x330 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffff810df990>] ? wake_atomic_t_function+0x70/0x70
déc 10 21:49:01 the_PC kernel: [<ffffffffa00e058f>] ? radeon_fence_wait+0x9f/0xe0 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffffa00ed960>] radeon_flip_work_func+0x130/0x170 [radeon]
déc 10 21:49:01 the_PC kernel: [<ffffffff810b650e>] process_one_work+0x19e/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810b67ae>] worker_thread+0x4e/0x450
déc 10 21:49:01 the_PC kernel: [<ffffffff810b6760>] ? process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810b6760>] ? process_one_work+0x3f0/0x3f0
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc8b8>] kthread+0xd8/0xf0
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc7e0>] ? kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: [<ffffffff817797df>] ret_from_fork+0x3f/0x70
déc 10 21:49:01 the_PC kernel: [<ffffffff810bc7e0>] ? kthread_worker_fn+0x160/0x160
déc 10 21:49:01 the_PC kernel: Code: c4 08 44 89 e8 5b 41 5c 41 5d 5d c3 4c 89 e7 e8 e7 eb fd ff eb 88 0f 1f 44 00 00 66 66 66 66 90 48 8b 87 90 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
déc 10 21:49:01 the_PC kernel: RIP [<ffffffff810bcd40>] kthread_data+0x10/0x20
déc 10 21:49:01 the_PC kernel: RSP <ffff8805ff5c3918>
déc 10 21:49:01 the_PC kernel: CR2: ffffffffffffffd8
déc 10 21:49:01 the_PC kernel: ---[ end trace 37e2470f6b251993 ]---
déc 10 21:49:01 the_PC kernel: Fixing recursive fault but reboot is needed!
-- Reboot --
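In case it helps with triaging similar logs, here is a small sketch that pulls the ring number and fence IDs out of the "GPU lockup" line above. It assumes the message format is exactly as it appears in this log, and it assumes that the first ID is the last fence the GPU signaled and the second is the last one emitted, so their difference is the number of fences still outstanding. The names LOCKUP_RE and parse_lockup are only illustrative.

```python
import re

# Matches the radeon "GPU lockup" message as it appears in this report.
# Assumption: other kernel versions may word this message differently.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id 0x(?P<current>[0-9a-fA-F]+) "
    r"last fence id 0x(?P<last>[0-9a-fA-F]+) on ring (?P<ring>\d+)\)"
)


def parse_lockup(line):
    """Return ring and fence information from a lockup line, or None."""
    match = LOCKUP_RE.search(line)
    if match is None:
        return None
    current = int(match.group("current"), 16)
    last = int(match.group("last"), 16)
    return {
        "ring": int(match.group("ring")),
        "current_fence": current,
        "last_fence": last,
        # Assumed interpretation: fences emitted but not yet signaled.
        "pending": last - current,
    }


if __name__ == "__main__":
    sample = (
        "déc 10 21:49:00 the_PC kernel: radeon 0000:02:00.0: GPU lockup "
        "(current fence id 0x000000000000a5fe last fence id "
        "0x000000000000a600 on ring 3)"
    )
    print(parse_lockup(sample))
```

The same function could be fed the kernel messages of the previous boot, for example from the output of journalctl -k -b -1, one line at a time.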
Version: 11.0