Computer restarts when playing 3D games (possibly overheating)
Submitted by Alex Henry
Assigned to Default DRI bug account
Description
Hello, sorry if I'm reporting this to the wrong product but the bug report procedures on the wiki are pretty hard to understand https://www.x.org/wiki/RadeonFeature/#index11h2
I have an onboard Radeon HD 3000 (ATI RS780L). I am primarily using Debian testing ("stretch"). I believe the open source drivers for it are provided by the following packages:
linux-image-4.6.0-1-amd64 (4.6.4-1, radeon.ko)
xserver-xorg-video-ati (1:7.7.0-1)
xserver-xorg-video-radeon (1:7.7.0-1, radeon_drv.so, due to update in a few days)
The problem I'm having is that when playing 3D games the computer randomly crashes and reboots. This can happen anywhere from 10 minutes into the game to over an hour of gameplay, but mostly after around 10-30 minutes. It seems to be heat-related: when it's cooler the game seems less likely to crash, and when I try playing again right after a crash and reboot, the next crash seems to come sooner - maybe after 5 minutes or so of playing.
The crash seems to be some sort of hardware failure because there is no trace of it in the persistent journald logs that I can find. Because of this, I also can't be sure what is happening. Let me know if there is something I can do to help debug this.
To verify whether the error was the driver's fault, I installed a new Debian system (oldstable, Debian 7 "wheezy"), which allowed me to install the fglrx legacy driver with Radeon HD 3000 support. In this new system I haven't experienced a single reboot so far, which suggests the cause isn't purely hardware-related but very likely a driver issue. Being an entirely new system means it could be something else too, but I have very frequent crashes/reboots in my primary system and none so far in the alternate system, even while frequently running games for an hour or so on a hot day. That would indicate that the fault is indeed coming from the open-source Radeon driver.
I hadn't done any 3D gaming on this computer until a couple of weeks ago, so I can't say whether this bug only happens with recent driver versions. Watching videos in a browser or in a video application (such as VLC), playing 2D games like http://littlewargame.com, or rendering videos (via kdenlive or similar) does not cause reboots, even though these can be relatively heavy on the GPU. I also haven't had any random crashes in a very long while except when doing 3D gaming. I've installed a few games to test this, and whenever I play in 3D the crashes happen frequently. Some of the games I've used are Heroes of Newerth and Runescape (both free to download and play), which are very lightweight on low settings, so it shouldn't be a question of me stressing the card too much either (HoN, for example, works fine with the fglrx legacy driver on my alternate system).
I have run memtest86 and CPU and memory stress tests (stress-ng 0.06.15-1) in the hope of catching a spike in my machine's temperature as the culprit for these random crashes, but I've found the temperature to be stable and low even under heavy load for a long time. I've also run the Geeks3D GpuTest, which puts a heavier load on the graphics card than these games do, but I haven't been able to cause a crash with it. My tests in this case haven't been extensive, though - I can run them for longer if it would help debug the issue.
I understand that there have been recent updates to the graphics drivers in the new Linux kernel release. I will try the new drivers as they come out, since it's somewhat of a bother having to reboot the computer (and maintain a legacy system) whenever I want to play 3D games, and if the problem is solved I'll report back here. If I don't comment on this issue in the near future, it's because the problem persists even with the new drivers.
My guess about what is happening: since the problem seems to be heat-related, maybe there is some sort of temperature sensor on my card that the open source driver isn't able to read, while fglrx is able to read it and handle heat properly. I was expecting to be able to see such a sensor using lm-sensors (version 1:3.4.0-3), but I can't.
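For anyone who wants to check the same thing on their own machine, here is a minimal sketch of how the sensors could be listed without lm-sensors. It assumes the kernel exposes temperatures through the standard hwmon sysfs interface (`temp*_input` files holding millidegrees Celsius); the function name `read_hwmon_temps` is made up for this example, and whether the radeon driver registers a sensor at all for this chipset is exactly the open question.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip, sensor, degrees_celsius) tuples found under root."""
    readings = []
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Each hwmon device normally has a "name" file identifying the driver.
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)
        for temp_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            try:
                with open(temp_path) as f:
                    millidegrees = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor file present but unreadable; skip it
            sensor = os.path.basename(temp_path)[: -len("_input")]
            readings.append((chip, sensor, millidegrees / 1000.0))
    return readings


if __name__ == "__main__":
    for chip, sensor, celsius in read_hwmon_temps():
        print(f"{chip} {sensor}: {celsius:.1f} C")
```

If no line with the chip name "radeon" shows up, the driver is probably not exposing a GPU temperature on this system, which would be consistent with the guess above but would not prove it.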
I don't usually report bugs to trackers that already have many reports open, but since I've spent a long time tracking this issue down and was able to work around it, I thought I should share all the information I've gathered in the hope that it's useful. It's an older, onboard graphics card model, probably getting to the end of its lifespan soon, but I hope this report is valuable in some way anyway. Thank you for the good work on these open drivers. If I were able to use them for gaming on my primary system I'd certainly do it, even if the fglrx legacy system is a little bit smoother, since it would be a lot more convenient than maintaining a separate gaming system on my machine. Thanks again for the contributions to the FOSS community and for the time spent reading this report!