Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
Submitted by Timur Kristóf
Assigned to Default DRI bug account
Link to original bug (#108493)
Description
I experience a consistent amdgpu crash when using my AMD GPU with a 4K screen.
Hardware:
- Sapphire Radeon RX 570 Pulse ITX 4GB
- Zotac AMP box mini external GPU enclosure
- Dell XPS 13 9370 laptop
- Dell U2718Q 4K display
Software:
First tried with Fedora 28, now using Fedora 29. Tried kernel versions 4.18.12, 4.18.13 and 4.19-rc7; the issue appears with all of them. Mesa version is 18.2.2, but the crash also happens with 18.0 (on Fedora 28).
Steps to reproduce the crash:
1. Turn off the laptop
2. Attach the eGPU to the laptop
3. Attach a 4K screen to the HDMI output of the AMD GPU
4. Turn on the laptop
5. Add the following to the kernel command line: 'module_blacklist=i915 3' (the blacklist ensures the Intel GPU is not used at all, and the '3' boots to the console so the graphical login won't interfere)
6. Launch the operating system
7. Log in from the console
8. Launch an X session with 'startx'
9. Start the Unigine Heaven benchmark in fullscreen 4K
Expected outcome:
Unigine Heaven should show up and run in a stable and performant manner.
Actual outcome:
Unigine Heaven shows up, runs for a couple of seconds, and then the screen goes dark. I can still log into the machine over SSH, but I cannot kill X or interact with the AMD GPU in any way. I can't even reboot the machine; the only thing that works is long-pressing the power key.
Relevant lines from dmesg log:
[ 305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933
[ 305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3176, emitted seq=3178
[ 305.078573] [drm] GPU recovery disabled.
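For anyone comparing logs across kernel versions, here is a minimal Python sketch that pulls the ring timeout lines out of a dmesg dump. It assumes the line format is exactly as in the excerpt above. The function name `parse_ring_timeouts` and the `pending` field are made up for illustration. `pending` is simply emitted seq minus signaled seq, which I read as the number of fences that were submitted but never completed on that ring.

```python
import re

# Matches amdgpu ring timeout lines in the format shown in the excerpt above.
# Assumption: other kernel versions may word this message differently.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout line found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a ring timeout line
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "time": float(match.group("time")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Hypothetical helper field: fences emitted but not yet signaled.
            "pending": emitted - signaled,
        })
    return results


# The three lines from this report, used as sample input.
SAMPLE = """\
[ 305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933
[ 305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3176, emitted seq=3178
[ 305.078573] [drm] GPU recovery disabled.
"""

if __name__ == "__main__":
    for entry in parse_ring_timeouts(SAMPLE):
        print(entry["ring"], "pending fences:", entry["pending"])
```

On the lines from this report it finds two timeouts, gfx and sdma0, both within the same millisecond.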
Possible workarounds:
* The crash does not happen when I disable power management with amdgpu.dpm=0, but performance is then very poor.
* The crash also doesn't happen when I use 'echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level', again with very poor performance.
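To make the second workaround easier to toggle while testing, here is a rough Python sketch that writes the performance level to the same sysfs file and returns the previous value so it can be restored afterwards. The function name `set_performance_level` is hypothetical. The `card0` path is the one from this report and may differ on other systems. The set of accepted levels ("auto", "low", "high") is an assumption based on the values I know the amdgpu driver accepts; only "low" was tested here. Writing to the real sysfs file requires root.

```python
from pathlib import Path

# Path taken from this report; the card index may differ on other machines.
DEFAULT_PATH = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)

# Assumption: a subset of the levels accepted by amdgpu.
KNOWN_LEVELS = {"auto", "low", "high"}


def set_performance_level(level, path=DEFAULT_PATH):
    """Write `level` to the sysfs file and return the previous level."""
    if level not in KNOWN_LEVELS:
        raise ValueError("unknown performance level: %r" % (level,))
    path = Path(path)
    previous = path.read_text().strip()  # remember the old value
    path.write_text(level + "\n")        # equivalent to: echo LEVEL > PATH
    return previous


if __name__ == "__main__":
    # Needs root on a real system.
    old = set_performance_level("low")
    print("previous level:", old)
```

Calling `set_performance_level(old)` afterwards puts the card back to whatever it was before.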
Additional information:
* Note that running any other graphics-intensive application (e.g. your favourite game) will also result in the same crash, but Unigine Heaven is what I found to be the quickest way to reproduce it.
* Also note that the crash is not X-specific, but again this is what I found to be the simplest way to reproduce it.
* The very same hardware works correctly on Windows without a crash. So this is probably not a hardware defect.
* The crash is almost immediate at 4K, but it also occurs at other resolutions; it just takes more time. At 1440p it takes a couple of minutes but still crashes. At 1080p I could run it for several minutes without a crash (did not test further than that).
* The problem seems to be similar to these: https://bugs.freedesktop.org/show_bug.cgi?id=105733 and https://bugs.freedesktop.org/show_bug.cgi?id=102322 - the difference is that the workarounds suggested there don't help; they only seem to postpone the crash by a very small margin. It still crashes in less than a minute.
* Enabling GPU recovery does not actually manage to recover the GPU.
If you need any other kind of log or any more info, please let me know. Thank you in advance for looking into solving this problem.