[HadesCanyon/regression] GPU hang also causes X server to die
@eero-t
Submitted by Eero Tamminen
Assigned to Default DRI bug account
Link to original bug (#112226)
Description
Setup:
- HW: KBL HadesCanyon (i7-8809G with Radeon RX Vega M GH)
- OS: Ubuntu 18.04 with Unity desktop (compiz)
- SW: Git builds of drm-tip kernel, Mesa and X server
Issue:
* The AMD GPU driver no longer recovers from the KBL HadesCanyon GPU hangs described in bug 108898.
It still claims to recover from the hang:
-------------------------------------------------------
[ 1057.512690] Iteration 2/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_manhattan
[ 1119.867403] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 1124.987449] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
-------------------------------------------------------
However, all 3D tests run after this error now fail.
This started happening between the following drm-tip kernel commits:
* 2019-10-28 16:01:46: 912b87256c: drm-tip: 2019y-10m-28d-16h-00m-10s UTC integration manifest
* 2019-10-29 17:58:05: a2c9f8ce2a: drm-tip: 2019y-10m-29d-17h-57m-39s UTC integration manifest
and the following Mesa commits:
* 2019-10-28 17:47:06: d298740a1c: iris: Disallow incomplete resource creation
* 2019-10-29 16:19:34: ff6e148a3d: freedreno/a6xx: add a618 support
Note:
* I don't see the same issue when using a few-months-old Mesa with the latest drm-tip kernel, so some change in Mesa triggers this kernel issue.
* If the latest Mesa is used with the drm-tip 5.3 kernel, X fails to start 4 out of 5 times. This started with a Mesa version from within a couple of days of the GPU hang recovery issue, so there are potentially more issues in Mesa's AMD (HadesCanyon) support.