[regression] (recoverable) GPU resets & fails in slower 3D benchmarks
@eero-t
Submitted by Eero Tamminen
Assigned to Chris Wilson @ickle
Link to original bug (#112169)
Description
Setup:
- HW: e.g. J4205 (BXT) and i7-7567U (KBL GT3e)
- OS: Ubuntu 18.04
- SW: latest Git versions of drm-tip kernel, X server and Mesa
Between the following drm-tip commits:
* 2019-10-23 16:27:36 863a8a1bef: drm-tip: 2019y-10m-23d-16h-23m-44s UTC integration manifest
* 2019-10-27 20:46:21 54520983c6: drm-tip: 2019y-10m-27d-20h-45m-34s UTC integration manifest
Some of the slower 3D benchmarks started to trigger GPU resets and fail:
------------------------------------------
[ 4937.365687] Iteration 1/3: GpuTest /test=pixmark_piano /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4937.904763] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.016748] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.128707] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.213360] Iteration 2/3: GpuTest /test=pixmark_piano /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4938.736916] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.848767] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.960794] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4939.013764] Iteration 3/3: GpuTest /test=pixmark_piano /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4939.536891] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4939.648780] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4940.560756] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4940.623625] Iteration 1/3: GpuTest /test=pixmark_volplosion /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4941.112755] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4941.224772] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4941.504756] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4941.569851] Iteration 2/3: GpuTest /test=pixmark_volplosion /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4942.032771] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4942.600708] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4942.864828] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4942.920061] Iteration 3/3: GpuTest /test=pixmark_volplosion /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4943.408815] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4943.520753] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4943.632747] i915 0000:00:02.0: Resetting rcs0 for preemption time out
------------------------------------------
The affected benchmarks are mainly Piano and Volplosion in the public GpuTest v0.7 test suite, and most of the tests in the internal GPU MemBW test suite.
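To see which benchmark runs are affected in a longer kernel log, the reset messages can be tallied per iteration. The following is a minimal sketch, assuming the log contains the same "Iteration N/M: GpuTest /test=..." marker lines and reset message as the excerpt above. The names `count_resets`, `ITERATION_RE`, `RESET_MSG` and `SAMPLE` are hypothetical helpers for illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Marker lines written to the kernel log before each run, as in the excerpt above.
ITERATION_RE = re.compile(r"Iteration (\d+)/(\d+): GpuTest /test=(\S+)")
# Reset message text, copied from the excerpt above.
RESET_MSG = "Resetting rcs0 for preemption time out"


def count_resets(dmesg_text):
    """Return {(test_name, iteration): reset_count} for a dmesg excerpt."""
    counts = Counter()
    current = None  # (test name, iteration number) of the last marker seen
    for line in dmesg_text.splitlines():
        match = ITERATION_RE.search(line)
        if match:
            current = (match.group(3), int(match.group(1)))
            counts[current] += 0  # list the iteration even if it has no resets
        elif RESET_MSG in line and current is not None:
            counts[current] += 1
    return dict(counts)


# First two iterations from the log above, used as sample input.
SAMPLE = """\
[ 4937.365687] Iteration 1/3: GpuTest /test=pixmark_piano /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4937.904763] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.016748] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.128707] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.213360] Iteration 2/3: GpuTest /test=pixmark_piano /width=1366 /height=768 /msaa=1 /no_scorebox /benchmark /benchmark_duration_ms=35000
[ 4938.736916] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.848767] i915 0000:00:02.0: Resetting rcs0 for preemption time out
[ 4938.960794] i915 0000:00:02.0: Resetting rcs0 for preemption time out
"""

if __name__ == "__main__":
    for (test, iteration), resets in sorted(count_resets(SAMPLE).items()):
        print(f"{test} iteration {iteration}: {resets} resets")
```

The same function could be fed the saved output of `dmesg` from a full test run. Iterations that show zero resets are still listed, which makes it easier to compare failing and passing benchmarks.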
Notes:
* There's no i915 error state
* On BXT, screen updates in the failing benchmarks happen at single-digit frame rates on average (2-8 FPS)
* These aren't the slowest benchmarks: e.g. offscreen GfxBench tests run at <1 FPS and don't trigger GPU resets
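For scale, the frame rates in the notes can be converted to per-frame times and set next to the spacing of the reset messages in the log. This is a small sketch using only numbers from this report (the 2-8 FPS range and the first three reset timestamps). `frame_time_ms` is a hypothetical helper, and the comparison is plain arithmetic, not a diagnosis of the cause.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Timestamps (seconds) of the first three reset messages in the log above.
reset_timestamps = [4937.904763, 4938.016748, 4938.128707]

# Gap between consecutive reset messages, rounded to whole milliseconds.
gaps_ms = [
    round((later - earlier) * 1000)
    for earlier, later in zip(reset_timestamps, reset_timestamps[1:])
]

if __name__ == "__main__":
    for fps in (2, 8):
        print(f"{fps} FPS -> {frame_time_ms(fps):.0f} ms per frame")
    print("gaps between consecutive resets (ms):", gaps_ms)
```

At 2-8 FPS a frame takes 125-500 ms on average, while the consecutive resets in this part of the log are about 112 ms apart.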