Regression: 100% reproducible GPU hangs in GfxBench Car Chase & Aztec Ruins benchmarks
Between the following drm-tip versions:
- 8f73cd99e6: 2023y-07m-25d-16h-32m-02s UTC integration manifest
- 50f130ab30: 2023y-07m-26d-14h-37m-59s UTC integration manifest
The kernel started to report 3 GPU hangs during each GfxBench Car Chase and Aztec Ruins run:
[ 5181.934825] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_4
[ 5196.997717] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5196.997828] i915 0000:00:02.0: [drm] testfw_app[9811] context reset due to GPU hang
[ 5197.020016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in testfw_app [9811]
[ 5210.821053] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5210.821177] i915 0000:00:02.0: [drm] testfw_app[9812] context reset due to GPU hang
[ 5210.842677] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9eff2, in testfw_app [9812]
[ 5224.644442] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5224.644565] i915 0000:00:02.0: [drm] testfw_app[9812] context reset due to GPU hang
[ 5224.665198] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9eff2, in testfw_app [9812]
...
[ 6060.130093] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_5_normal
[ 6075.036781] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6075.036913] i915 0000:00:02.0: [drm] testfw_app[6413] context reset due to GPU hang
[ 6075.120293] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fd8ffff, in testfw_app [6413]
[ 6089.372194] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6089.372323] i915 0000:00:02.0: [drm] testfw_app[6414] context reset due to GPU hang
[ 6089.453649] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fd8ffff, in testfw_app [6414]
[ 6099.611724] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 6099.611859] i915 0000:00:02.0: [drm] testfw_app[6414] context reset due to GPU hang
[ 6099.662211] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fdaffff, in testfw_app [6414]
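For comparing runs across kernels, the hang counts in a log like the one above can be tallied per test with a small script. This is only a rough sketch: the regular expressions are assumptions based on the exact dmesg lines quoted here (the "Iteration" marker is written by the test harness, not by i915), and `summarize_hangs` and `SAMPLE` are hypothetical names used for illustration.

```python
import re
from collections import defaultdict

# Assumed line formats, taken from the dmesg excerpt in this report.
# Other kernel versions or test harnesses may format these lines differently.
ITER_RE = re.compile(r"Iteration \d+/\d+: .*--test_id (\S+)")
HANG_RE = re.compile(r"GPU HANG: ecode ([0-9a-fA-F:]+), in (\S+) \[(\d+)\]")


def summarize_hangs(lines):
    """Group GPU hang ecodes by the test_id of the most recent 'Iteration' line.

    Hangs seen before any 'Iteration' line are grouped under the key None.
    """
    hangs = defaultdict(list)
    current_test = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            current_test = match.group(1)
            continue
        match = HANG_RE.search(line)
        if match:
            hangs[current_test].append(match.group(1))
    return dict(hangs)


# A few lines copied from the log above, shortened to keep the sample small.
SAMPLE = """\
[ 5181.934825] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_4
[ 5196.997717] i915 0000:00:02.0: [drm] Resetting rcs0 for preemption time out
[ 5197.020016] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9fff2, in testfw_app [9811]
[ 5210.842677] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8ed9eff2, in testfw_app [9812]
[ 6060.130093] Iteration 1/3: bin/testfw_app --gfx glfw --gl_api desktop_core --width 1920 --height 1080 --fullscreen 1 --test_id gl_5_normal
[ 6075.120293] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:8fd8ffff, in testfw_app [6413]
"""

if __name__ == "__main__":
    for test_id, ecodes in summarize_hangs(SAMPLE.splitlines()).items():
        print(f"{test_id}: {len(ecodes)} hang(s), ecodes: {', '.join(ecodes)}")
```

To use it on a real log, pass the lines of saved dmesg output to `summarize_hangs` in place of `SAMPLE.splitlines()`.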
Error state from yesterday's (2023-08-15) drm-tip Git head: i915_error_state.txt
Other notes:
- This is 100% reproducible, there are 3 hangs on every run
- Of the 3 machine types on which I'm running this, I see it only on GEN12 TGL, not on GEN9 BXT / GLK => it could be either GEN12+ or TGL specific
- Because this does not happen on BXT / GLK, which are significantly slower than TGL, this should not be a case of a shader just being very slow
- Not sure whether it's at all related, but TGL boots have also started to slow down (enough to hit automation timeouts)
Edited by Eero Tamminen