Invalid data in error state
Submitted by Lionel Landwerlin
Assigned to Intel GFX Bugs mailing list
Link to original bug (#107691)
Description
As part of debugging hangs we're starting to look more at the error states generated by i915.
Together with Jason we've noticed that the data produced is incorrect.
Chunks of 32 bytes appear to just go missing (replaced by 0s).
For example, this bug report: https://bugs.freedesktop.org/show_bug.cgi?id=107586
has 2 error states in which you can see that the first 32 bytes of the context image are all 0s.
What we should be finding at that location looks more like this :
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/tools/gen8_context.h#L27
MI_NOOP followed by MI_LRI.
We've noticed this issue on both Skylake & Kabylake (I believe this affects at least all big cores).
To work around this issue I came up with this patch: https://github.com/djdeath/linux/commit/c18d4e1ee66cf587c484a60bba64f3dc4f35fc2e
This is probably wrong, as pointed out by Chris on IRC, but it gets us correct data in the error state.
This issue can be easily reproduced by running the IGT drv_hangman test on the render ring and checking the content of the "rcs0 --- HW context" BO in the error state. At offset 8092 we should find the MI_NOOP followed by the MI_LRI I pointed to above, but instead we find 32 bytes of 0s.
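
For anyone who wants to script the check described above, here is a rough sketch in Python. It assumes the "HW context" BO has already been decoded from the error state into raw little-endian bytes (the decoding step is not shown), and the helper names are made up for illustration. The only hardware detail it relies on is that MI_NOOP is an all-zero dword and that MI_LOAD_REGISTER_IMM uses opcode 0x22 in bits 28:23 of its header.

```python
import struct

MI_NOOP = 0x00000000
MI_LRI_OPCODE = 0x22  # MI_LOAD_REGISTER_IMM opcode, stored in bits 28:23


def chunk_is_zeroed(data: bytes, offset: int, size: int = 32) -> bool:
    """True if the full `size`-byte chunk at `offset` exists and is all 0s."""
    chunk = data[offset:offset + size]
    return len(chunk) == size and not any(chunk)


def starts_with_noop_lri(data: bytes, offset: int) -> bool:
    """True if the two dwords at `offset` are MI_NOOP then an MI_LRI header."""
    if offset < 0 or offset + 8 > len(data):
        return False
    noop, lri = struct.unpack_from("<2I", data, offset)
    return noop == MI_NOOP and (lri >> 23) == MI_LRI_OPCODE


def find_zeroed_chunks(data: bytes, size: int = 32) -> list:
    """Offsets of every `size`-aligned chunk that is entirely 0s."""
    return [off for off in range(0, len(data) - size + 1, size)
            if chunk_is_zeroed(data, off, size)]
```

Calling `starts_with_noop_lri(bo, offset)` and `chunk_is_zeroed(bo, offset)` on the decoded BO at the offset of interest tells the good and bad cases apart. `find_zeroed_chunks(bo)` will also flag legitimately zeroed regions, so its output is only a list of candidates to inspect by hand.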