Mesa's `intel_hang_replay` tool fails to work
Google bug b/328306740
I'm filing this in drm/intel because I don't see anything wrong with the `intel_hang_replay` tool.
When I run `intel_hang_replay -d i915_error_state.dmp`, it fails with:
# intel_hang_replay -d i915_error_state.dmp
total_vma: 0x0000000071c9d000
fail to set context hw img: Invalid argument
With v6.11 on ADL, I've added some `printk`s to `gem/i915_gem_context.c`, and I see the following in dmesg:
/* gem_context_create: DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT */
i915_gem_context_create_ioctl: enter
set_proto_ctx_param(param = 10): enter /* I915_CONTEXT_PARAM_ENGINES */
set_proto_ctx_param(param = 10): returning 0
set_proto_ctx_param(param = 10): exit
set_proto_ctx_param(param = 8): enter /* I915_CONTEXT_PARAM_RECOVERABLE */
set_proto_ctx_param(param = 8): returning 0
set_proto_ctx_param(param = 8): exit
GRAPHICS_VER(i915) == 12
i915_gem_context_create_ioctl: args->ctx_id = 1
i915_gem_context_create_ioctl: exit
/* gem_context_set_hw_image */
/* DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM(I915_CONTEXT_PARAM_RECOVERABLE) */
i915_gem_context_setparam_ioctl: enter
__context_lookup(ffff9b6cd85f0d00, 1): enter
__context_lookup(ffff9b6cd85f0d00, 1): xa_load returned 0
__context_lookup(ffff9b6cd85f0d00, 1): returning 0
__context_lookup(ffff9b6cd85f0d00, 1): exit
i915_gem_context_setparam_ioctl: args->ctx_id = 1
i915_gem_context_setparam_ioctl: !ctx
i915_gem_context_setparam_ioctl: calling set_proto_ctx_param
set_proto_ctx_param(param = 8): enter
set_proto_ctx_param(param = 8): returning 0
set_proto_ctx_param(param = 8): exit
i915_gem_context_setparam_ioctl: returning 0
i915_gem_context_setparam_ioctl: exit
/* gem_context_set_hw_image */
/* DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM(I915_CONTEXT_PARAM_CONTEXT_IMAGE) */
i915_gem_context_setparam_ioctl: enter
__context_lookup(ffff9b6cd85f0d00, 1): enter
__context_lookup(ffff9b6cd85f0d00, 1): xa_load returned 0
__context_lookup(ffff9b6cd85f0d00, 1): returning 0
__context_lookup(ffff9b6cd85f0d00, 1): exit
i915_gem_context_setparam_ioctl: args->ctx_id = 1
i915_gem_context_setparam_ioctl: !ctx
i915_gem_context_setparam_ioctl: calling set_proto_ctx_param
set_proto_ctx_param(param = 15): enter
set_proto_ctx_param(I915_CONTEXT_PARAM_CONTEXT_IMAGE)
set_proto_ctx_param(param = 15): returning -22
set_proto_ctx_param(param = 15): exit
i915_gem_context_setparam_ioctl: returning -22
i915_gem_context_setparam_ioctl: exit
`intel_hang_replay.c` created a context in the first block with `DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT` and then, in the second and third blocks, called `DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM`.
In both `DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM` calls, the `__context_lookup` call fails (`xa_load` returns 0), so the ioctl falls back to `set_proto_ctx_param`. In the latter call, setting `I915_CONTEXT_PARAM_CONTEXT_IMAGE` fails with -22 because it's not implemented for proto contexts, so `intel_hang_replay` fails.
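The fallback seen in the trace can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: `model_file`, `model_setparam` and the table layout are hypothetical names, and the full-context handler accepting every parameter is an assumption. Only the parameter numbers (8, 10, 15) and the -22 return value come from the dmesg output above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Parameter numbers as printed in the dmesg trace. */
enum {
    PARAM_RECOVERABLE   = 8,
    PARAM_ENGINES       = 10,
    PARAM_CONTEXT_IMAGE = 15,
};

#define MAX_CTX 8

/* Hypothetical stand-in for the per-file context tables. */
struct model_file {
    int finalized[MAX_CTX]; /* 1 if a full context exists for this id */
    int proto[MAX_CTX];     /* 1 if only a proto context exists */
};

/* Proto contexts accept some params but, per the trace, not the image. */
static int model_set_proto_param(int param)
{
    switch (param) {
    case PARAM_RECOVERABLE:
    case PARAM_ENGINES:
        return 0;
    default:
        return -EINVAL; /* -22, as seen for param = 15 */
    }
}

/* Assumption: a full context accepts every param in this model. */
static int model_set_ctx_param(int param)
{
    (void)param;
    return 0;
}

/* Same shape as the traced setparam path: look up, then fall back. */
static int model_setparam(const struct model_file *f, unsigned id, int param)
{
    assert(f != NULL);

    if (id >= MAX_CTX)
        return -ENOENT;
    if (f->finalized[id])
        return model_set_ctx_param(param);
    if (f->proto[id])
        return model_set_proto_param(param);
    return -ENOENT;
}
```

With only `proto[1]` set, `model_setparam(&f, 1, PARAM_CONTEXT_IMAGE)` returns `-EINVAL`, matching the third block of the trace. With `finalized[1]` set instead, the same call returns 0.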
Looking at `gem/i915_gem_context.c:i915_gem_context_create_ioctl` (specifically https://gitlab.freedesktop.org/drm/kernel/-/blob/drm-next/drivers/gpu/drm/i915/gem/i915_gem_context.c#L2413), it's only on the `GRAPHICS_VER(i915) > 12` path that `xa_alloc` is called.
I don't see how `xa_load` could ever return anything but `0` from `__context_lookup` on ADL.
If I change the check to `GRAPHICS_VER(i915) >= 12` and run `intel_hang_replay`, the `setparam(I915_CONTEXT_PARAM_CONTEXT_IMAGE)` succeeds (but other stuff is broken):
i915_gem_context_setparam_ioctl: enter
__context_lookup(ffff90bccd671b80, 1): enter
__context_lookup(ffff90bccd671b80, 1): xa_load returned ffff90bcdb053000
__context_lookup(ffff90bccd671b80, 1): returning ffff90bcdb053000
__context_lookup(ffff90bccd671b80, 1): exit
i915_gem_context_setparam_ioctl: args->ctx_id = 1
i915_gem_context_setparam_ioctl: ctx (ret == 0)
set_context_image(I915_CONTEXT_PARAM_CONTEXT_IMAGE) returned 0
i915_gem_context_setparam_ioctl: returning 0
i915_gem_context_setparam_ioctl: exit
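To make the effect of the comparison concrete, here is a toy model of the branch in the create path. It is only a sketch: `model_create_path` and the enum are hypothetical names. The only facts taken from this report are that ADL prints `GRAPHICS_VER(i915) == 12`, that the stock check is `> 12`, and that my experiment changes it to `>= 12`.

```c
#include <assert.h>
#include <stdbool.h>

enum model_path {
    MODEL_PATH_PROTO, /* context is still a proto context after create */
    MODEL_PATH_FULL,  /* full context is registered at create time */
};

/*
 * inclusive == false models the stock `> 12` check,
 * inclusive == true models the experimental `>= 12` change.
 */
static enum model_path model_create_path(int graphics_ver, bool inclusive)
{
    bool full;

    assert(graphics_ver > 0);
    full = inclusive ? graphics_ver >= 12 : graphics_ver > 12;

    return full ? MODEL_PATH_FULL : MODEL_PATH_PROTO;
}
```

For version 12, `model_create_path(12, false)` yields `MODEL_PATH_PROTO` and `model_create_path(12, true)` yields `MODEL_PATH_FULL`, which is the difference between the two traces.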
What is wrong here?
cc: @llandwerlin, @linyaa