Capture job state on error in kernel driver
Currently we have no way of figuring out which job caused an error or what that job was supposed to do. We need to implement something similar to i915_error_state to capture the job state (including BOs).
mentioned in issue lima_dump#1 (closed)
I've created a tool here: https://gitlab.freedesktop.org/lima/lima_dump
The kernel driver changes can be found here: https://github.com/yuq/linux-lima/commits/topic/error-3
It would be good to hear your feedback before I send this upstream, and to have it help with debugging early.
I'd suggest putting the lima_dump tool into mesa. We already have a disassembler and a standalone compiler in there.
Moreover, we'd be able to use the stream parser from mesa in that case.
Actually, I think we should not put tools into mesa unless we have to. The standalone compiler is there because it has to use the mesa modules. My idea is that mesa could do the same as the kernel driver: save a binary dump to a file and let lima_dump parse/modify/replay that file. That way we can move the parsing code out of mesa. We could also go further and merge lima_dump and syscall-tracker, so that there is only a single parser to maintain.
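To make the "single parser" idea concrete, here is a minimal sketch in C of how one tool could walk a binary dump regardless of whether the kernel driver or mesa produced it. Everything here is an assumption for illustration only: the chunk layout (`dump_chunk_header` with an `id` and a `size`), the callback type and the `dump_walk` function are hypothetical and are not the actual lima_dump or kernel dump format. It also assumes the file is read on a machine with the same endianness as the one that wrote it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk header, NOT the real lima dump layout. */
struct dump_chunk_header {
    uint32_t id;    /* hypothetical chunk type, e.g. a task frame or a BO */
    uint32_t size;  /* payload size in bytes, header not included */
};

/* Called once per chunk with a pointer to its payload. */
typedef void (*dump_chunk_cb)(uint32_t id, const uint8_t *payload,
                              uint32_t size, void *ctx);

/*
 * Walk every chunk in buf[0..len).
 * Returns the number of chunks visited, or -1 if the dump is truncated.
 */
int dump_walk(const uint8_t *buf, size_t len, dump_chunk_cb cb, void *ctx)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct dump_chunk_header hdr;

        if (len - off < sizeof(hdr))
            return -1;                  /* truncated header */
        /* memcpy avoids unaligned reads from the raw buffer */
        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);

        if (hdr.size > len - off)
            return -1;                  /* truncated payload */
        if (cb)
            cb(hdr.id, buf + off, hdr.size, ctx);
        off += hdr.size;
        count++;
    }
    return count;
}
```

With a self-describing layout along these lines, parse, modify and replay could all be built on the same walker, and a producer on either side would only need to emit chunks in the agreed format.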
Yeah, merging it with syscall-tracker sounds good.
mentioned in issue #28 (closed)