
Draft: radv: Implement get_guilty_info()

André Almeida requested to merge andrealmeid/mesa:radv_get_guilty_info into main

The goal of this draft is to gather feedback on the proposed DRM interface and on the dumped information. As stated in the commit description, the aim is to make it easier for Mesa devs to figure out why the GPU crashed, with less overhead: if the app is the guilty one, Mesa will run umr -di [vmid@]address length on the exact IB that caused the hang.

The kernel part can be found here: https://gitlab.freedesktop.org/andrealmeid/linux/-/tree/amdgpu_dump_hang

cc: @Venemo @hakzsam @bnieuwenhuizen


Currently, when an app crashes on Mesa, there's not much information available for the bug report. Users can run the app with RADV_DEBUG=hang in the hope of getting more information, but this option has some overhead and the dumped information lacks context, making debugging as hard as finding a needle in a haystack.

To solve both issues, introduce a new query function that the guilty app uses to ask the kernel for information when a hang happens. This means that innocent apps incur no overhead, and we can now dump just the guilty indirect buffer, making it easier for developers to find the offending instruction that hung the GPU.

The only information available so far is the IB address, its size and the VM id.
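
For illustration only, here is a minimal sketch of how those three values could be turned into the umr -di [vmid@]address length invocation mentioned in the description. The struct and function names (guilty_ib_info, format_umr_dump_ib_cmd) are hypothetical and are not taken from this MR or from the kernel branch. Printing the address in hex and the VM id and length in decimal is also an assumption about what umr accepts.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three values the kernel reports.
 * Field names are illustrative, not the proposed DRM interface. */
struct guilty_ib_info {
   uint64_t ib_addr; /* GPU virtual address of the guilty IB */
   uint32_t ib_size; /* size of the IB, passed to umr as the length */
   uint32_t vmid;    /* VM id the address belongs to */
};

/* Format "umr -di vmid@address length" into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the command did not fit or formatting failed.
 * Assumption: hex address, decimal VM id and length. */
static int
format_umr_dump_ib_cmd(const struct guilty_ib_info *info, char *buf, size_t len)
{
   int n = snprintf(buf, len, "umr -di %" PRIu32 "@0x%" PRIx64 " %" PRIu32,
                    info->vmid, info->ib_addr, info->ib_size);

   return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With vmid = 3, ib_addr = 0x800000100000 and ib_size = 64, this produces the string umr -di 3@0x800000100000 64. Returning -1 on truncation keeps a caller from running a cut-off command.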
