radv: Geometry corruption in RenderDoc mesh viewer for mesh shaders
Description
NOTE: this is a follow-up to #10163 (closed), so you will need !26252 (merged) applied locally; otherwise you'll be blocked by that issue first.
When using RenderDoc to debug some more complex mesh shader samples, I get corrupted geometry, or sometimes corrupted task payload data. Unfortunately, I've not been able to narrow it down further than this so far.
When loading e.g. the Lantern glTF sample in Granite's meshlet-viewer, the output looks fairly corrupt, although the task output payload data and dispatch sizes do seem correct.
I've also reproduced this on niagara, where the task output is also corrupt, but that might be less helpful to investigate since it's less deterministic.
Notably, this doesn't happen on the very simple task & mesh demos in Sascha's or the Vulkan-Samples repos, which may be useful as a working point of comparison.
The super high level overview of what RenderDoc does internally is (assuming presence of a task shader):
- Patch the task shader to replace the emit with an emit of (0,0,0). Thread 0 in each group writes the actual emit size and payload (via a readback from the payload variable) into the output buffer.
- Read back the results on the CPU, determine the number of mesh groups that will execute in order to size a reasonably compact worst-case output buffer.
- Patch a per-task group offset into the payload data for each set of mesh groups.
- Create a new task shader that just reads the size+offset+payload from a buffer, and emits it. The offset is passed as an additional uint in the payload.
- Patch the mesh shader to also output no primitives, and instead write each index/vertex directly into the output buffer at a location calculated from the (optional) task group offset + meshlet-group-in-dispatch local offset + element offset.
- From here it does another readback, followed by CPU compaction & reorganisation of the data into a regular VB+IB, but the API work is done.
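The CPU-side bookkeeping in the second to fourth steps above (sizing the output buffer and patching a per-task-group offset into the payload) can be sketched roughly as follows. This is a minimal Python illustration, not RenderDoc's actual code: `TaskGroupReadback`, `plan_mesh_output` and `patch_payload` are hypothetical names, the worst case assumes every mesh group writes its declared maximum vertex and primitive counts, and the offset is assumed to be appended as a 4-byte-aligned little-endian uint at the end of the payload.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaskGroupReadback:
    """Hypothetical record written by thread 0 of each patched task group."""
    emit: Tuple[int, int, int]  # (x, y, z) the original shader would have emitted
    payload: bytes              # raw copy of the task payload variable


def plan_mesh_output(readback: List[TaskGroupReadback],
                     max_vertices: int, max_primitives: int):
    """Compute per-task-group mesh group offsets and worst-case output sizes."""
    offsets = []
    total_groups = 0
    for rec in readback:
        x, y, z = rec.emit
        # Exclusive prefix sum: index of the first mesh group this task group launches.
        offsets.append(total_groups)
        total_groups += x * y * z
    # Worst case (assumption): every mesh group writes its declared maximums.
    return offsets, total_groups, total_groups * max_vertices, total_groups * max_primitives


def patch_payload(payload: bytes, offset: int) -> bytes:
    """Append the mesh group offset as an extra uint (assumed placement and alignment)."""
    pad = (-len(payload)) % 4  # align the extra uint to 4 bytes
    return payload + b"\x00" * pad + struct.pack("<I", offset)


if __name__ == "__main__":
    groups = [TaskGroupReadback((2, 1, 1), b""),
              TaskGroupReadback((0, 1, 1), b""),
              TaskGroupReadback((3, 1, 1), b"")]
    offsets, total, nverts, nprims = plan_mesh_output(groups, 64, 126)
    print(offsets, total, nverts, nprims)  # [0, 2, 2] 5 320 630
```

A task group that emits zero mesh groups still gets an offset entry, so the offsets list stays indexable by task group index.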
All output buffers are passed via BDA uint64 spec constants, with structs in output/payload storage duplicated and given a layout in BDA storage. If no task shader is present, we skip straight to step 5, effectively using the CPU dispatch size to size the output buffer.
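The slot calculation in step 5, covering both the task and no-task paths, might look like the following Python sketch (standing in for shader code). The names are hypothetical, and it assumes the mesh workgroup ID is flattened x-major and that each mesh group gets a fixed stride equal to its declared maximum element count; RenderDoc's real layout may differ. Without a task shader, the task offset is 0 and the group count is the CPU dispatch size.

```python
from math import prod
from typing import Tuple

Vec3 = Tuple[int, int, int]


def linear_group_index(group_id: Vec3, group_count: Vec3) -> int:
    """Flatten a mesh workgroup ID within its dispatch (x-major, assumed)."""
    x, y, z = group_id
    cx, cy, _ = group_count
    return x + cx * (y + cy * z)


def output_slot(task_offset: int, group_id: Vec3, group_count: Vec3,
                element: int, max_elements: int) -> int:
    """Slot for one vertex/index: (task offset + local group) * stride + element."""
    group = task_offset + linear_group_index(group_id, group_count)
    return group * max_elements + element


def no_task_worst_case(dispatch: Vec3, max_elements: int) -> int:
    """Without a task shader, the CPU dispatch size alone bounds the buffer."""
    return prod(dispatch) * max_elements


if __name__ == "__main__":
    print(output_slot(2, (1, 0, 0), (3, 1, 1), 5, 64))  # 197
    print(output_slot(0, (3, 1, 0), (4, 2, 1), 0, 64))  # 448
    print(no_task_worst_case((4, 2, 1), 64))            # 512
```

The same function serves both vertices and indices by passing the matching per-group maximum as `max_elements`.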
Screenshots/video files
The output looks something like this for the Lantern:
where I'd expect it to look more like this:
Steps to reproduce
This should be run with a recent build of the RenderDoc v1.x branch to get the latest fixes for some RenderDoc bugs, at least commit ca5783e7127cfbc3e538f32a3ba8a94d5289e296.
- Open the provided RenderDoc capture, or launch Granite's meshlet-viewer with a glTF sample and make one. I used Lantern to try to find a simpler case, but it also happens with Sponza.
- Open the mesh viewer (Window -> mesh viewer)
- Select the vkCmdDrawMeshTasksEXT action, and wait if the mesh viewer takes a while to process.
- Click over on the preview to 'mesh out' and see that there are exploded polys. You can also change the 'visualisation' dropdown to meshlet.
System information
System:
Host: lupino2 Kernel: 6.2.0-37-generic arch: x86_64 bits: 64 compiler: N/A
Desktop: Fluxbox v: 1.3.5 dm: GDM3 Distro: Ubuntu 23.04 (Lunar Lobster)
CPU:
Info: 12-core model: AMD Ryzen 9 7900X bits: 64 type: MT MCP arch: Zen 4
rev: 2 cache: L1: 768 KiB L2: 12 MiB L3: 64 MiB
Speed (MHz): avg: 3615 high: 5480 min/max: 3000/5733 boost: enabled cores:
1: 5110 2: 3140 3: 4680 4: 5480 5: 2822 6: 3141 7: 4537 8: 2802 9: 3138
10: 3143 11: 3141 12: 3101 13: 3200 14: 3142 15: 3139 16: 4490 17: 2747
18: 3000 19: 3636 20: 2749 21: 4700 22: 4700 23: 4038 24: 3000
bogomips: 225616
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX] vendor: ASUSTeK
driver: amdgpu v: kernel arch: RDNA-3 pcie: speed: 16 GT/s lanes: 16 ports:
active: HDMI-A-1 empty: DP-1,DP-2,DP-3 bus-ID: 03:00.0 chip-ID: 1002:744c
Device-2: AMD Raphael driver: amdgpu v: kernel arch: RDNA-2 pcie:
speed: 16 GT/s lanes: 16 ports: active: none empty: DP-4,HDMI-A-2
bus-ID: 13:00.0 chip-ID: 1002:164e temp: 41.0 C
Device-3: Valve 3D Camera type: USB driver: uvcvideo bus-ID: 8-2.1:3
chip-ID: 28de:2400
Display: x11 server: X.Org v: 1.21.1.7 with: Xwayland v: 22.1.8 driver: X:
loaded: modesetting alternate: fbdev,vesa dri: radeonsi gpu: amdgpu
display-ID: :0 screens: 1
Screen-1: 0 s-res: 2560x1440 s-dpi: 96
Monitor-1: HDMI-A-1 mapped: HDMI-1 model: HP X27q res: 2560x1440 dpi: 109
diag: 685mm (27")
API: OpenGL v: 4.6 Mesa 24.0.0-devel (git-1ef5feac5e) renderer: AMD
Radeon RX 7900 XTX (radeonsi navi31 LLVM 15.0.7 DRM 3.49
6.2.0-37-generic) direct-render: Yes
API captures (if applicable, optional)
Further information (optional)
This works on NV drivers on Linux and Windows. I'm not able to test AMD on Windows with Granite as it doesn't run the mesh shader path, but it does work for niagara. As far as I can see, there are no validation errors or SPIR-V validation problems reported. That's not conclusive, as it might still be a RenderDoc bug, so let me know if you find anything suspicious and I can dig in further.