mesa issues
https://gitlab.freedesktop.org/Venemo/mesa/-/issues
2023-02-04T13:21:48Z

radv: Mesh shader support
https://gitlab.freedesktop.org/Venemo/mesa/-/issues/2
2023-02-04T13:21:48Z
Timur Kristóf

This is the plan to support Mesh shaders in RADV. Currently there is no official cross-vendor extension (yet?) so we will start working with `VK_NV_mesh_shader`, then, if a cross-vendor EXT is made in the future, it should be easy to cover that too.
* [x] NIR + SPIR-V for NV mesh shaders
* [x] Basic support by Caio from Intel: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600
* [x] I/O fixes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466
* [x] Mesh shaders in RADV https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466
* [x] Shader support in RADV and ACO
* [x] Mesh shader lowering in `ac_nir_lower_ngg`
* [x] Per-primitive outputs in ACO
* [x] Pipeline support in RADV
* [x] Mesh shading pipeline compilation without task shader
* [x] Per-primitive outputs
* [x] Shader arguments
* [x] Draw support in RADV
* [x] New draw call functions in `radv_cmd_buffer.c`
* [x] Task shaders in RADV https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14929
* [x] Task shader I/O lowering (lower TS outputs to memory stores)
* [x] Mesh shading pipeline compilation with task shader
* [x] Shader arguments
* [x] Draw calls

nir: Lower AMD specific shader I/O, merge shaders and handle NGG
https://gitlab.freedesktop.org/Venemo/mesa/-/issues/1
2023-03-09T20:11:00Z
Timur Kristóf

This is the grand plan of how to unify some code which is currently duplicated between three shader compiler backends: RadeonSI/LLVM, RADV/LLVM, RADV/ACO. This plan will help us de-duplicate a lot of code, and also enable us to implement some new features, such as NGG primitive culling in NIR, without the need to specifically cater to such features in the compiler backend; in contrast to what we currently have, where each backend must be tailored to support each feature or edge case.
The following milestones list all the features that I'd like to cover. Each milestone should go into a separate MR series which implements the listed functionality and switches ACO over to them. Then the other backends can be integrated in separate merge requests.
I implemented most of these features in ACO, so I feel that it will be straightforward for me to rewrite them in NIR and adapt the ACO code base. I'd appreciate some help from the team with integrating the new NIR code into the other backends.
* [x] **Pre-requisites:**
* [x] Unified way to handle shader arguments
* Already done by Connor
* [x] Shader argument intrinsics
* Already done by Rhys in his descriptor lowering branch.
* [x] **Milestone 1/A:** Lower AMD-specific shader I/O in NIR
* Current status: each backend implements the NIR I/O intrinsics on its own (mostly the same way, but with subtle differences)
* Proposal: where applicable, lower NIR I/O intrinsics to memory accesses: either shared memory, or VRAM.
* Benefits, notes:
* We can get rid of all the ACO and LLVM backend code which deals with this I/O.
* Lowering to VRAM is not a requirement, but pretty easy to do, and helps us get rid of the rest of the backend specific I/O code.
* This also brings the LLVM backend up to par with ACO in some optimizations that are only implemented in ACO.
* [x] Implement NIR I/O lowering passes
* [x] LS outputs -> shared memory store
* [x] HS inputs -> shared memory load
* [x] HS outputs, including tess factors -> shared memory access and VRAM store
* [x] TES ES inputs -> VRAM load
* [x] ES outputs -> shared memory store (GFX9+) / VRAM store (GFX8-)
* [x] GS inputs -> shared memory load (GFX9+) / VRAM load (GFX8-)
* [x] Integrating the new NIR passes into the backends
* [x] RADV/ACO
* [x] RADV/LLVM
* [x] RadeonSI/LLVM
* [ ] **Milestone 1/B (optional):** De-duplicate some code regarding occupancy calculations.
* [x] Move workgroup size calculation from ACO to RADV, or even AC.
* [x] Move `tcs_num_patches` calculation to RADV, remove it from ACO and from `radv_nir_to_llvm`.
* [ ] Try to share LSHS occupancy calculation (`get_tcs_num_patches`) between RADV and RadeonSI.
* [ ] Try to share NGG occupancy calculation between RADV and RadeonSI.
* [ ] ~~**Milestone 2:** Merged shaders in NIR~~
* *I decided to scrap this idea, since it seems to be more trouble than it's worth.*
* Current status: all 3 backends take 2 NIR shaders as input, and bolt these shaders together
* ~~Proposal: make NIR aware of the fact that a shader can contain pieces of 2 different stages.~~
* [x] **Milestone 3:** Lower NGG in NIR
* Current status: we don't have a direct equivalent of NGG capabilities in NIR, and pretend that it works like the traditional model. We only map the traditional model to NGG in the backend compilers.
* Proposal: define new intrinsics for NGG features (basically, mesh-shader-like features), then lower VS, TES and GS to these.
* Benefits, notes:
* This will de-duplicate a lot of complicated logic that currently exists in all backends.
* It brings us one step closer to supporting actual mesh shaders, because it will give us a base
to which we can also lower mesh shaders in the future. (Notable exception: we don't care about
generic per-primitive outputs now, among a few other things.)
* [x] Introduce necessary concepts to NIR
* Notes about how NGG hardware works
* It is basically a strict subset of a normal mesh shader.
The point of the lowering passes is to transform the VS/TES/GS (and later MS) model into NGG terms.
* The shader **must** know the number of exported vertices and primitives and allocate space for them
before exporting them.
* Each active lane **must** export 0 or 1 primitive and specify the vertex indices
(a vertex index is the ID of the lane in the threadgroup which will export that vertex).
* Each active lane **must** export **exactly 1** vertex.
* If we don't do exactly what the HW expects, it will express its discontent by hanging.
* [x] NIR NGG lowering passes
* [x] NGG VS and TES
* [x] NGG GS
* [x] Integrate the new NIR passes into the backends
* [x] RADV/ACO
* [x] RADV/LLVM
* [x] RadeonSI/LLVM
* [x] **Milestone 4/A:** NGG shader-based primitive culling in NIR
* With all of the above in place, it becomes possible to implement primitive culling in NIR,
without any need for the backends to be aware of it.
* [x] **Milestone 4/B:** NGG streamout (aka. transform feedback) in NIR
* This is somewhat orthogonal, but loosely connected to the above topics. Since streamout means basically
just writing data to VRAM, we can now also implement it in a NIR pass.
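
The NGG export rules listed under Milestone 3 can be illustrated with a small model. This is only a sketch in Python, not Mesa code: the `Lane` type, the `check_ngg_exports` function and the error strings are hypothetical names made up for this example, and treating a mismatch between allocated and exported counts as a violation is this sketch's reading of "must know the number of exported vertices and primitives".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lane:
    """One active lane of a threadgroup (hypothetical model, not a Mesa type)."""
    exports_vertex: bool
    # Vertex indices of the primitive this lane exports, or None for no primitive.
    primitive: Optional[Tuple[int, ...]] = None


def check_ngg_exports(lanes: List[Lane], allocated_vertices: int,
                      allocated_primitives: int) -> List[str]:
    """Return the list of violated rules; an empty list means all rules hold."""
    errors = []
    num_vertices = sum(1 for lane in lanes if lane.exports_vertex)
    num_primitives = sum(1 for lane in lanes if lane.primitive is not None)

    # Rule: the export counts must be known and allocated before exporting.
    # Assumption of this sketch: the allocation must match the counts exactly.
    if allocated_vertices != num_vertices:
        errors.append("vertex allocation does not match exported vertices")
    if allocated_primitives != num_primitives:
        errors.append("primitive allocation does not match exported primitives")

    for lane_id, lane in enumerate(lanes):
        # Rule: each active lane exports exactly one vertex.
        if not lane.exports_vertex:
            errors.append(f"lane {lane_id} does not export a vertex")
        # Rule: each lane exports 0 or 1 primitive; a vertex index is the ID
        # of the lane in the threadgroup that exports that vertex.
        if lane.primitive is not None:
            for index in lane.primitive:
                in_range = 0 <= index < len(lanes)
                if not in_range or not lanes[index].exports_vertex:
                    errors.append(f"lane {lane_id} references invalid vertex {index}")
    return errors


# Three lanes export one vertex each; lane 0 also exports a triangle.
good = [Lane(True, (0, 1, 2)), Lane(True), Lane(True)]
# Lane 1 exports no vertex, and lane 0 references it plus a non-existent lane.
bad = [Lane(True, (0, 1, 5)), Lane(False)]
```

Calling `check_ngg_exports(good, 3, 1)` reports no violations, while `check_ngg_exports(bad, 2, 1)` reports several. On real hardware there is no such error list; as noted above, a violation results in a hang.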