radv/rt,aco: Ray Tracing Function Calls
Based on !29536, !29576, !29577, !29730.
This MR reworks pretty much all of RADV/ACO's shader call lowering. Instead of using nir_lower_shader_calls to split shaders into CPS-style parts, we keep each shader whole and lower function calls to s_swappc, while introducing ABIs to govern which shaders preserve which registers.
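To illustrate the difference, here is a toy CPU-side analogy in C. It is not RADV/ACO code, and every name in it (`cps_state_t`, `cps_step`, `run_cps`, `run_call`, the `PART_*` constants) is hypothetical. The first half models a shader split into CPS-style parts driven by a dispatch loop; the second half models the same logic kept whole with an ordinary call.

```c
#include <assert.h>

/* CPS-style model: the "raygen" shader is split at its call site into two
 * parts, and a dispatch loop jumps between parts via a "next" index. */
enum { PART_RAYGEN_0, PART_CLOSEST_HIT, PART_RAYGEN_1, PART_DONE };

typedef struct {
    int next;   /* which part runs next */
    int resume; /* where the callee should continue afterwards */
    int value;  /* live state that must survive across the split */
} cps_state_t;

static void cps_step(cps_state_t *s)
{
    switch (s->next) {
    case PART_RAYGEN_0:
        s->value = 1;
        s->resume = PART_RAYGEN_1; /* remember the continuation */
        s->next = PART_CLOSEST_HIT;
        break;
    case PART_CLOSEST_HIT:
        s->value *= 10;
        s->next = s->resume;       /* "return" by jumping to the continuation */
        break;
    case PART_RAYGEN_1:
        s->value += 2;
        s->next = PART_DONE;
        break;
    default:
        assert(0 && "unexpected part index");
    }
}

static int run_cps(void)
{
    cps_state_t s = { PART_RAYGEN_0, PART_DONE, 0 };
    while (s.next != PART_DONE)
        cps_step(&s);
    return s.value;
}

/* Function-call model: the caller stays in one piece and simply calls the
 * callee; the calling convention decides which state is preserved. */
static int closest_hit(int v)
{
    return v * 10;
}

static int run_call(void)
{
    int v = 1;
    v = closest_hit(v);
    return v + 2;
}
```

Both `run_cps()` and `run_call()` compute the same result; the difference is only in how control flow and live state are carried across the call.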
Keeping shaders whole enables some cool new optimizations, for example:
- Tail call optimization
- Using function parameters (i.e. registers) for ray payloads, which is faster than scratch memory
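As a rough CPU-side analogy for both points (again, not RADV/ACO code), the sketch below contrasts a payload that lives in a shared memory buffer with one passed by value through parameters, where the callee is invoked in tail position. The names `payload_t`, `scratch_payload` and the `*_scratch` / `*_params` functions are hypothetical. Whether a C compiler actually keeps the struct in registers or emits a jump for the tail call depends on the target ABI and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical payload; real ray payloads are shader-defined. */
typedef struct {
    uint32_t hit_count;
    uint32_t color;
} payload_t;

/* Variant A: the payload lives in memory (standing in for scratch), and
 * every access by the callee is a load or store. */
static payload_t scratch_payload;

static void closest_hit_scratch(void)
{
    scratch_payload.hit_count += 1;
    scratch_payload.color = 0xff00ffu;
}

static uint32_t raygen_scratch(void)
{
    scratch_payload = (payload_t){ 0, 0 };
    closest_hit_scratch();
    return scratch_payload.color + scratch_payload.hit_count;
}

/* Variant B: the payload is passed and returned by value, so a calling
 * convention is free to keep it in registers. */
static payload_t closest_hit_params(payload_t p)
{
    p.hit_count += 1;
    p.color = 0xff00ffu;
    return p;
}

static payload_t trace_ray_params(payload_t p)
{
    /* The call is the last thing this function does, so it is eligible
     * for tail call optimization: a jump instead of call + return. */
    return closest_hit_params(p);
}

static uint32_t raygen_params(void)
{
    payload_t p = trace_ray_params((payload_t){ 0, 0 });
    return p.color + p.hit_count;
}
```

Both variants produce the same value; variant B just avoids the round trip through memory and leaves no work in `trace_ray_params` after the call.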
With the flexibility of ABIs and shader calls, we'll also be able to split any-hit shaders out of traversal shaders. This should significantly reduce the size (and thus compile time) of the traversal shader and, hopefully, improve performance, because any-hit shaders that need to spill are moved out of the traversal shader. This is future work, though, and not part of this MR.
I've temporarily disabled RT pipelines in a commit at the start of this MR and re-enabled them after all commits required for correctness.
Passes all CTS tests on GFX9 (RADV_PERFTEST=emulate_rt), GFX10.3 and GFX11.