nir: Begin transitioning away from nir_register and source/dest modifiers
nir_register is invasive in NIR. Now that we have backends ingesting pure SSA, having nir_register in the core data structure is undesirable for compile-time and memory usage on these backends. The nir_reg_src structure is large, and is_ssa checks infect every part of the codebase they touch. So we've wanted these gone for a while.
Unfortunately, we have lots of backends that rely on nir_register for their codegen, so we can't just throw the baby out with the bath water. A solution that gets ACO a 10% compile-time win and costs Midgard 10% in instruction count is not acceptable here. Thankfully, that's not the tradeoff we have.
My proposed replacement is a set of intrinsics that mirror the functionality of the nir_register, nir_reg_src, and nir_reg_dest data structures. In particular, this series adds...
- declare_reg, which returns an opaque 32-bit handle representing a register. The parameters of the register (size, array components, etc.) are supplied as constant indices. This corresponds to nir_register.
- load_reg/load_reg_indirect, which take a register handle (and optional base + indirect offset) and load the value of that register. This corresponds to nir_reg_src.
- store_reg/store_reg_indirect, ditto for a store. The store includes a write mask, used for ALU instructions on vector backends. This corresponds to nir_reg_dest and nir_alu_dest::write_mask.
As an example, instead of the register-heavy NIR:
vec4 32 r0
r0.xyz = fsqrt ssa_0.xyz
r0.w = fneg ssa_1.x
store_output(r0)
we would get the (physically pure SSA) NIR:
vec1 32 ssa_2 = declare_reg (num_components = 4, bitsize = 32)
vec4 32 ssa_3 = fsqrt ssa_0.xyzx
store_reg(ssa_3, ssa_2, writemask=xyz)
vec4 32 ssa_4 = fneg ssa_1.xxxx
store_reg(ssa_4, ssa_2, writemask=w)
vec4 32 ssa_5 = load_reg(ssa_2)
store_output(ssa_5)
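To make the write-mask semantics in the listing above concrete, here is a minimal stand-alone C sketch. It is a toy model, not Mesa code: the `toy_reg` type and the `toy_declare_reg`, `toy_store_reg`, and `toy_load_reg` functions are hypothetical names invented for illustration, and zero-initialising the register is a convenience of the model rather than a property of the real intrinsics.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names exist in Mesa. */
#define TOY_MAX_COMPONENTS 4

typedef struct {
    unsigned num_components;         /* mirrors num_components on declare_reg */
    unsigned bit_size;               /* mirrors bitsize; recorded but unused here */
    float value[TOY_MAX_COMPONENTS]; /* current contents of the register */
} toy_reg;

/* declare_reg: create a register of the given shape (zeroed for the model). */
static toy_reg toy_declare_reg(unsigned num_components, unsigned bit_size)
{
    toy_reg r;
    assert(num_components <= TOY_MAX_COMPONENTS);
    memset(&r, 0, sizeof(r));
    r.num_components = num_components;
    r.bit_size = bit_size;
    return r;
}

/* store_reg: write only the components selected by write_mask
 * (bit i set means component i is written). */
static void toy_store_reg(toy_reg *r, const float *src, unsigned write_mask)
{
    for (unsigned i = 0; i < r->num_components; i++) {
        if (write_mask & (1u << i))
            r->value[i] = src[i];
    }
}

/* load_reg: read every component of the register. */
static void toy_load_reg(const toy_reg *r, float *dst)
{
    for (unsigned i = 0; i < r->num_components; i++)
        dst[i] = r->value[i];
}
```

In terms of this model, the example performs a store with mask 0x7 (xyz), then a store with mask 0x8 (w), then a load, so the loaded vector takes x, y, and z from the first value and w from the second.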
These register access intrinsics may be translated naively in the backend to moves between SSA values and non-SSA values. This strategy is appropriate for backends with competent copy propagation, including layered drivers. For some hardware backends, however, this regresses code quality.
Fortunately, there's a neat solution. If the register access intrinsics obey a strict set of rules (no data dependency hazards, etc.), we consider them to be "trivial", as they can always be propagated into the producing/consuming instructions. Conversely, when these rules are not satisfied, the final code will generally require additional copies regardless of the IR. So we introduce a common NIR pass to ensure that register access intrinsics are trivial, inserting copies if absolutely required.
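As an illustration of what a data dependency hazard looks like, the following C sketch checks one plausible rule on a straight-line block: a load is foldable into its uses only if no store to the same register sits between the load and any of those uses. This is a simplified assumption for illustration, not the rule set the actual pass implements, and the `hz_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy straight-line block; these names are illustrative, not Mesa API. */
typedef enum { HZ_LOAD, HZ_STORE, HZ_USE } hz_kind;

typedef struct {
    hz_kind kind;
    int reg;        /* register index for HZ_LOAD / HZ_STORE, ignored for HZ_USE */
    int load_index; /* for HZ_USE: index of the HZ_LOAD whose value is consumed */
} hz_instr;

/* Returns true if the load at index `load` could be folded into all of its
 * uses, i.e. no store to the same register occurs between the load and a use. */
static bool hz_load_is_trivial(const hz_instr *block, size_t count, size_t load)
{
    bool clobbered = false;

    for (size_t i = load + 1; i < count; i++) {
        if (block[i].kind == HZ_STORE && block[i].reg == block[load].reg) {
            clobbered = true;
        } else if (block[i].kind == HZ_USE &&
                   block[i].load_index == (int)load && clobbered) {
            /* The use would observe the new register value if folded. */
            return false;
        }
    }
    return true;
}
```

When a check like this fails, a trivializing pass would have to keep the loaded value in a copy so the later use still sees the old contents, which is the "additional copies" case described above.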
Once we've established that all the intrinsics are trivial, we can fold loads into sources and stores into destinations simply by chasing the use-def chains, which is O(1) in NIR. The nir_load_reg_for_def and nir_store_reg_for_def helpers do this chasing.
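The chasing can be pictured with a small C model. This is a sketch under assumptions, not the helpers from this series: the `ch_` types and functions are hypothetical, and the model assumes a def records its producing instruction, a use count, and its sole consumer. Under those assumptions both lookups are constant-time pointer checks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy IR; all names here are illustrative stand-ins, not NIR types. */
typedef enum { CH_OP_ALU, CH_OP_LOAD_REG, CH_OP_STORE_REG } ch_op;

struct ch_instr;

typedef struct ch_def {
    struct ch_instr *parent;    /* instruction producing this value */
    struct ch_instr *first_use; /* a consumer; the only one if num_uses == 1 */
    unsigned num_uses;
} ch_def;

typedef struct ch_instr {
    ch_op op;
    ch_def def;        /* result value (unused for stores) */
    ch_def *value_src; /* for CH_OP_STORE_REG: the value being stored */
    int reg;           /* register handle for loads and stores */
} ch_instr;

/* If `def` is produced by a load_reg, return that load so the consumer can
 * read the register directly; otherwise return NULL. */
static ch_instr *ch_load_reg_for_def(ch_def *def)
{
    ch_instr *p = def->parent;
    return (p != NULL && p->op == CH_OP_LOAD_REG) ? p : NULL;
}

/* If the only use of `def` is a store_reg of that value, return the store so
 * the producer can write the register directly; otherwise return NULL. */
static ch_instr *ch_store_reg_for_def(ch_def *def)
{
    ch_instr *u = def->first_use;

    if (def->num_uses != 1 || u == NULL)
        return NULL;
    if (u->op != CH_OP_STORE_REG || u->value_src != def)
        return NULL;
    return u;
}
```

A backend built on this model would call `ch_store_reg_for_def` when emitting an ALU destination and `ch_load_reg_for_def` when emitting a source, falling back to plain SSA handling whenever NULL comes back.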
This allows a straightforward transition for backends: call the new NIR passes instead of the old nir_register ones, call the trivialize pass at the very end, and then use the helpers to get the load/store wherever you currently use nir_src/nir_dest with register support. There will be some new register access intrinsics; if you are consistent with using the helpers, they may simply be ignored! Because they're trivial at this point, they're guaranteed to be fully consumed. You must not translate store_reg if you use nir_store_reg_for_def, since you'll end up with spurious stores. You are free to translate load_reg; it'll just get DCE'd if you're consistent with your nir_load_reg_for_def, as required for good code gen.
The following NIR passes are replaced in this series to accommodate register access intrinsics:
- nir_lower_vec_to_movs -> nir_lower_vec_to_regs
- nir_lower_regs_to_ssa -> nir_lower_reg_intrinsics_to_ssa
- nir_lower_locals_to_regs -> nir_lower_locals_to_reg_intrinsics
- nir_convert_from_ssa gains a bool reg_intrinsics argument, set to true to generate intrinsics.
In addition to the nir_register work, this series also addresses the issue of source/destination modifiers (abs/neg/sat). These have a similar set of problems to nir_register, as they are invasive and thus harm NIR's memory footprint / compile-time / ergonomics on all backends, even when not used. Removing modifiers is an especially juicy target as no mature backends use NIR's modifiers, instead doing their own backend propagation to handle architecture-specific constraints and optimizations. In isolation, removing them without hurting their few users would be tricky. However, this work falls out naturally from the rest of the nir_register work. After this MR is merged, each backend that uses NIR modifiers needs to be transitioned away from that at the same time as transitioning away from nir_register.
But this series doesn't leave those backends in the dark. For backends consuming traditional modifiers, this series provides register-ful nir_legacy_src/nir_legacy_dest and modifier-ful nir_legacy_alu_src/nir_legacy_alu_dest mimicking what we have now, and a set of helpers for reconstructing these register-ful legacy structures from the trivialized SSA. The upshot is that very few backend changes are needed to transition away from nir_alu_src/dest modifiers if the legacy helpers are used. In particular, the legacy helpers do not require any backend copy propagation or dead code elimination. See the nir_to_tgsi commit in this MR for details on how that works, as well as the comments in nir_legacy.h.
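To give a feel for what reconstructing a modifier-ful source from SSA could involve, here is a toy C sketch. It is an assumption-laden illustration rather than the code in nir_legacy.h: the `lg_` names are hypothetical, and the model ignores the question of whether every other use of the folded fneg/fabs can also absorb it, which a real helper has to settle before skipping those instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy SSA values; names are illustrative and do not come from nir_legacy.h. */
typedef enum { LG_OP_OTHER, LG_OP_FNEG, LG_OP_FABS } lg_op;

typedef struct lg_value {
    lg_op op;                   /* operation that produced this value */
    const struct lg_value *src; /* operand of FNEG/FABS, NULL otherwise */
} lg_value;

typedef struct {
    const lg_value *value; /* underlying value after folding modifiers */
    bool negate;           /* apply -x when reading */
    bool abs;              /* apply |x| when reading (before negate) */
} lg_alu_src;

/* Fold an optional outer fneg and then an optional fabs into modifiers,
 * so the result reads as (negate ? -1 : 1) * (abs ? |value| : value). */
static lg_alu_src lg_chase_alu_src(const lg_value *v)
{
    lg_alu_src out = { v, false, false };

    if (out.value->op == LG_OP_FNEG && out.value->src != NULL) {
        out.negate = true;
        out.value = out.value->src;
    }
    if (out.value->op == LG_OP_FABS && out.value->src != NULL) {
        out.abs = true;
        out.value = out.value->src;
    }
    return out;
}
```

Note the ordering: fneg(fabs(x)) folds completely into negate plus abs, while fabs(fneg(x)) only folds the outer abs, because a negate applied inside an absolute value cannot be expressed with this pair of flags.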
If you're not sure what strategy your backend should use, I've made recommendations for each backend over in #9051 (closed).
To demonstrate that all this works -- and provide a blueprint to help you convert your backends -- this initial series converts the following backends:
- Midgard (chasing helpers)
- intel (chasing helpers)
- nir_to_tgsi (nir_legacy)
- gallivm (direct translation)
- zink (direct translation)
After this MR is merged, the remaining backends may be ported in parallel. That will require all of us working together to get NIR there. But once all backends are ported, we can remove nir_register and its helpers, lighten the data structures, improve NIR's memory footprint and compile time, remove a LOT of pointless validation, rename nir_ssa_def to nir_def, remove nir_dest in favour of a nir_def directly, remove abs/neg/sat modifiers, remove write masks, and probably even more :~)
...so I hope you'll help us get there!
Contains !23769 (merged) !23804 (merged)