nir: Begin transitioning away from nir_register and source/dest modifiers

Alyssa Rosenzweig requested to merge alyssa/mesa:nir/legacy-reg into main

nir_register is invasive in NIR. Now that we have backends ingesting pure SSA, having nir_register in the core data structure is undesirable for compile-time and memory usage on these backends. The nir_reg_src structure is large, and is_ssa checks infect every part of the codebase they touch. So we've wanted these gone for a while.

Unfortunately, we have lots of backends relying on nir_register for their codegen, so we can't just throw the baby out with the bathwater. A solution that gets ACO a 10% compile-time win but costs Midgard a 10% instruction count regression is not acceptable here. Thankfully, that's not the tradeoff we have 😸

My proposed replacement is a set of intrinsics that mirror the functionality of the nir_register, nir_reg_src, and nir_reg_dest data structures. In particular, this series adds...

  • declare_reg, which returns an opaque 32-bit handle representing a register. The parameters of the register (number of components, bit size, array length, etc.) are supplied as constant indices. This corresponds to nir_register.
  • load_reg/load_reg_indirect, which take a register handle (and an optional base + indirect offset) and load the value of that register. This corresponds to nir_reg_src.
  • store_reg/store_reg_indirect, ditto for stores. The store includes a write mask, used for ALU instructions on vector backends. This corresponds to nir_reg_dest and nir_alu_dest::write_mask.

As an example, instead of the register-heavy NIR:

vec4 32 r0
r0.xyz = fsqrt ssa_0.xyz
r0.w = fneg ssa_1.x
store_output(r0)

we would get the (physically pure SSA) NIR:

vec1 32 ssa_2 = declare_reg (num_components = 4, bitsize = 32)
vec4 32 ssa_3 = fsqrt ssa_0.xyzx
store_reg(ssa_3, ssa_2, writemask=xyz)
vec4 32 ssa_4 = fneg ssa_1.xxxx
store_reg(ssa_4, ssa_2, writemask=w)
vec4 32 ssa_5 = load_reg(ssa_2)
store_output(ssa_5)

These register access intrinsics may be translated naively in the backend to moves between SSA values and non-SSA values. This strategy is appropriate for backends with competent copy propagation, including layered drivers. For some hardware backends, however, this regresses code quality.
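
For a concrete picture, a naive lowering could look something like the sketch below. The backend_* and emit_* names are hypothetical stand-ins for the backend's own IR, and the intrinsic enum spellings follow the names introduced above (they may differ from the exact upstream spelling):

/* Minimal sketch of naive translation: each register access intrinsic
 * becomes a plain move to/from backend register storage, and the backend's
 * (or layered driver's) copy propagation cleans up the rest. */
static void
translate_reg_intrinsic(backend_ctx *ctx, nir_intrinsic_instr *intr)
{
   switch (intr->intrinsic) {
   case nir_intrinsic_declare_reg:
      /* Allocate backend storage keyed on the SSA handle the intrinsic
       * returns; size/bit width come from its constant indices. */
      backend_alloc_reg(ctx, intr);
      break;

   case nir_intrinsic_load_reg:
      /* dest := reg */
      emit_mov(ctx, backend_ssa(ctx, &intr->dest.ssa),
               backend_reg(ctx, intr->src[0].ssa));
      break;

   case nir_intrinsic_store_reg:
      /* reg := value, honouring the write mask (src[0] = value,
       * src[1] = register handle, as in the example above). */
      emit_masked_mov(ctx, backend_reg(ctx, intr->src[1].ssa),
                      backend_ssa(ctx, intr->src[0].ssa),
                      nir_intrinsic_write_mask(intr));
      break;

   default:
      unreachable("not a register access intrinsic");
   }
}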

Fortunately, there's a neat solution. If the register access intrinsics obey a strict set of rules (no data dependency hazards, etc.), we consider them to be "trivial", as they can always be propagated into the producing/consuming instructions. Conversely, when these rules are not satisfied, the final code will generally require additional copies regardless of the IR. So we introduce a common NIR pass to ensure that register access intrinsics are trivial, inserting copies only where absolutely required.

Once we've established that all the intrinsics are trivial, we can fold loads into sources and stores into destinations simply by chasing the use-def chains, which is O(1) in NIR. The nir_load_reg_for_def and nir_store_reg_for_def helpers do this chasing.
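
A minimal sketch of that pattern, assuming the helpers take an SSA def and return the matching trivial intrinsic or NULL (the backend_src/backend_dest types and backend_*_for_* functions are hypothetical):

static backend_src
translate_src(backend_ctx *ctx, nir_src *src)
{
   /* All sources are SSA at this point. */
   nir_intrinsic_instr *load = nir_load_reg_for_def(src->ssa);

   if (load != NULL) {
      /* The value was loaded from a register: consume the register handle
       * directly; the load_reg itself needs no translation. */
      return backend_src_for_reg(ctx, load->src[0].ssa);
   }

   return backend_src_for_ssa(ctx, src->ssa);
}

static backend_dest
translate_dest(backend_ctx *ctx, nir_ssa_def *def)
{
   nir_intrinsic_instr *store = nir_store_reg_for_def(def);

   if (store != NULL) {
      /* The only use is a trivial store_reg: write the register directly,
       * and do NOT translate the store_reg itself as well, or you'd emit a
       * spurious copy. */
      return backend_dest_for_reg(ctx, store->src[1].ssa,
                                  nir_intrinsic_write_mask(store));
   }

   return backend_dest_for_ssa(ctx, def);
}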

This allows a straightforward transition for backends: call the new NIR passes instead of the old nir_register ones, call the trivialize pass at the very end, and then use the helpers to get the load/store wherever you currently use nir_src/nir_dest with register support. There will be some new register access intrinsics; if you are consistent with using the helpers, they may simply be ignored! Because they're trivial at this point, they're guaranteed to be fully consumed. You must not also translate store_reg if you use nir_store_reg_for_def, since you'd end up with spurious stores. You are free to translate load_reg; it'll just get DCE'd, provided you're consistent with nir_load_reg_for_def as required for good code generation.

The following NIR passes are replaced in this series to accommodate register access intrinsics (see the sketch after this list for how they fit together in a backend):

  • nir_lower_vec_to_movs -> nir_lower_vec_to_regs
  • nir_lower_regs_to_ssa -> nir_lower_reg_intrinsics_to_ssa
  • nir_lower_locals_to_regs -> nir_lower_locals_to_reg_intrinsics
  • nir_convert_from_ssa gains a bool reg_intrinsics argument, set to true to generate intrinsics.
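
Concretely, a backend's end-of-pipeline sequence might then look something like the sketch below. The trivialize pass name (nir_trivialize_registers) and the exact pass signatures shown are assumptions based on the description above; check nir.h in the series for the real ones.

static void
backend_go_out_of_ssa(nir_shader *nir)
{
   /* Lower function-local variables and vecN instructions to register
    * access intrinsics instead of nir_register. */
   NIR_PASS_V(nir, nir_lower_locals_to_reg_intrinsics);
   NIR_PASS_V(nir, nir_lower_vec_to_regs, NULL, NULL);

   /* Go out of SSA, generating register intrinsics (reg_intrinsics = true). */
   NIR_PASS_V(nir, nir_convert_from_ssa, false /* phi_webs_only */,
              true /* reg_intrinsics */);

   /* At the very end, make every load_reg/store_reg trivial so the chasing
    * helpers (nir_load_reg_for_def / nir_store_reg_for_def) always apply. */
   NIR_PASS_V(nir, nir_trivialize_registers);
}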

In addition to the nir_register work, this series also addresses the issue of source/destination modifiers (abs/neg/sat). These have a similar set of problems to nir_register: they are invasive and thus harm NIR's memory footprint / compile-time / ergonomics on all backends, even when not used. Removing modifiers is an especially juicy target because no mature backend uses NIR's modifiers; they instead do their own backend propagation to handle architecture-specific constraints and optimizations. In isolation, removing them without hurting their few users would be tricky. However, this work falls out naturally from the rest of the nir_register work. After this MR is merged, each backend that uses NIR modifiers will need to transition away from them at the same time as it transitions away from nir_register.

But this series doesn't leave those backends in the dark. For backends consuming traditional modifiers, this series provides register-ful nir_legacy_src/nir_legacy_dest and modifier-ful nir_legacy_alu_src/nir_legacy_alu_dest structures mimicking what we have now, plus a set of helpers for reconstructing these registerful legacy structures from the trivialized SSA. The upshot is that very few backend changes are needed to transition away from nir_alu_src/dest modifiers if the legacy helpers are used. In particular, the legacy helpers do not require any backend copy propagation or dead code elimination. See the nir_to_tgsi commit in this MR for details on how that works, as well as the comments in nir_legacy.h.
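
For a rough idea of what this looks like in practice, here is a sketch of consuming an ALU source through the legacy helpers. The helper name (nir_legacy_chase_alu_src) and the field layout shown are as found in nir_legacy.h in this series and should be double-checked there; the backend_* functions are hypothetical.

static backend_src
translate_alu_src(backend_ctx *ctx, const nir_alu_src *src)
{
   /* Reconstruct a registerful, modifierful source from the trivialized
    * SSA. No backend copy propagation or DCE is needed for this to be
    * optimal. */
   nir_legacy_alu_src leg = nir_legacy_chase_alu_src(src, true /* fuse_fabs */);

   backend_src bsrc = leg.src.is_ssa
      ? backend_src_for_ssa(ctx, leg.src.ssa)
      : backend_src_for_reg(ctx, leg.src.reg.handle);

   /* Apply the swizzle and float modifiers however the ISA encodes them. */
   bsrc = backend_swizzle(bsrc, leg.swizzle);
   if (leg.fabs)
      bsrc = backend_abs(bsrc);
   if (leg.fneg)
      bsrc = backend_neg(bsrc);

   return bsrc;
}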

If you're not sure which strategy your backend should use, I've made recommendations for each backend over in #9051 (closed).

To demonstrate that all this works -- and provide a blueprint to help you convert your backends -- this series converts the following initial backends:

  • Midgard (chasing helpers)
  • intel (chasing helpers)
  • nir_to_tgsi (nir_legacy)
  • gallivm (direct translation)
  • zink (direct translation)

After this MR is merged, the remaining backends may be ported in parallel. That will require all of us working together to get NIR there. But once all backends are ported, we can remove nir_register and its helpers, lighten the data structures, improve NIR's memory footprint and compile time, remove a LOT of pointless validation, rename nir_ssa_def to nir_def, remove nir_dest in favour of a nir_def directly, remove abs/neg/sat modifiers, remove write masks, and probably even more :~)

...so I hope you'll help us get there!


Contains !23769 (merged) and !23804 (merged)

