nir: add derivative intrinsics
Derivative ops exist in the "uncanny valley" between ALU and intrinsics. They involve ALU on typical hardware due to the subtraction operation, but they are intrinsic-like because of their cross-lane communication. In fact, they are closer to quad scans/reductions than to any ALU op. Every real ALU op is a pure function of its inputs and is completely described by its constant folding code; derivatives are not. Furthermore, because they rely on helper invocations in fragment shaders, reordering derivatives across discards is not generally safe. Every other ALU op can be reordered without restriction, but derivatives are special because they're not actually ALU.
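To make the "not a pure function of its input" point concrete, here is a minimal standalone sketch in plain C. It does not use any NIR API. The names `quad_ddx` and `quad_ddy`, the 2x2 lane layout (0 1 / 2 3), and the fine-derivative formula are all assumptions chosen for illustration, not what any particular driver does.

```c
#include <assert.h>

#define QUAD_SIZE 4

/* Hypothetical model: one value per lane of a 2x2 quad, laid out as
 *
 *    lane 0  lane 1
 *    lane 2  lane 3
 *
 * Horizontal derivative for `lane`: right neighbour minus left neighbour
 * within the lane's own row (a "fine" derivative in this toy model). */
static float quad_ddx(const float values[QUAD_SIZE], int lane)
{
    assert(lane >= 0 && lane < QUAD_SIZE);
    int left = lane & ~1;   /* clear bit 0: left column of this row  */
    int right = lane | 1;   /* set bit 0: right column of this row   */
    return values[right] - values[left];
}

/* Vertical derivative for `lane`: bottom neighbour minus top neighbour
 * within the lane's own column. */
static float quad_ddy(const float values[QUAD_SIZE], int lane)
{
    assert(lane >= 0 && lane < QUAD_SIZE);
    int top = lane & 1;     /* clear bit 1: top row of this column    */
    int bottom = lane | 2;  /* set bit 1: bottom row of this column   */
    return values[bottom] - values[top];
}
```

In this model, lane 0 can hold the same value in two different quads and still get different results, for example `{1, 3, 5, 7}` versus `{1, 4, 5, 9}`, because the result depends on what the neighbouring lanes hold. That is why constant folding on the per-lane operand alone cannot describe the op. The sketch deliberately does not model helper invocations or discard.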
If NIR were built today, derivatives would be intrinsics like subgroup ops are. NIR was built before subgroup ops were a thing, which is why derivatives were grandfathered into ALU. We really want derivatives to be intrinsics so we can stop special-casing them in passes that operate on ALU.
This MR attempts to move us in that direction, introducing intrinsics for derivatives and converting a few passes/drivers. There are a lot of open questions and missing pieces, however:
- How should scalarization of derivatives happen? Mali wants 2x16 derivatives which is extra annoying here. AMD might be similar, need to check.
- Where should constant and algebraic-like optimizations happen? nir_opt_constant_folding and, if needed, nir_opt_intrinsics.
- Will this cause heartburn for nir_legacy backends needing fneg propagated? No.
- Probably more I'm missing.
This isn't a... clear win, at least not yet. But I do think we're all on the same page that this is where we want to get long term.
Backend tracker:
- v3d
- vc4
- agx
- nir-to-tgsi
- midgard
- nak
- brw
- elk
- ir3
- bifrost
- aco
- amd llvm
- r300
- r600
- dxil
- gallivm
- etnaviv
- lima/pp
- zink
- nv50