dzn: Enable arithmetic subgroup ops
DXIL supports all of the reduce ops, but has only limited support for scan ops, which are unfortunately part of the same family in VK/SPIR-V: the only scans it supports are exclusive add and exclusive multiply. So, there's a lowering pass in here to address the gap:
- For inclusive scan ops where an exclusive variant exists, we lower to the exclusive variant and then do one additional ALU op to combine the result with the current thread's value.
- For scan ops where no exclusive variant exists, we write a loop: for every active lane up to our own, we read that lane's value and apply the op.
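As a rough illustration of the two strategies above, here is a minimal CPU-side sketch in C that simulates a single wave. It is not the pass itself: `WAVE_SIZE`, the `active` mask array, and every function name are hypothetical stand-ins, and `exclusive_add_scan` only models the natively supported exclusive add.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAVE_SIZE 8 /* assumed wave width, for illustration only */

/* Models the natively supported op: each active lane receives the sum of
 * the values held by the active lanes below it. */
static void exclusive_add_scan(const uint32_t *in, const bool *active,
                               uint32_t *out)
{
    uint32_t running = 0;
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        if (!active[lane])
            continue;
        out[lane] = running;
        running += in[lane];
    }
}

/* Strategy 1: inclusive scan = exclusive scan, then one extra ALU op that
 * combines the result with the lane's own value. */
static void inclusive_add_scan(const uint32_t *in, const bool *active,
                               uint32_t *out)
{
    exclusive_add_scan(in, active, out);
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        if (active[lane])
            out[lane] += in[lane];
    }
}

static uint32_t umax_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Strategy 2: no native scan for this op (unsigned max used as the example),
 * so each lane loops over the active lanes up to its own and applies the op.
 * The exclusive form starts from the identity (0 for unsigned max) and stops
 * before the lane's own value. */
static void umax_scan(const uint32_t *in, const bool *active, uint32_t *out,
                      bool inclusive)
{
    for (unsigned self = 0; self < WAVE_SIZE; self++) {
        if (!active[self])
            continue;
        uint32_t acc = inclusive ? in[self] : 0;
        for (unsigned lane = 0; lane < self; lane++) {
            if (active[lane])
                acc = umax_u32(acc, in[lane]);
        }
        out[self] = acc;
    }
}
```

The loop form costs a number of iterations proportional to the lane index, which is why the cheaper "exclusive plus one ALU op" form is preferred wherever an exclusive variant exists. Inactive lanes are skipped in both forms and their output slots are left untouched.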
There's nothing DXIL-specific about this lowering pass, except for how it describes when to lower and what to lower to, so if another driver wants it, we can figure out how to generalize that part and move the pass.
The rest of this is just trivial plumbing.