nir, radv: Optimize boolean reduce intrinsics

Connor Abbott requested to merge cwabbott0/mesa:radv-bool-reduce into main

This series enables clustered reduce operations on radv and adds a fast path for reductions on booleans. We'd rather not enable WWM yet unless it's really necessary, since it currently increases register pressure quite a bit. In addition, and/or reductions with a cluster size of 4 can be done in a single instruction (S_WQM_B64).
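To illustrate why a cluster size of 4 maps onto S_WQM_B64, here is a small software model in C, not the code from this series. It assumes the boolean has already been turned into a 64-bit ballot mask and that all 64 lanes are active; the function names are hypothetical. The OR case is the whole-quad-mode operation itself, and the AND case is sketched here via De Morgan, which is an assumption about how one could express it rather than a statement of what the patches emit.

```c
#include <stdint.h>

/* Software model of S_WQM_B64: for each aligned group of 4 bits (a quad),
 * set all 4 bits if any bit in the group is set. */
static uint64_t wqm_b64(uint64_t mask)
{
    /* Collect the OR of each quad into that quad's lowest bit. */
    uint64_t any = mask | (mask >> 1) | (mask >> 2) | (mask >> 3);
    any &= 0x1111111111111111ull;
    /* Broadcast the low bit of each quad to all 4 bits of the quad. */
    return any * 0xf;
}

/* Clustered OR with cluster size 4: one bit of result per lane. */
static uint64_t bool_or_reduce_quad(uint64_t ballot)
{
    return wqm_b64(ballot);
}

/* Clustered AND with cluster size 4, via De Morgan:
 * all(x) == !any(!x). Assumes every lane is active. */
static uint64_t bool_and_reduce_quad(uint64_t ballot)
{
    return ~wqm_b64(~ballot);
}
```

For example, a ballot of `0xf7` has a full upper quad and one clear bit in the lower quad, so the AND reduction yields `0xf0` while the OR reduction yields `0xff`.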

We don't have 1-bit booleans passed all the way to the backend yet, and this is something that Intel will probably want too, so the lowering is implemented in NIR, in nir_lower_subgroups.
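As a rough picture of what a ballot-based lowering for boolean reductions can look like, the following C model computes the per-lane result of a clustered and/or/xor reduction from a ballot mask. This is a sketch under stated assumptions, not the nir_lower_subgroups implementation: the enum, the function names, the 64-lane subgroup, and the handling of inactive lanes (they are simply excluded from the reduction) are all illustrative choices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { REDUCE_AND, REDUCE_OR, REDUCE_XOR } bool_reduce_op;

/* Parity of the number of set bits in v. */
static bool parity64(uint64_t v)
{
    bool p = false;
    while (v) {
        p = !p;
        v &= v - 1; /* clear the lowest set bit */
    }
    return p;
}

/* Result seen by `lane` for a clustered boolean reduction.
 * ballot:       bit i is set if lane i's boolean is true
 * active:       bit i is set if lane i is active
 * cluster_size: power of two in [1, 64] (assumed, not checked) */
static bool bool_cluster_reduce(uint64_t ballot, uint64_t active,
                                unsigned lane, unsigned cluster_size,
                                bool_reduce_op op)
{
    unsigned base = lane & ~(cluster_size - 1);
    uint64_t cluster_mask = cluster_size == 64
        ? ~0ull
        : ((1ull << cluster_size) - 1) << base;
    uint64_t lanes = active & cluster_mask; /* participating lanes */
    uint64_t set = ballot & lanes;          /* participating true lanes */

    switch (op) {
    case REDUCE_AND: return set == lanes;
    case REDUCE_OR:  return set != 0;
    case REDUCE_XOR: return parity64(set);
    }
    return false;
}
```

The point of the sketch is that each reduction collapses to a mask, a compare, or a bit count on a scalar value, which is why no cross-lane data movement is needed for booleans.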

The original motivation was DXVK and VKD3D, which use reductions with a cluster size of 4 to implement D3D discard semantics. Hopefully this series improves their performance.

The last patch depends on https://reviews.llvm.org/D57748. In addition, I stumbled on an LLVM bug while implementing this: https://reviews.llvm.org/D57894. I can do without the last patch, since it's just an optimization in a path not used by DXVK anyway, but if we land this now, there will be some new failing tests until the LLVM fix lands. Of course, LLVM 8 will also regress unless we backport the fix.
