nir: add 1-src f/ball and f/bany opcodes plus lowering
This is a proposal for an option that lowers 2-src vector_cmp ops (e.g. `bany_nequal3`) into 1-src vector all/any + cmp (e.g. `bany(neq(a,b))`).
This helps simplify code in backends that:
- Have other vector_cmp instructions and currently have to write an optimization pass or something similar to fold things like `ball_equal4(flt(a,b), True)` into `ball_flt`. Panfrost is a good example of this.
- Don't have vector_cmp instructions, but have `any/all`. In this case the backend has to lower these instructions. Zink is a good example of this.
I assume it can also be helpful to simplify code in other backends, and maybe we can even get rid of the 2-src opcodes altogether later on if that's the case.
This patch does three things:
- Adds opcodes `(f/b)allN` and `(f/b)anyN`, which check if all/at-least-one of the components are True.
- Adds support for lowering existing `(f/b)all_(n)equalN` and `(f/b)any_(n)equalN` opcodes into the new opcodes through `lower_2src_vector_cmp`.
- Keeps the ability to lower the opcodes all the way to `iand/ior` + `eq/ne` (or `fmax/fmin` + `seq/sne` for the float versions) through `lower_vector_cmp`. With this implementation, `lower_vector_cmp` implies `lower_2src_vector_cmp`, meaning that you don't need to use both compiler flags to get everything lowered to simpler alu ops.
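The three lowering levels above can be sketched as a small standalone Python model. This is not Mesa code: the `lower` helper and its list-of-op-names return value are hypothetical, the component count `N` is left out, and the choice of `seq/sne` as the comparison feeding the 1-src float opcodes is an assumption based on the pairing described in the list.

```python
def lower(opcode, lower_2src_vector_cmp=False, lower_vector_cmp=False):
    """Return the op names a 2-src vector compare is rewritten to.

    `opcode` is a name without the component count, e.g. "ball_equal"
    or "fany_nequal". Hypothetical helper, for illustration only.
    """
    kind = opcode[0]                               # "b" (boolean) or "f" (float)
    quantifier, compare = opcode[1:].split("_")    # "all"/"any", "equal"/"nequal"

    if kind == "f":
        # Float versions produce 1.0/0.0, so all/any reduce with fmin/fmax.
        cmp_op = {"equal": "seq", "nequal": "sne"}[compare]
        reduce_op = {"all": "fmin", "any": "fmax"}[quantifier]
    else:
        cmp_op = {"equal": "eq", "nequal": "ne"}[compare]
        reduce_op = {"all": "iand", "any": "ior"}[quantifier]

    if lower_vector_cmp:
        # Checked first: this flag implies lower_2src_vector_cmp.
        return [cmp_op, reduce_op]
    if lower_2src_vector_cmp:
        # Per-component compare followed by the new 1-src all/any opcode.
        return [cmp_op, kind + quantifier]
    return [opcode]


for flags in ({}, {"lower_2src_vector_cmp": True}, {"lower_vector_cmp": True}):
    print(flags, lower("ball_equal", **flags), lower("fany_nequal", **flags))
```

Checking `lower_vector_cmp` first is what models the "implies" relationship: setting it alone gives the same result as setting both flags.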
Here are some examples of NIR code generated with this patch:
GLSL code (both variables are of type `ivec4`):
v1 == v2
Generated using `lower_vector_cmp` after the patches:
vec4 32 ssa_12 = ieq32 ssa_10, ssa_11
vec2 32 ssa_13 = iand ssa_12.xz, ssa_12.yw
vec1 32 ssa_14 = iand ssa_13.x, ssa_13.y
Generated using `lower_2src_vector_cmp` after the patches:
vec4 32 ssa_13 = ieq32 ssa_11, ssa_12
vec1 32 ssa_14 = b32all4 ssa_13
Generated without lowering the 2src ops after the patches:
vec1 32 ssa_13 = b32all_iequal4 ssa_11, ssa_12
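
As a sanity check, the three forms above should always produce the same value. The following is a minimal Python model of the `ivec4` example, not Mesa code: the function names mirror the opcodes in the dumps, while `all_forms` is a hypothetical helper, and 32-bit booleans are modelled as `0xFFFFFFFF` (true) and `0` (false).

```python
TRUE32, FALSE32 = 0xFFFFFFFF, 0  # model of 32-bit booleans


def ieq32(a, b):
    """Per-component integer equality."""
    return [TRUE32 if x == y else FALSE32 for x, y in zip(a, b)]


def b32all4(v):
    """New 1-src opcode: true if every component is true."""
    return TRUE32 if all(c != FALSE32 for c in v) else FALSE32


def b32all_iequal4(a, b):
    """Existing 2-src opcode: true if all four components are equal."""
    return TRUE32 if all(x == y for x, y in zip(a, b)) else FALSE32


def lowered_to_alu(a, b):
    """lower_vector_cmp form: ieq32 followed by two iand steps."""
    c = ieq32(a, b)
    # iand c.xz, c.yw  ->  (x & y, z & w)
    pair = [c[0] & c[1], c[2] & c[3]]
    # iand pair.x, pair.y
    return pair[0] & pair[1]


def all_forms(a, b):
    """Evaluate v1 == v2 in the three forms shown in the dumps."""
    return (lowered_to_alu(a, b), b32all4(ieq32(a, b)), b32all_iequal4(a, b))


print(all_forms([1, 2, 3, 4], [1, 2, 3, 4]))
print(all_forms([1, 2, 3, 4], [1, 2, 3, 5]))
```

For any pair of inputs, the three entries of the returned tuple are expected to match.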