This is a proposal for an option that lowers 2-src vector_cmp ops (e.g. `bany_nequal3`) into a 1-src vector all/any op plus a component-wise cmp (e.g. `bany3` applied to the result of `ine`).
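To make the proposed equivalence concrete, here is a small C model of the two forms for the 3-component "any not equal" case. It is a sketch, not Mesa code: every function name is hypothetical, and it assumes NIR's 32-bit boolean convention (true is all bits set, false is 0).

```c
#include <stdint.h>

/* Hypothetical reference models of the NIR semantics (not Mesa code).
 * Assumption: 32-bit booleans use ~0 (all bits set) for true, 0 for false. */
#define B32_TRUE  UINT32_MAX
#define B32_FALSE UINT32_C(0)

/* 2-src form, modeled on bany_nequal3: does any component of a differ from b? */
static uint32_t model_bany_nequal3(const uint32_t a[3], const uint32_t b[3])
{
    for (int i = 0; i < 3; i++)
        if (a[i] != b[i])
            return B32_TRUE;
    return B32_FALSE;
}

/* Component-wise inequality, modeled on ine: one boolean per component. */
static void model_ine3(const uint32_t a[3], const uint32_t b[3], uint32_t out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = (a[i] != b[i]) ? B32_TRUE : B32_FALSE;
}

/* 1-src form, modeled on bany3: is any component true? */
static uint32_t model_bany3(const uint32_t v[3])
{
    for (int i = 0; i < 3; i++)
        if (v[i] != B32_FALSE)
            return B32_TRUE;
    return B32_FALSE;
}

/* The proposed lowering: bany_nequal3(a, b) becomes bany3(ine(a, b)). */
static uint32_t model_lowered_bany_nequal3(const uint32_t a[3], const uint32_t b[3])
{
    uint32_t cmp[3];
    model_ine3(a, b, cmp);
    return model_bany3(cmp);
}
```

For any pair of inputs, `model_bany_nequal3(a, b)` and `model_lowered_bany_nequal3(a, b)` return the same value; the lowered form just splits the work into a component-wise compare followed by a reduction.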
This helps simplify code in backends that:

- Have other vector_cmp instructions and currently have to write an optimization pass (or something similar) to fold patterns like `ball_flt`. Panfrost is a good example of this.
- Don't have vector_cmp instructions, but do have any/all. In this case the backend currently has to lower these instructions itself. Zink is a good example of this.
I assume it can also help simplify code in other backends and, if that's the case, maybe we can even get rid of the 2-src opcodes altogether later on.
This patch does three things:

- Adds new `(f/b)allN` and `(f/b)anyN` opcodes, which check whether all / at least one of the components are true.
- Adds support for lowering the existing `(f/b)all_equalN` and `(f/b)any_(n)equalN` opcodes into the new opcodes through `lower_2src_vector_cmp`.
- Keeps the ability to lower the opcodes all the way to `ieq`/`ine` + `iand`/`ior` (or `fmax`/`fmin` + `seq`/`sne` for the float versions) through `lower_vector_cmp`. With this implementation, `lower_vector_cmp` implies `lower_2src_vector_cmp`, meaning that you don't need to set both compiler flags to get everything lowered to simpler ALU ops.
Here are some examples of NIR code generated with this patch:
GLSL code (`v1` and `v2` are both four-component vectors of the same type):

    v1 == v2
`lower_vector_cmp` after the patches:

    vec4 32 ssa_12 = ieq32 ssa_10, ssa_11
    vec2 32 ssa_13 = iand ssa_12.xz, ssa_12.yw
    vec1 32 ssa_14 = iand ssa_13.x, ssa_13.y
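The `lower_vector_cmp` output reduces the four equality results pairwise rather than in a linear chain. The C sketch below mirrors those three instructions step by step and checks them against a straightforward "all components equal" loop. It is an illustrative model only: the function names are hypothetical, and it assumes 32-bit booleans where true is all bits set, which is what lets a bitwise `iand` act as a logical AND.

```c
#include <stdint.h>

/* Assumption: 32-bit booleans use ~0 for true and 0 for false. */
#define B32_TRUE  UINT32_MAX
#define B32_FALSE UINT32_C(0)

/* Reference, modeled on b32all_iequal4: are all four components equal? */
static uint32_t model_ball_iequal4(const uint32_t a[4], const uint32_t b[4])
{
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i])
            return B32_FALSE;
    return B32_TRUE;
}

/* Step-by-step model of the lowered sequence.
 * Components x, y, z, w map to indices 0, 1, 2, 3. */
static uint32_t model_ball_iequal4_lowered(const uint32_t a[4], const uint32_t b[4])
{
    /* ssa_12 = ieq32 a, b: one 32-bit boolean per component */
    uint32_t eq[4];
    for (int i = 0; i < 4; i++)
        eq[i] = (a[i] == b[i]) ? B32_TRUE : B32_FALSE;

    /* ssa_13 = iand ssa_12.xz, ssa_12.yw: pairs x with y and z with w */
    uint32_t pair[2] = { eq[0] & eq[1], eq[2] & eq[3] };

    /* ssa_14 = iand ssa_13.x, ssa_13.y: combines the two halves */
    return pair[0] & pair[1];
}
```

The pairwise shape keeps the dependency chain at two `iand` levels for four components instead of three, which is one plausible reason a vectorized reduction like this is preferable to chaining scalar ANDs.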
`lower_2src_vector_cmp` after the patches:

    vec4 32 ssa_13 = ieq32 ssa_11, ssa_12
    vec1 32 ssa_14 = b32all4 ssa_13
Generated without lowering the 2-src ops, after the patches:

    vec1 32 ssa_13 = b32all_iequal4 ssa_11, ssa_12