nir: add 1-src f/ball and f/bany opcodes plus lowering

Italo Nicola requested to merge italove/mesa:fold_bcast_nir_3 into main

This is a proposal for an option that lowers 2-src vector_cmp ops (e.g. bany_nequal3) into a 1-src vector all/any applied to a per-component cmp (e.g. bany(neq(a,b))).
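To make the equivalence concrete, here is a minimal sketch in plain C that models the fused 2-src op and its lowered form on small arrays. This is not Mesa code: all `model_*` names are hypothetical, and the integer inequality stands in for whichever comparison the fused opcode performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_COMPONENTS 16 /* hypothetical bound for this sketch only */

/* Fused 2-src form: true if any pair of components differs. */
static bool model_bany_inequal_2src(const int32_t *a, const int32_t *b, unsigned n)
{
   assert(n >= 1 && n <= MODEL_MAX_COMPONENTS);
   for (unsigned i = 0; i < n; i++) {
      if (a[i] != b[i])
         return true;
   }
   return false;
}

/* The "cmp" half: per-component compare producing a boolean vector. */
static void model_ine(const int32_t *a, const int32_t *b, bool *out, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      out[i] = a[i] != b[i];
}

/* The "any" half: 1-src reduction over a boolean vector. */
static bool model_bany(const bool *v, unsigned n)
{
   bool result = false;
   for (unsigned i = 0; i < n; i++)
      result = result || v[i];
   return result;
}

/* Lowered form: bany(ine(a, b)). */
static bool model_bany_inequal_lowered(const int32_t *a, const int32_t *b, unsigned n)
{
   bool tmp[MODEL_MAX_COMPONENTS];
   assert(n >= 1 && n <= MODEL_MAX_COMPONENTS);
   model_ine(a, b, tmp, n);
   return model_bany(tmp, n);
}
```

Both forms return the same result for every input. The difference is only in how the work is split, which is what lets a backend match the compare and the reduction separately.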

This helps simplify code in backends that:

  1. Have other vector_cmp instructions and currently have to write an optimization pass or something similar to fold things like ball_equal4(flt(a,b), True) into ball_flt. Panfrost is a good example of this.

  2. Don't have vector_cmp instructions, but have any/all. In this case the backend has to lower these instructions. Zink is a good example of this.

I assume it can also be helpful to simplify code in other backends, and maybe we can even get rid of the 2-src opcodes altogether later on if that's the case.

This patch does three things:

  1. Adds opcodes (f/b)allN and (f/b)anyN, which check if all/at-least-one of the components are True.

  2. Adds support for lowering existing (f/b)all_(n)equalN and (f/b)any_(n)equalN opcodes into the new opcodes through lower_2src_vector_cmp.

  3. Keeps the ability to lower the opcodes all the way to iand/ior+eq/ne (or fmax/fmin+seq/sne for the float versions) through lower_vector_cmp. With this implementation, lower_vector_cmp implies lower_2src_vector_cmp, meaning that you don't need to use both compiler flags to get everything lowered to simpler alu ops.
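For the float variants, the full lowering in item 3 can be sketched as follows. This is a plain C model, not the actual pass: the `model_*` names are hypothetical, it assumes seq/sne produce 1.0 for true and 0.0 for false, and the min/max reductions are written out by hand as stand-ins for fmin/fmax.

```c
#include <assert.h>

/* seq: per-component equality, 1.0f where equal and 0.0f otherwise (assumed encoding). */
static void model_seq(const float *a, const float *b, float *out, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      out[i] = (a[i] == b[i]) ? 1.0f : 0.0f;
}

/* sne: per-component inequality, 1.0f where different and 0.0f otherwise. */
static void model_sne(const float *a, const float *b, float *out, unsigned n)
{
   for (unsigned i = 0; i < n; i++)
      out[i] = (a[i] != b[i]) ? 1.0f : 0.0f;
}

/* fall: min-reduction, yields 1.0f only if every component is 1.0f. */
static float model_fall(const float *v, unsigned n)
{
   assert(n >= 1);
   float r = v[0];
   for (unsigned i = 1; i < n; i++)
      r = (v[i] < r) ? v[i] : r; /* stand-in for fmin */
   return r;
}

/* fany: max-reduction, yields 1.0f if at least one component is 1.0f. */
static float model_fany(const float *v, unsigned n)
{
   assert(n >= 1);
   float r = v[0];
   for (unsigned i = 1; i < n; i++)
      r = (v[i] > r) ? v[i] : r; /* stand-in for fmax */
   return r;
}
```

With this encoding, an "all equal" check becomes `model_fall` over the `model_seq` result, and an "any not equal" check becomes `model_fany` over the `model_sne` result.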

Here are some examples of NIR code generated with this patch:

GLSL code (both variables are of type ivec4):

v1 == v2

Generated using lower_vector_cmp after the patches:

vec4 32 ssa_12 = ieq32 ssa_10, ssa_11
vec2 32 ssa_13 = iand ssa_12.xz, ssa_12.yw
vec1 32 ssa_14 = iand ssa_13.x, ssa_13.y
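The three-instruction sequence above can be modelled in plain C as a compare followed by a two-level AND tree. This is only an illustrative sketch: the `model_*` and `MODEL_*` names are hypothetical, and it assumes 32-bit booleans are encoded as all-ones for true and zero for false.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TRUE  (~0u) /* assumed 32-bit boolean encoding: all bits set */
#define MODEL_FALSE 0u

/* ieq32: per-component integer equality producing 32-bit booleans. */
static void model_ieq32(const int32_t a[4], const int32_t b[4], uint32_t out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = (a[i] == b[i]) ? MODEL_TRUE : MODEL_FALSE;
}

/* Reduce a vec4 of booleans with the same shape as the dump:
 * first iand v.xz, v.yw (a vec2), then iand of that result's .x and .y. */
static uint32_t model_all4_tree(const uint32_t v[4])
{
   for (unsigned i = 0; i < 4; i++)
      assert(v[i] == MODEL_TRUE || v[i] == MODEL_FALSE);

   uint32_t t_x = v[0] & v[1]; /* x component of iand(v.xz, v.yw): x & y */
   uint32_t t_y = v[2] & v[3]; /* y component of iand(v.xz, v.yw): z & w */
   return t_x & t_y;           /* iand t.x, t.y */
}
```

Pairing the components through swizzles halves the vector width at each step, so a vec4 needs two iand instructions after the compare instead of three scalar ones.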

Generated using lower_2src_vector_cmp after the patches:

vec4 32 ssa_13 = ieq32 ssa_11, ssa_12
vec1 32 ssa_14 = b32all4 ssa_13

Generated without lowering the 2src ops after the patches:

vec1 32 ssa_13 = b32all_iequal4 ssa_11, ssa_12
Edited by Italo Nicola