
WIP: nir+ir3: address calculation optimizations

Rob Clark requested to merge robclark/mesa:wip/addr-calc into master


The basic idea is that a lot of hardware has a single-instruction 24b imul, but not necessarily a single-instruction 32b or larger imul. And adreno goes a step further w/ a 24b mad.s24. In a lot of cases 24 bits is enough to calculate offsets within arrays/ubo/ssbo.
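To illustrate why 24 bits is often enough, here is a minimal sketch of the arithmetic. It assumes a 24b multiply sign-extends the low 24 bits of each source and returns the low 32 bits of the product; the names `sext24`, `imul24_model`, `imul32_model` and `fits_s24` are made up for this example and are not Mesa functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 24 bits of v (hypothetical helper). */
static int32_t sext24(uint32_t v)
{
    int32_t low = (int32_t)(v & 0xffffffu);   /* 0 .. 0xffffff */
    return (low ^ 0x800000) - 0x800000;       /* -0x800000 .. 0x7fffff */
}

/* Assumed model of a 24b multiply: low 32 bits of the product of the
 * sign-extended 24b sources. */
static uint32_t imul24_model(uint32_t a, uint32_t b)
{
    int64_t product = (int64_t)sext24(a) * (int64_t)sext24(b);
    return (uint32_t)product;
}

/* Plain 32b multiply, wrapping modulo 2^32. */
static uint32_t imul32_model(uint32_t a, uint32_t b)
{
    return a * b;
}

/* True when v is unchanged by truncating to 24 bits and sign-extending. */
static bool fits_s24(uint32_t v)
{
    return (uint32_t)sext24(v) == v;
}
```

With index 1000 and stride 64 both models give 64000. Once an operand needs more than 24 bits (e.g. `1u << 24`), the 24b model drops the high bits and the two results diverge, which is why the size of the object being indexed matters.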

At this point, the approach is to introduce an amad instruction ("address mad", feel free to suggest better naming, but I wanted to differentiate from normal "i" instructions) which could be lowered to either imul + iadd, or imad24_ir3. I initially thought about just doing a shader variant in the rare case that a large SSBO is used, although I think it would be ok just to decide which instructions to use at compile time based on what the compiler knows about object sizes.
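As a rough sketch of that compile-time decision (not code from this MR): assuming the compiler knows the byte size of the object being indexed, the choice could look something like the following. The enum, the `S24_OBJECT_LIMIT` threshold, and `choose_amad_lowering()` are all hypothetical names, and the exact limit would depend on how the hardware treats the 24b operands.

```c
#include <stdbool.h>
#include <stdint.h>

enum amad_lowering {
    AMAD_TO_IMUL_IADD,  /* general path: 32b imul followed by iadd */
    AMAD_TO_IMAD24,     /* fast path: single 24b multiply-add */
};

/* Assumed limit: objects up to this size have every byte offset
 * representable as a non-negative signed 24b value. */
#define S24_OBJECT_LIMIT (UINT64_C(1) << 23)

/* object_size == 0 means "size unknown", which forces the general path. */
static enum amad_lowering
choose_amad_lowering(uint64_t object_size, uint32_t stride)
{
    bool size_known = object_size != 0;
    bool fits = size_known &&
                object_size <= S24_OBJECT_LIMIT &&
                stride <= object_size;
    return fits ? AMAD_TO_IMAD24 : AMAD_TO_IMUL_IADD;
}
```

For example, a 64 KiB UBO with a 16-byte stride would take the 24b path, while a 1 GiB SSBO (or an object of unknown size) would fall back to imul + iadd.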

In the current state, it doesn't handle SSBOs since those go thru a different path for offset calculation. But the results on the closed shader-db archive are promising:

total instructions in shared programs: 8693021 -> 8666696 (-0.30%)
instructions in affected programs: 178290 -> 151965 (-14.77%)
helped: 746
HURT: 24

(The HURT shaders are ones where we probably need some better range-analysis-based optimization: the pass that lowers ubo to uniform sees a sequence of instructions where it doesn't realize there is a const offset that could be pulled out and folded into the load_uniform instruction.)

That all said, due to the number of places where array deref lowering happens (look for callers of nir_imul_imm()), and not wanting to plumb type_size() all over, I'm thinking of shifting the approach to:

  1. add amul instruction (instead of amad), and use that everywhere for deref calculations
  2. add a pass to try to figure out what amuls are involved in dereferencing, and convert to either imul or imul24 depending on size of dereferenced object
  3. use opt-algebraic rules to convert iadd + imul24 to imad24_ir3
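Steps 2 and 3 could be sketched on a toy IR like this. None of it is NIR API: `toy_instr`, the `TOY_*` opcodes and both functions are invented for illustration, and the toy simply assumes the size of the dereferenced object is already attached to each amul (finding that size is the hard part of step 2). The fusion in step 3 would be an opt-algebraic rule rather than hand-written C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes standing in for the ALU ops discussed above. */
enum toy_op { TOY_VALUE, TOY_AMUL, TOY_IMUL, TOY_IMUL24, TOY_IADD, TOY_IMAD24 };

struct toy_instr {
    enum toy_op op;
    struct toy_instr *src[3];  /* unused sources are NULL */
    uint64_t deref_size;       /* TOY_AMUL only: object size in bytes, 0 if unknown */
};

/* Step 2: turn amul into imul24 when the object is known to be small
 * enough, otherwise into a plain imul. The 1 << 23 limit is an assumption. */
static void toy_lower_amul(struct toy_instr *instr)
{
    if (instr->op != TOY_AMUL)
        return;
    bool small = instr->deref_size != 0 &&
                 instr->deref_size <= (UINT64_C(1) << 23);
    instr->op = small ? TOY_IMUL24 : TOY_IMUL;
}

/* Step 3: rewrite iadd(imul24(a, b), c) into imad24(a, b, c).
 * Returns true if the instruction was changed. */
static bool toy_fuse_imad24(struct toy_instr *instr)
{
    if (instr->op != TOY_IADD)
        return false;
    for (int i = 0; i < 2; i++) {
        struct toy_instr *mul = instr->src[i];
        if (mul && mul->op == TOY_IMUL24) {
            struct toy_instr *addend = instr->src[1 - i];
            instr->op = TOY_IMAD24;
            instr->src[0] = mul->src[0];
            instr->src[1] = mul->src[1];
            instr->src[2] = addend;
            return true;
        }
    }
    return false;
}
```

An amul that ends up as a plain imul is left alone by the fusion step, so large objects keep the imul + iadd sequence.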

Not 100% sure about how step 2 will work out. But probably starting with 'amul' instead of 'amad' is a better idea either way.

Suggestions welcome

CC: @elima, @jekstrand
