WIP: nir+ir3: address calculation optimizations
VERY WIP
The basic idea is that a lot of hardware has a single-instruction 24b imul, but not necessarily a single-instruction 32b (or larger) imul. And adreno goes a step further w/ a 24b `mad.s24`. In a lot of cases 24 bits is enough to calculate offsets within arrays/UBOs/SSBOs.
At this point, the approach is to introduce an `amad` instruction ("address mad", feel free to suggest better naming, but I wanted to differentiate it from the normal "i" instructions), which could be lowered to either `imul` + `iadd`, or `imad24_ir3`. I initially thought about just doing a shader variant in the rare case that a large SSBO is used, although I think it would be ok to just decide which instructions to use at compile time, based on what the compiler knows about object sizes.
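To make the lowering side concrete, roughly this kind of per-instruction rewrite is what I have in mind. Just a sketch, not actual code: `nir_op_amad` is the hypothetical new opcode, and `nir_imad24_ir3()` assumes the new `imad24_ir3` opcode (and its auto-generated builder) exists.

```c
/* Sketch only: nir_op_amad is the proposed opcode, nir_imad24_ir3() assumes
 * the proposed imad24_ir3 opcode exists.  Would be called from a normal
 * instruction-walking lowering pass.
 */
static bool
lower_amad(nir_builder *b, nir_alu_instr *alu, bool has_imad24)
{
   if (alu->op != nir_op_amad)
      return false;

   b->cursor = nir_before_instr(&alu->instr);

   nir_ssa_def *src0 = nir_ssa_for_alu_src(b, alu, 0);
   nir_ssa_def *src1 = nir_ssa_for_alu_src(b, alu, 1);
   nir_ssa_def *src2 = nir_ssa_for_alu_src(b, alu, 2);

   /* If the backend has a 24b mad and the compiler knows the addressed
    * object is small enough, use it; otherwise fall back to 32b math:
    */
   nir_ssa_def *res = has_imad24 ?
      nir_imad24_ir3(b, src0, src1, src2) :
      nir_iadd(b, nir_imul(b, src0, src1), src2);

   nir_ssa_def_rewrite_uses(&alu->dest.dest.ssa, nir_src_for_ssa(res));
   nir_instr_remove(&alu->instr);

   return true;
}
```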
In the current state, it doesn't handle SSBOs since those go thru a different path for offset calculation. But the results on the closed shader-db archive are promising:
```
total instructions in shared programs: 8693021 -> 8666696 (-0.30%)
instructions in affected programs: 178290 -> 151965 (-14.77%)
helped: 746
HURT: 24
```
(The HURT shaders are ones where we probably need some better range-analysis-based optimization, because the pass that lowers UBO access to uniforms sees a sequence of instructions where it doesn't realize there is a const offset that could be pulled out and folded into the `load_uniform` instruction.)
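For illustration (made-up strides/offsets, not from an actual shader, and assuming the proposed `imad24_ir3` opcode), the difference is roughly between these two shapes of the same UBO offset expression, where the fused form hides the constant from the UBO-to-uniform pass:

```c
/* Illustration only, numbers made up: offset of a member at +8B inside an
 * array of 16B structs.  With separate mul/add the "+ 8" is visible as an
 * iadd with a constant and can be folded into load_uniform's const offset.
 * Once it's fused into a single mad-style op, that constant is no longer
 * obvious without some range/constant analysis on the address expression.
 */
static nir_ssa_def *
offset_split(nir_builder *b, nir_ssa_def *idx)
{
   return nir_iadd(b, nir_imul_imm(b, idx, 16), nir_imm_int(b, 8));
}

static nir_ssa_def *
offset_fused(nir_builder *b, nir_ssa_def *idx)
{
   /* assumes the proposed imad24_ir3 opcode: */
   return nir_imad24_ir3(b, idx, nir_imm_int(b, 16), nir_imm_int(b, 8));
}
```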
That all said, due to the number of places where array deref lowering happens (look for callers of `nir_imul_imm()`), and not wanting to plumb `type_size()` all over, I'm thinking of shifting the approach to:
- add an `amul` instruction (instead of `amad`), and use that everywhere for deref calculations
- add a pass to try to figure out which `amul`s are involved in dereferencing, and convert them to either `imul` or `imul24` depending on the size of the dereferenced object
- use opt-algebraic rules to convert `iadd` + `imul24` to `imad24_ir3`
Not 100% sure how step 2 will work out (rough sketch below), but starting with `amul` instead of `amad` is probably the better idea either way.
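Roughly, step 2 would boil down to a rewrite like the one below, once the pass has figured out, per `amul`, whether the dereferenced object is guaranteed to fit in 24 bits. Again just a sketch: `nir_op_amul` is the proposed opcode, and the interesting part (presumably walking back from load/store offset sources through the address expression, which is the part I'm not sure about) is omitted. Step 3 would then just be an ir3-specific algebraic rule, something along the lines of `(('iadd', ('imul24', a, b), c), ('imad24_ir3', a, b, c))`.

```c
/* Sketch of the step 2 rewrite (nir_op_amul is the proposed opcode).  The
 * analysis that decides object_fits_in_24b -- based on the size of whatever
 * is being dereferenced -- is the open question and is not shown here.
 */
static void
specialize_amul(nir_alu_instr *alu, bool object_fits_in_24b)
{
   assert(alu->op == nir_op_amul);

   /* amul has the same number of sources as imul/imul24, so we can just
    * swap the opcode in place:
    */
   alu->op = object_fits_in_24b ? nir_op_imul24 : nir_op_imul;
}
```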
Suggestions welcome
CC: @elima, @jekstrand