
nir, intel: Optimize some global 64-bit address calculations

Kenneth Graunke requested to merge kwg/mesa:late-int64 into main

I was recently asked to look at a microbenchmark that was doing far too many memory accesses on anv compared to other drivers, and discovered that this was because nir_opt_load_store_vectorize was unable to detect consecutive loads/stores and combine them. This is the first MR to begin correcting that.

The first problem is that we performed int64 lowering prior to nir_opt_load_store_vectorize, so the vectorizer wasn't able to see through basic int64 adds/shifts. Instead of a single 64-bit add, it saw an unpack into two halves, adds on both halves, a compare of the results, a boolean-to-integer conversion, carry-bit adds, and a re-pack... yikes. To fix that, this defers most of the int64 lowering until later. Unfortunately, the int64-to/from-float conversions interact with nir_lower_doubles, which we do early. So, we add a new "do only the float part" entry-point and use that early, deferring the rest until later. Not the cleanest, but it seems to work well enough.
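To make the problem concrete, here is a rough Python sketch (not the actual NIR code) of what int64 lowering turns a single 64-bit add into. Every address calculation the vectorizer needs to analyze becomes this whole chain of 32-bit ops instead of one `iadd`:

```python
MASK32 = 0xFFFFFFFF

def add64_lowered(a, b):
    """Sketch of a 64-bit add after int64 lowering: unpack each operand
    into two 32-bit halves, add the low halves, derive the carry from an
    unsigned compare, add it into the high halves, and re-pack."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32   # unpack_64_2x32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo = (a_lo + b_lo) & MASK32                   # 32-bit iadd
    carry = 1 if lo < a_lo else 0                 # ult compare -> b2i
    hi = (a_hi + b_hi + carry) & MASK32           # iadd with carry
    return (hi << 32) | lo                        # pack_64_2x32
```

Running the vectorizer before this lowering means it sees the original 64-bit add and can reason about constant offsets directly.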

The second problem was that we were missing an algebraic optimization rule for adds and shifts in the presence of i2i64 conversions.
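The kind of pattern at stake looks roughly like the following Python sketch (the helper names and the specific stride are hypothetical, for illustration only): a 64-bit global address built from a sign-extended 32-bit index. When algebraic rules can push adds/shifts through the `i2i64`, the vectorizer can prove that consecutive indices produce addresses a constant stride apart, and therefore that the accesses are adjacent:

```python
MASK64 = (1 << 64) - 1

def i2i64(x):
    """Sign-extend a 32-bit value to 64 bits, as NIR's i2i64 does
    for a 32-bit source."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def addr(base, index, stride_log2=3):
    """Hypothetical global address: base + (i2i64(index) << stride_log2),
    i.e. an 8-byte-stride array indexed by a 32-bit value."""
    return (base + (i2i64(index) << stride_log2)) & MASK64
```

Accesses at `index` and `index + 1` differ by exactly 8 bytes here, which is the adjacency fact the vectorizer needs to combine them.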

There is another poor access pattern exposed by those shaders, but my patches for that are still a bit buggy, so I'm leaving it for a future MR.

+@idr @cmarcelo
