ac/nir: Lower large indirect variables to scratch

This doesn't touch the existing indirect lowering in ac_nir_lower_indirect_derefs, which still seems necessary in some cases due to LLVM bugs, but it does enable larger variables to be lowered to scratch by putting them in function-local memory in LLVM. In particular, this fixes a gfxbench5 shader that has a large array for which we generated really awful code.
The threshold here is measured in bytes, rather than in vec4 elements as in radeonsi. I chose the number of bytes corresponding to the current limit (16 vec4 elements), assuming that all four components are used, i.e. 256 bytes for 32-bit components. For scalar arrays this means that we're a little less aggressive about putting things in memory. I also tried leaving the limit at 16 scalar 32-bit elements, but that causes more scratch usage (even though max-waves is decreased).
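
For reference, the threshold arithmetic described above can be sketched roughly as follows. This is only an illustration and not the code in this change: the constant and function names are made up, components are assumed to be 32 bits wide, and the strict "larger than the threshold" comparison is an assumption about how the limit is applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in Mesa. */
enum {
   BYTES_PER_COMPONENT = 4,  /* assumes 32-bit components */
   VEC4_COMPONENTS     = 4,
   VEC4_ELEMENT_LIMIT  = 16, /* the existing limit, in vec4 elements */
};

/* 16 vec4 elements * 4 components * 4 bytes = 256 bytes. */
static unsigned scratch_threshold_bytes(void)
{
   return VEC4_ELEMENT_LIMIT * VEC4_COMPONENTS * BYTES_PER_COMPONENT;
}

/* Assumption: an array moves to scratch only when it is strictly larger
 * than the threshold. */
static bool array_goes_to_scratch(unsigned num_elems, unsigned components)
{
   unsigned size_bytes = num_elems * components * BYTES_PER_COMPONENT;
   return size_bytes > scratch_threshold_bytes();
}

/* A few cases showing how a byte-based limit treats vec4 and scalar arrays. */
static void example_cases(void)
{
   assert(!array_goes_to_scratch(16, 4)); /* vec4[16]: 256 bytes, stays */
   assert(array_goes_to_scratch(17, 4));  /* vec4[17]: 272 bytes, scratch */
   assert(!array_goes_to_scratch(17, 1)); /* float[17]: 68 bytes, stays */
   assert(array_goes_to_scratch(65, 1));  /* float[65]: 260 bytes, scratch */
}
```

Under these assumptions a float[17] array stays out of scratch, whereas a limit of 16 scalar elements (64 bytes) would have moved it there, which is the "less aggressive" behavior for scalar arrays mentioned above.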