agx: Lower UBO loads to use per-element indexing
This lets us support indirect access to UBOs easily. The existing special case for constant offsets disappears too, since the peephole optimizer can inline the constant later. (Note: this is too conservative, since we can go up to 16-bit immediates...)
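As a rough illustration of what "per-element indexing" means here, the sketch below turns a UBO byte offset into an index counted in 32-bit elements. The helper names are hypothetical and this is not the actual agx/NIR lowering. It assumes 4-byte-aligned offsets and 32-bit elements. Note that a constant offset is just a constant-folded instance of the same path, which is why the special case can go away:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the real agx code): convert a UBO byte offset
 * into an index counted in 32-bit elements. Assumes 4-byte alignment. */
static uint32_t
ubo_element_index(uint32_t byte_offset)
{
   assert((byte_offset & 3) == 0);
   return byte_offset >> 2;
}

/* Hypothetical: dynamic access to a uniform array whose entries are
 * stride_bytes apart, reading the 32-bit word field_bytes into each entry.
 * With a constant i the whole expression folds to a constant index. */
static uint32_t
ubo_dynamic_element_index(uint32_t i, uint32_t stride_bytes,
                          uint32_t field_bytes)
{
   return ubo_element_index(i * stride_bytes + field_bytes);
}
```

For example, byte offset 20 becomes element 5, and entry 3 of an 8-byte-stride array read at byte 4 becomes element (3 * 8 + 4) >> 2 = 7.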
Unfortunately, nir_opt_algebraic doesn't seem able to optimize expressions like "((a << 3) + 4) >> 2" to "(a << 1) + 1", which would be necessary for reasonable perf out of this...
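A small sketch of the two forms, using 32-bit unsigned arithmetic as an assumption. They agree as long as `a << 3` does not wrap (that is, a < 2^29), but differ once it does. Possibly, this is part of why a generic algebraic rewrite can't fire without range information about `a`; that is a guess, not something confirmed here:

```c
#include <assert.h>
#include <stdint.h>

/* Index as emitted for an 8-byte-stride array, reading the word at byte 4:
 * ((a << 3) + 4) >> 2, in 32-bit unsigned arithmetic. */
static uint32_t
index_as_emitted(uint32_t a)
{
   return ((a << 3) + 4) >> 2;
}

/* The cheaper form we'd like to get instead: (a << 1) + 1. */
static uint32_t
index_simplified(uint32_t a)
{
   return (a << 1) + 1;
}
```

For a = 5 both give 11. For a = 0x20000000, `a << 3` wraps to 0, so the emitted form gives 1 while the simplified form gives 0x40000001.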
Fixes:
dEQP-GLES2.functional.shaders.indexing.uniform_array.float_dynamic_loop_read_fragment
Cc @jekstrand - is there a way to make nir_lower_io do this efficiently? I heard your next gig is writing open source Apple M1 GPU drivers at Google, so...