ir3: Use scalar ALU

Connor Abbott requested to merge cwabbott0/mesa:review/ir3-use-scalar-alu into main

This large series enables us to use the "scalar ALU" added on a650. It is similar to SALU on AMD, except that it's even more extensive in what it covers. While the blob seems to use the scalar ALU only in preambles, there seem to be cases where we can still profitably use it outside of them, for example when const space is overflowed or for calculations based on a loop induction variable. To do this we need divergence analysis in NIR, which ACO already uses for a similar purpose. I've tried to split the series into a number of pieces:

  • Handle half shared registers, which we'll have to start using.
  • Fixes for divergence analysis and handling for ir3-specific intrinsics.
  • Use divergence analysis in the reconvergence pass, which will be important later to avoid having to remove all shared phis.
  • Plumb scalar ALU support through the compiler.
  • Support scalar ALU in the builder.
  • Actually flip it on.
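
To illustrate why divergence analysis is the prerequisite here, the following is a toy sketch, not the NIR pass and not ir3 code. It assumes a made-up straight-line SSA IR in which every name (`Instr`, `load_uniform`, `load_thread_id`, `pick_unit`, and so on) is hypothetical. A value is treated as uniform when it is computed only from uniform inputs, and only uniform values are candidates for a scalar unit. The real analysis also has to account for divergent control flow and loops, which this sketch ignores.

```python
from dataclasses import dataclass

# Hypothetical opcode sets for this toy IR; these are not NIR or ir3 opcodes.
UNIFORM_SOURCES = {"load_const", "load_uniform"}      # same value in every thread
DIVERGENT_SOURCES = {"load_thread_id", "load_input"}  # may differ per thread


@dataclass(frozen=True)
class Instr:
    dest: str          # SSA name defined by this instruction
    op: str            # opcode name
    srcs: tuple = ()   # SSA names read by this instruction


def analyze_divergence(instrs):
    """Return {ssa_name: is_divergent} for straight-line SSA code.

    Assumes instructions are listed in definition order and that there is
    no control flow, so a single forward pass is enough.
    """
    divergent = {}
    for ins in instrs:
        if ins.op in UNIFORM_SOURCES:
            divergent[ins.dest] = False
        elif ins.op in DIVERGENT_SOURCES:
            divergent[ins.dest] = True
        else:
            # An ALU result is divergent if any of its sources is divergent.
            divergent[ins.dest] = any(divergent[s] for s in ins.srcs)
    return divergent


def pick_unit(instrs):
    """Mark each value as a candidate for the scalar or the vector unit."""
    divergent = analyze_divergence(instrs)
    return {
        ins.dest: "vector" if divergent[ins.dest] else "scalar"
        for ins in instrs
    }


program = [
    Instr("i", "load_uniform"),              # e.g. a uniform loop counter
    Instr("four", "load_const"),
    Instr("offset", "imul", ("i", "four")),  # uniform * uniform
    Instr("tid", "load_thread_id"),
    Instr("addr", "iadd", ("offset", "tid")),
]
units = pick_unit(program)
```

In this example `offset` depends only on uniform values, so it is a scalar candidate, while `addr` mixes in a per-thread value and stays on the vector unit. Whether moving a given uniform computation is actually profitable is a separate heuristic question that the sketch does not model.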

This is mostly ready for review, but it needs perf testing: in a lot of cases the number of vector ALU instructions drops or stays the same while the total instruction count goes up, and we might need to tweak some heuristics if that turns out to cause an actual performance regression.

I've tried to split out the parts that are actually bugfixes into !22072 (merged), even if those bugfixes are for cases that no test or app had hit before scalar ALU was enabled, so we can land that first. However, in the end it was impossible to completely untangle the two, so there's some "useless" code there which finally gets used here.

Note that there's a closely related "early preamble" feature, enabled in !27462 based on this MR.

Edited by Connor Abbott