WIP: freedreno/ir3: vector(ish) prep
adreno a3xx+ (ir3) has a sorta vector mode, using a (rptN) flag (for alu instructions) to repeat an instruction 1-3 times. The destination register increments (to the next successive scalar register) for each repeat, as do src registers with the (r) flag. This means that srcs can have either .xxx or .xyz swizzles. The notable benefit of this mode is that src registers without (r) get loaded a single time, rather than once per instruction. (And I think it helps the shader core pipeline to better prefetch src registers.) This turns out to help a lot in certain cases (i.e. anything other than a bunch of 2src alu ops) where GPR read bandwidth can become the bottleneck.
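To make the register stepping concrete, here is a minimal sketch in C of how the operands of a repeated instruction could be modelled. It is not code from this MR or from ir3: `struct src`, `src_reg_at()`, `dst_reg_at()` and `src_loads()` are hypothetical names, registers are plain scalar indices (assuming r0.x = 0, r0.y = 1, ...), and it assumes (rptN) means N extra executions, so N+1 in total. The load count only encodes the claim above that a src without (r) is fetched once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a source operand, not the real ir3 structs. */
struct src {
	unsigned reg;  /* scalar register index, assuming r0.x == 0, r0.y == 1, ... */
	bool r;        /* (r) flag: step to the next scalar register on each repeat */
};

/* Scalar register that a src reads on execution i (0 == first execution). */
static unsigned
src_reg_at(struct src s, unsigned i)
{
	return s.r ? s.reg + i : s.reg;
}

/* The destination always steps to the next scalar register. */
static unsigned
dst_reg_at(unsigned dst, unsigned i)
{
	return dst + i;
}

/* GPR loads needed for one src across a whole (rptN) instruction, under
 * the assumption that a src without (r) is loaded a single time.
 */
static unsigned
src_loads(struct src s, unsigned rpt)
{
	assert(rpt <= 3);  /* (rptN) repeats 1-3 times */
	return s.r ? rpt + 1 : 1;
}
```

Under these assumptions, a (rpt2) instruction with one (r) src and one plain src needs 3 + 1 = 4 src loads, where the three equivalent scalar instructions would need 6.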
This MR doesn't turn on vectorizing yet. RA still needs some more work, and there are whatever other bugs I've not found yet. But this is the part of my vectorish patch stack that I think is ready(ish) to get some eyes on.