ir3: LDG/STG: allow immediate offset with register offset and variable shift
The full form of the ldg/stg offset is:
g[reg_address + reg_offset << (imm_shift + 2) + imm_offset << 2]
where the shifts bind tighter than the additions, and imm_shift and imm_offset are each in the [0, 3] range.
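To make the grouping explicit, here is a minimal Python sketch of the offset calculation. The function name ldg_stg_offset and its argument names are made up for illustration and are not part of ir3. It assumes the shifts are applied before the additions, as the syntax examples in this message suggest.

```python
def ldg_stg_offset(reg_address, reg_offset, imm_shift, imm_offset):
    """Hypothetical helper: evaluate the full ldg/stg offset form.

    Computes reg_address + (reg_offset << (imm_shift + 2)) + (imm_offset << 2).
    """
    if not 0 <= imm_shift <= 3:
        raise ValueError("imm_shift must be in [0, 3]")
    if not 0 <= imm_offset <= 3:
        raise ValueError("imm_offset must be in [0, 3]")
    return reg_address + (reg_offset << (imm_shift + 2)) + (imm_offset << 2)


# With imm_shift == 0 the full form collapses to (reg_offset + imm_offset) << 2,
# which matches the shape of the simpler (reg + imm) << 2 encoding.
assert ldg_stg_offset(0, 5, 0, 1) == (5 + 1) << 2
```

With imm_shift == 0 both terms share the same shift of 2, so the immediate can be folded into the register term.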
The a6xx blob was found to produce slightly simpler offset calculations for TES/TCS shaders in GTA V:
[c002000a_03c14215] ldg.a.f32 r2.z, g[r1.y+((r2.z+1)<<2)], 3;
[c0020004_01c14609] ldg.a.f32 r1.x, g[r1.y+((r1.x+3)<<2)], 1;
However, I wasn't able to find a shift other than 2 anywhere.
Our new syntax is:
stg.u32 g[r2.x+(r1.x+1)<<2], r5.x, 1
stg.u32 g[r2.x+r1.x<<4+3<<2], r5.x, 1
ldg.f32 r1.w, g[r1.y+(r1.w+1)<<2], 3
ldg.f32 r1.w, g[r1.y+r1.w<<5+2<<2], 3
Also refactored the order of the stg registers.
Now stg/ldg calls are rather ugly...
There are also computerator changes to quickly test the new offset calculation.
For example, the following assembly could be used for testing:
@localsize 32, 1, 1
@buf 32(c2.x) ; g[0]
@const(c0.x) 0.0, 0.0, 0.0, 0.0
@wgid(r48.x) ; r48.xyz
@invocationid(r0.x) ; r0.xyz
mov.u32u32 r0.y, r0.x
mov.u32u32 r1.x, c2.x
mov.u32u32 r1.y, c2.y
mov.u32u32 r1.z, 3
mov.u32u32 r2.x, 66
(rpt5)nop
stg.u32 g[r1.x+r1.z<<5+3<<2], r2.x, 1
nop(ss)(sy)
ldg.u32 r4.x, g[r1.x+r1.z<<5+3<<2], 1
nop(ss)(sy)
stg.u32 g[r1.x], r4.x, 1
end
nop
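
As a sanity check on the test program, the offset that both the stg and the ldg add to r1.x can be worked out by hand. The Python sketch below only redoes that arithmetic with the values from the listing. The variable names are illustrative, and it assumes the shifts bind tighter than the additions.

```python
# Values taken from the test program.
r1_z = 3          # register offset (mov.u32u32 r1.z, 3)
shift = 5         # written as "<<5" in the assembly, i.e. imm_shift + 2
imm_offset = 3    # written as "3<<2" in the assembly

imm_shift = shift - 2                          # encoded shift, must be in [0, 3]
offset = (r1_z << shift) + (imm_offset << 2)   # 96 + 12

print(imm_shift, offset)  # prints "3 108"
```

If the store and the load compute the same offset, the final stg should leave the value 66 at g[0].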