ir3: ldg/stg: allow immediate offset together with register offset and variable shift

The full form for ldg/stg offset is:

 g[reg_address + reg_offset << (imm_shift + 2) + imm_offset << 2]

where imm_shift and imm_offset are each in the range [0, 3].
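
To make the arithmetic concrete, here is a minimal Python sketch of the address calculation. It assumes each shift binds tighter than the additions (as the explicit parentheses in the blob's disassembly suggest); the function name `ldg_stg_address` is made up for illustration and is not part of Mesa.

```python
def ldg_stg_address(reg_address, reg_offset, imm_shift, imm_offset):
    """Model of g[reg_address + reg_offset << (imm_shift + 2) + imm_offset << 2].

    Assumption: each shift is applied to its own operand before the additions.
    """
    if not 0 <= imm_shift <= 3:
        raise ValueError("imm_shift must be in [0, 3]")
    if not 0 <= imm_offset <= 3:
        raise ValueError("imm_offset must be in [0, 3]")
    return reg_address + (reg_offset << (imm_shift + 2)) + (imm_offset << 2)


# An operand written as g[base+off<<4+3<<2] would correspond to
# imm_shift = 2 (4 - 2) and imm_offset = 3.
example = ldg_stg_address(0x1000, 5, imm_shift=2, imm_offset=3)
```

With these inputs the result is 0x1000 + 80 + 12.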

The a6xx blob was found to produce slightly simpler offset calculations for TES/TCS shaders in GTA V:

 [c002000a_03c14215] ldg.a.f32 r2.z, g[r1.y+((r2.z+1)<<2)], 3;
 [c0020004_01c14609] ldg.a.f32 r1.x, g[r1.y+((r1.x+3)<<2)], 1;
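
The blob's form looks like the full form with imm_shift = 0, since (offset + imm) << 2 distributes to offset << 2 + imm << 2. A small Python check of that reading, with hypothetical helper names and the same precedence assumption as the full form above:

```python
def blob_form(base, offset, imm):
    # g[base+((offset+imm)<<2)] as printed by the blob's disassembler
    return base + ((offset + imm) << 2)


def full_form(base, offset, imm_shift, imm_offset):
    # Assumed grouping: base + (offset << (imm_shift + 2)) + (imm_offset << 2)
    return base + (offset << (imm_shift + 2)) + (imm_offset << 2)


# Compare both forms over a few register values and every legal immediate.
forms_agree = all(
    blob_form(base, off, imm) == full_form(base, off, 0, imm)
    for base in (0, 0x1000)
    for off in range(64)
    for imm in range(4)
)
```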

However, I wasn't able to find a shift other than 2 anywhere.

Our new syntax is:

 stg.u32 g[r2.x+(r1.x+1)<<2], r5.x, 1
 stg.u32 g[r2.x+r1.x<<4+3<<2], r5.x, 1
 ldg.f32 r1.w, g[r1.y+(r1.w+1)<<2], 3
 ldg.f32 r1.w, g[r1.y+r1.w<<5+2<<2], 3

The stg register order was also refactored.


Now stg/ldg calls are rather ugly...


There are also computerator changes to quickly test the new offset calculation.
For example, assembly like this could be used for testing:

@localsize 32, 1, 1
@buf 32(c2.x)  ; g[0]
@const(c0.x)  0.0, 0.0, 0.0, 0.0
@wgid(r48.x)        ; r48.xyz
@invocationid(r0.x) ; r0.xyz
mov.u32u32 r0.y, r0.x
mov.u32u32 r1.x, c2.x
mov.u32u32 r1.y, c2.y
mov.u32u32 r1.z, 3
mov.u32u32 r2.x, 66
(rpt5)nop
stg.u32 g[r1.x+r1.z<<5+3<<2], r2.x, 1
nop(ss)(sy)
ldg.u32 r4.x, g[r1.x+r1.z<<5+3<<2], 1
nop(ss)(sy)
stg.u32 g[r1.x], r4.x, 1
end
nop
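
For reference, a rough Python model of what this test is expected to do. It assumes byte addressing, treats the buffer base as 0, ignores the high half of the address in r1.y, and models a single invocation; `run_test_shader` is a hypothetical name, not part of computerator.

```python
def run_test_shader(buf_base=0):
    mem = {}                  # byte address -> 32-bit value
    r1_x = buf_base           # mov.u32u32 r1.x, c2.x (buffer address, low part)
    r1_z = 3                  # mov.u32u32 r1.z, 3
    r2_x = 66                 # mov.u32u32 r2.x, 66

    # stg.u32 g[r1.x+r1.z<<5+3<<2], r2.x, 1
    addr = r1_x + (r1_z << 5) + (3 << 2)
    mem[addr] = r2_x

    # ldg.u32 r4.x, g[r1.x+r1.z<<5+3<<2], 1
    r4_x = mem[addr]

    # stg.u32 g[r1.x], r4.x, 1
    mem[r1_x] = r4_x

    return addr - buf_base, mem[buf_base]


byte_offset, first_dword = run_test_shader()
```

Under these assumptions the store lands 108 bytes into the buffer (dword 27 of the 32 declared by @buf), and the first dword of the buffer should read back as 66 if the stg and ldg offsets agree.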
Edited by Danylo Piliaiev
