These should cover the majority of unknown instructions in a7xx shaders.
- (last) attribute for GPR sources: indicates that this is the last use of the value in this reg. Seems to be only a (perf?) hint; it doesn't affect the result in any way in our tests.
or.b r0.x, (last)r0.x, (last)r0.y
-
lock
/unlock
at the end of all compute shaders. Don't know what they do, always follow the pattern:
%shader_assembly%
lock
unlock
end
- New stg.a/ldg.a addressing format (no more shifts):
ldg.a.f32 r4.y, g[c0.z+r4.y+2], 4
stg.a.f32 g[r0.z+r1.w+255], r0.w, 4;
- New stsc instruction, which seems to be STore Shared Consts
  - Loads SIZE dwords from HLSQ_SHARED_CONSTS_IMM, starting from HLSQ_SHARED_CONSTS_IMM[SRC], and writes them to c[DST]
stsc.f32 c[0], 0, 12
stsc.f32 c[16], 16, 16;
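To make the described stsc behaviour concrete, here is a minimal Python sketch of the copy, not a model of the real hardware. It assumes the operands read as `stsc c[DST], SRC, SIZE` and that DST, SRC and SIZE are all plain dword indices; the function name `stsc` and the list-based register files are hypothetical stand-ins.

```python
def stsc(consts, shared_consts_imm, dst, src, size):
    """Copy `size` dwords from shared_consts_imm[src:] into consts[dst:].

    Assumption: all three operands are dword indices. The real unit
    (vec4 vs dword granularity) has not been confirmed.
    """
    if src + size > len(shared_consts_imm):
        raise IndexError("read past the end of HLSQ_SHARED_CONSTS_IMM")
    if dst + size > len(consts):
        raise IndexError("write past the end of the const file")
    consts[dst:dst + size] = shared_consts_imm[src:src + size]
    return consts


# 32 made-up dwords standing in for HLSQ_SHARED_CONSTS_IMM
shared = list(range(100, 132))
c = [0] * 64

stsc(c, shared, dst=0, src=0, size=12)    # stsc.f32 c[0], 0, 12
stsc(c, shared, dst=16, src=16, size=16)  # stsc.f32 c[16], 16, 16
```

Under these assumptions the two instructions above fill c[0..11] and c[16..31] and leave c[12..15] untouched.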
- New alias instruction, a kind-of cheap move. Creates an entry in a scope-specific "alias table" which has priority when an instruction reads from its sources:
0[00000001_00000000] nop ;
1[e45401a0_bfba7737] alias.tex.b32.1 r40.x, (-1.456763);
2[e45400a1_3d68405c] alias.tex.b32.0 r40.y, (0.056702);
3[a4481f00_c0200141] gather4g.s2en.mode6.base0 (f32)(xyzw)r0.x, r40.x, 1;
4[00010002_00000000] (eq)nop ;
5[03000000_00000000] end ;
Which on a6xx looked like:
:1:0002:0002[20444000x_bfba7737x] mov.f32f32 r0.x, (-1.456763)
:1:0003:0003[20444001x_3d68405cx] mov.f32f32 r0.y, (0.056702)
:0:0004:0004[00000500x_00000000x] (rpt5)nop
:5:0005:0010[a4481f00x_c0200001x] gather4g.base0 (f32)(xyzw)r0.x, r0.x, s#1, t#0
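In both listings the second dword of the alias and mov encodings is identical (bfba7737, 3d68405c), and it appears to be the raw float32 bits of the immediate. A small Python check of that reading, using only the values from the listings above (the helper name `dword_to_f32` is made up for this sketch):

```python
import struct


def dword_to_f32(dword):
    """Reinterpret a 32-bit integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", dword))[0]


# Second dwords taken from the alias/mov encodings in the listings
for raw in (0xBFBA7737, 0x3D68405C):
    print(f"{raw:08x} -> {dword_to_f32(raw):.6f}")
```

This matches the (-1.456763) and (0.056702) operands the disassembler prints, so at least for immediates the alias encoding seems to keep the value in the same place as the a6xx mov.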
Another a7xx example:
222[00001500_00000000] (ss)(rpt5)nop ;
223[e44c0005_00000005] alias.tex.b32.0 r1.y, c1.y;
224[a0081fba_c000000b] isam.s2en.mode6.base0.1d (f32)(xyzw)r46.z, r1.y, 0;
225[d02202ba_05677b00] (sy)stib.f32.2d.4.mode4.base0 r46.z, r1.y, 1;
Anyway:
- Can "alias" const regs, GPRs and immediates;
- Doesn't require nops before the dst reg is used by the next instruction;
- Has weird .0, .1 ... .15 suffixes which I wasn't able to decipher;
- Can be .tex, .mem (?), .rt (rendertarget, blob has it disabled on a740).
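One way to picture the alias table described above is the toy Python model below. It is a guess at the semantics, not a reversed implementation. It assumes one table per scope (tex, mem, rt), and that an instruction of a given scope consults only its own table before falling back to the real GPR. The class `AliasModel` and its methods are hypothetical, and the .0 ... .15 suffixes are ignored since their meaning is unknown.

```python
from dataclasses import dataclass, field


@dataclass
class AliasModel:
    gprs: dict = field(default_factory=dict)
    consts: dict = field(default_factory=dict)
    # Assumption: one independent table per scope
    tables: dict = field(
        default_factory=lambda: {"tex": {}, "mem": {}, "rt": {}}
    )

    def alias(self, scope, dst, kind, value):
        """Record that reads of `dst` in `scope` should see (kind, value)."""
        self.tables[scope][dst] = (kind, value)

    def read_src(self, scope, reg):
        """Read a source register; an alias entry wins over the real GPR."""
        entry = self.tables[scope].get(reg)
        if entry is None:
            return self.gprs[reg]
        kind, value = entry
        if kind == "imm":
            return value
        if kind == "const":
            return self.consts[value]
        if kind == "gpr":
            return self.gprs[value]
        raise ValueError(f"unknown alias source kind: {kind}")


m = AliasModel(gprs={"r40.x": 0.0, "r40.y": 0.0})
m.alias("tex", "r40.x", "imm", -1.456763)  # alias.tex.b32.1 r40.x, (-1.456763)
m.alias("tex", "r40.y", "imm", 0.056702)   # alias.tex.b32.0 r40.y, (0.056702)

# What gather4g would see as coordinates in this model
coords = [m.read_src("tex", "r40.x"), m.read_src("tex", "r40.y")]
```

In this model no GPR is ever written, which would be consistent with alias not needing nops before its dst is consumed, but that link is speculation.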
Not yet reversed:
- movs (actually exists since a6xx gen3), probably MOV Shared; compared to an ordinary mov it has a lane id:
[0x201540c080800000] movs.s32s32 sr48.x, r0.x, 1;
@PixelyIon has some work done on it, we just need to find out how exactly it works.
- ray_intersection:
626[c3800404_5514c001] ray_intersection r1.x, [r20.w], r0.x, r21.y, r0.z;
- resbase instruction:
0[c0260204_00630100] resbase.u32.1d.mode4.base0 r1.x, 1;
- New? forms of sampling e.g.:
9[a00c3704_c2040141] isam.v.s2en.mode6.base0 (u32)(xyzO)r1.x, 16;
There are more of them.