
ir3: Decode some of new a7xx instructions

Danylo Piliaiev requested to merge Danil/mesa:freedreno/feature/a7xx into main

These should cover the majority of the unknown instructions in a7xx shaders.

  • (last) attribute for GPR sources: indicates that this is the last use of the value in this register. It seems to be only a (perf?) hint; in our tests it doesn't affect the result in any way.
or.b r0.x, (last)r0.x, (last)r0.y
  • lock/unlock at the end of all compute shaders. It's unknown what they do; they always follow the pattern:
%shader_assembly%
 lock
 unlock
 end
  • New stg.a/ldg.a addressing format (no more shifts):
ldg.a.f32 r4.y, g[c0.z+r4.y+2], 4
stg.a.f32 g[r0.z+r1.w+255], r0.w, 4;
  • New stsc instruction, which seems to be STore Shared Consts
    • Loads SIZE dwords from HLSQ_SHARED_CONSTS_IMM, starting at HLSQ_SHARED_CONSTS_IMM[SRC], and writes them to c[DST]
stsc.f32 c[0], 0, 12
stsc.f32 c[16], 16, 16;
  • New alias instruction, a kind of cheap move. It creates an entry in a scope-specific "alias table", which takes priority when an instruction reads its sources:
  0[00000001_00000000] nop ;
  1[e45401a0_bfba7737] alias.tex.b32.1 r40.x, (-1.456763);
  2[e45400a1_3d68405c] alias.tex.b32.0 r40.y, (0.056702);
  3[a4481f00_c0200141] gather4g.s2en.mode6.base0 (f32)(xyzw)r0.x, r40.x, 1;
  4[00010002_00000000] (eq)nop ;
  5[03000000_00000000] end ;

Which on a6xx looked like:

:1:0002:0002[20444000x_bfba7737x] mov.f32f32 r0.x, (-1.456763)
:1:0003:0003[20444001x_3d68405cx] mov.f32f32 r0.y, (0.056702)
:0:0004:0004[00000500x_00000000x] (rpt5)nop
:5:0005:0010[a4481f00x_c0200001x] gather4g.base0 (f32)(xyzw)r0.x, r0.x, s#1, t#0
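
As a sanity check on the dumps above, the low dword of both the a7xx alias and the a6xx mov appears to carry the float immediate as a plain IEEE-754 value. A minimal sketch, assuming the bracketed encoding is printed as `hi_lo` (the helper name `low_dword_as_f32` is hypothetical and not part of Mesa):

```python
import struct


def low_dword_as_f32(encoding: str) -> float:
    """Reinterpret the low dword of a 'hi_lo' encoding as a 32-bit float.

    Assumption: the text in brackets is 'hi_lo'; the a6xx dump appends an
    'x' to each half, which is stripped here.
    """
    hi, lo = encoding.replace("x", "").split("_")
    return struct.unpack("<f", struct.pack("<I", int(lo, 16)))[0]


if __name__ == "__main__":
    # Encodings copied from the a7xx alias lines and the a6xx mov line.
    for enc in ("e45401a0_bfba7737", "e45400a1_3d68405c", "20444000x_bfba7737x"):
        print(enc, "->", f"{low_dword_as_f32(enc):.6f}")
```

The same bit pattern `bfba7737` shows up in both generations, which is consistent with alias replacing the mov of the immediate.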

Another a7xx example:

222[00001500_00000000] (ss)(rpt5)nop ;
223[e44c0005_00000005] alias.tex.b32.0 r1.y, c1.y;
224[a0081fba_c000000b] isam.s2en.mode6.base0.1d (f32)(xyzw)r46.z, r1.y, 0;
225[d02202ba_05677b00] (sy)stib.f32.2d.4.mode4.base0 r46.z, r1.y, 1;

Anyway:

  • Can "alias" const regs, GPRs and immediates;
  • Doesn't require nops before the dst reg is used by the next instruction;
  • Has weird .0, .1 ... .15 suffixes which I wasn't able to decipher;
  • Can be .tex, .mem (?), or .rt (render target; the blob has it disabled on a740).

Not yet reversed:

  • movs (has actually existed since a6xx gen3), probably MOV Shared; compared to an ordinary mov it has a lane id:
[0x201540c080800000] movs.s32s32 sr48.x, r0.x, 1;

@PixelyIon has done some work on it; we just need to find out how exactly it works.

  • ray_intersection:
626[c3800404_5514c001] ray_intersection r1.x, [r20.w], r0.x, r21.y, r0.z;
  • resbase instruction:
0[c0260204_00630100] resbase.u32.1d.mode4.base0 r1.x, 1;
  • New? forms of sampling e.g.:
9[a00c3704_c2040141] isam.v.s2en.mode6.base0 (u32)(xyzO)r1.x, 16;

There are more of them.

Edited by Danylo Piliaiev
