Dual 16 mode
GC3000/GC7000Lite have a "dual 16" mode for pixel shaders, which uses 16-bit float values and runs ALU instructions twice as fast. Using it properly requires mediump support in Mesa (coming for freedreno); for now we have a patch to force-enable it.
By default, the GC3000 blob does not use DUAL16 mode. It can be forced on in the blob with the environment variable VC_OPTION=-DUAL16:2 (2 = force on, 1 = detect). For the shading:shading=cel scene in glmark2, this boosts the FPS from ~240 to ~360 (about 1.5x).
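A minimal sketch of applying that setting from a script. The helper names and the `glmark2-es2` binary name are assumptions for illustration; only the VC_OPTION value and the scene name come from the notes above.

```python
import os
import subprocess


def dual16_env(mode=2, base=None):
    """Return a copy of the environment with VC_OPTION set for DUAL16.

    mode: 2 = force on, 1 = detect (the two values named in the notes).
    base: environment to copy; defaults to the current process environment.
    """
    if mode not in (1, 2):
        raise ValueError("mode must be 1 (detect) or 2 (force on)")
    env = dict(os.environ if base is None else base)
    env["VC_OPTION"] = f"-DUAL16:{mode}"
    return env


def run_cel_benchmark(binary="glmark2-es2"):
    """Run the cel shading scene with DUAL16 forced on.

    The binary name is an assumption; use whichever glmark2 build
    runs against the blob on the target board.
    """
    cmd = [binary, "-b", "shading:shading=cel"]
    return subprocess.run(cmd, env=dual16_env(2), check=False)
```

Calling `run_cel_benchmark()` once with and once without the variable set should make the FPS difference visible on the same board.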
To enable DUAL16 mode:
- GC3000: set 0x20000000 bit in VS_UNIFORM_CACHE
- GC7000Lite: set DUAL16 bit in SH_CONFIG
- In dual-16 mode, two pixels are being processed at once
- 'th' (high precision/32-bit) registers share storage with 't' (medium precision/16-bit) registers: 't' is 8x (4 for each "thread"/pixel) 16-bit values and 'th' is 4x 32-bit values. When mixing highp/mediump, the SEL bits determine which half of the mediump registers (which of the two pixels) is used for the instruction. There is a bit to use 'th' for the dest (bit_3_31).
- We can use "new immediates" in DUAL16 mode: for 16-bit values (float or int), use amode=6 and encode the 16-bit value directly into the 20 bits of storage for the imm value. 32-bit (20-bit) immediates might work, but perhaps only if the SEL bits are used (to be tested).
- Set 0x01000000 in PS_INPUT_COUNT to get highp gl_FragCoord (in th0/th1). Otherwise gl_FragCoord does not seem to work at all.
- TODO: how do we control whether varyings are highp or not? The DUAL16 bit in PS_INPUT_COUNT might be related. Highp varyings take 2x the register space.
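The bit manipulation and immediate encoding described in this list can be sketched as follows. This is an illustration under stated assumptions, not driver code: the function names are hypothetical, and placing the 16-bit value in the low bits of the 20-bit immediate field is a guess that the notes do not confirm. The GC7000Lite DUAL16 bit in SH_CONFIG is left out because its value is not given here.

```python
import struct

# Bit masks quoted in the notes above.
VS_UNIFORM_CACHE_DUAL16 = 0x20000000         # GC3000: enable DUAL16
PS_INPUT_COUNT_HIGHP_FRAGCOORD = 0x01000000  # highp gl_FragCoord in th0/th1

# Storage sizes implied by the notes: 'th' = 4 x 32-bit, 't' = 8 x 16-bit.
TH_BYTES = struct.calcsize("<4f")  # four 32-bit floats
T_BYTES = struct.calcsize("<8e")   # eight 16-bit floats (4 per pixel)


def set_bits(reg_value, mask):
    """Return a 32-bit register value with the bits in mask set."""
    return (reg_value | mask) & 0xFFFFFFFF


def half_bits(value):
    """Return the raw 16-bit IEEE 754 half-float pattern for value."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def int16_bits(value):
    """Return the raw 16-bit two's-complement pattern for an integer."""
    if not -0x8000 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value & 0xFFFF


def imm20(bits16):
    """Place a 16-bit pattern in a 20-bit immediate field (amode=6).

    Assumption: the value sits in the low 16 bits, top 4 bits zero.
    """
    return bits16 & 0xFFFF
```

Both register views come out at the same size (16 bytes), which is consistent with 'th' and 't' sharing storage. How the eight 16-bit slots map onto the two pixels is not modelled here.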