Dual 16 mode

The GC3000 and GC7000Lite have a "dual 16" mode for pixel shaders, which uses 16-bit float values and runs ALU instructions twice as fast. Using it properly requires mediump support in Mesa (which is coming for freedreno); for now, we have a patch to force-enable it.

By default, the GC3000 blob does not use DUAL16 mode. It can be forced on in the blob by setting the environment variable VC_OPTION=-DUAL16:2 (2 = force on, 1 = detect). For the shading:shading=cel scene in glmark2, this raises the frame rate from ~240 to ~360 FPS.
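
As a rough sketch of how the override could be applied when launching a benchmark from a script: the helper below only builds the environment. The name `dual16_env` is made up for illustration, and the glmark2 binary name in the usage comment is an assumption (it may be called something else, such as glmark2-es2, on a given system).

```python
import os

# Values taken from the note above: 2 = force on, 1 = detect.
DUAL16_MODES = {"detect": 1, "force": 2}

def dual16_env(mode="force", base_env=None):
    """Return a copy of the environment with VC_OPTION set for DUAL16.

    Hypothetical helper: it overwrites any VC_OPTION value already present.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["VC_OPTION"] = "-DUAL16:%d" % DUAL16_MODES[mode]
    return env

# Example use (assumes a glmark2 binary running on the blob is on PATH):
#   import subprocess
#   subprocess.run(["glmark2", "-b", "shading:shading=cel"],
#                  env=dual16_env("force"))
```

Running the same scene once without the variable and once with `dual16_env("force")` should be enough to compare frame rates.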

To enable DUAL16 mode:

GC3000: set 0x20000000 bit in VS_UNIFORM_CACHE
GC7000Lite: set DUAL16 bit in SH_CONFIG
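
The GC3000 case amounts to OR-ing one bit into the register value. A minimal sketch, assuming the register is handled as a plain 32-bit integer; the constant and function names are hypothetical and not taken from the driver. The GC7000Lite case is left out because the position of the DUAL16 bit in SH_CONFIG is not given here.

```python
# Bit value from the note above (GC3000); the constant name is made up.
VS_UNIFORM_CACHE_DUAL16 = 0x20000000

def gc3000_enable_dual16(vs_uniform_cache):
    """Return the VS_UNIFORM_CACHE value with the DUAL16 bit set."""
    return (vs_uniform_cache | VS_UNIFORM_CACHE_DUAL16) & 0xFFFFFFFF

def gc3000_dual16_enabled(vs_uniform_cache):
    """Check whether the DUAL16 bit is set in a VS_UNIFORM_CACHE value."""
    return bool(vs_uniform_cache & VS_UNIFORM_CACHE_DUAL16)
```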

Other notes:

In dual-16 mode, two pixels are processed at once.
th (high precision, 32-bit) registers share storage with t (medium precision, 16-bit) registers: t holds 8x 16-bit values (4 for each "thread"/pixel) and th holds 4x 32-bit values. When mixing highp and mediump, the SEL bits determine which half of the mediump registers (that is, which of the two pixels) the instruction uses. There is a bit to use th for the dest (bit_3_31).
We can use "new immediates" in DUAL16 mode: for 16-bit values (float or int), use amode=6 and encode the 16-bit value directly into the 20 bits of storage for the imm value. 32-bit (20-bit) immediates might also work, but possibly only if the SEL bits are used (to be tested).
Set 0x01000000 in PS_INPUT_COUNT to get highp gl_FragCoord (in th0/th1). Otherwise, gl_FragCoord does not seem to work at all.
TODO: how do we control whether varyings are highp or not? The DUAL16 bit in PS_INPUT_COUNT might be related. Highp varyings take 2x the register space.
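
Two of the notes above, the 16-bit immediates and the highp gl_FragCoord bit, can be sketched in a few lines. This is only an illustration under stated assumptions: placing the 16-bit value in the low bits of the 20-bit field is a guess, all function and constant names are hypothetical, and the caller would still have to select amode=6 in the instruction encoding.

```python
import struct

IMM_FIELD_MASK = 0xFFFFF  # 20 bits of immediate storage, per the notes
# Bit value from the notes; the constant name is made up.
PS_INPUT_COUNT_HIGHP_FRAGCOORD = 0x01000000

def imm16_from_float(value):
    """Half-float bit pattern of `value` in the low 16 bits of the
    20-bit immediate field (the placement is an assumption)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits & IMM_FIELD_MASK

def imm16_from_int(value):
    """16-bit integer immediate; negative values use two's complement."""
    if not -0x8000 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value & 0xFFFF

def with_highp_fragcoord(ps_input_count):
    """Return the PS_INPUT_COUNT value with the highp gl_FragCoord bit set."""
    return (ps_input_count | PS_INPUT_COUNT_HIGHP_FRAGCOORD) & 0xFFFFFFFF
```

For example, `imm16_from_float(1.0)` gives 0x3C00, the IEEE 754 half-precision encoding of 1.0.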

Edited Jun 26, 2019 by Jonathan Marek