"1-bit" bools on freedreno
My project for today is starting to look healthy, so I'm putting it up here. It depends on !4516 (merged); please review those changes there.
From the final commit:
freedreno: Leave bools as 1-bit and (usually) store them in half regs.
If we use NIR's 1-bit bool representation, we get exactly the bool behavior
the hardware provides: CMPS produces true or false, AND/OR/XOR work as
intended without extra absnegs, and we can pass those half values directly
to other CMPS. We emit an absneg for b2b1 ("turn a memory load into a
1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the
use of that value anyway.
The awkward part is ir3_SEL requiring that the cond match the bit size of
the selection operands. If we store all bools as half, we end up with a
lot of extra upconverts. Optimize most of them out by storing as full
when the bool is only used by a 32-bit SEL. (But we still have to convert
if the bool gets mixed SEL and non-SEL usage.)
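
As a rough illustration of the storage choice described above, here is a minimal standalone sketch in C. It is not the code from this MR: `struct bool_uses`, `choose_bool_size()` and `max_upconverts()` are hypothetical names made up for the example, and it assumes the uses of a bool have already been counted and split into "condition of a 32-bit SEL" and "everything else".

```c
#include <stdbool.h>

/* Hypothetical summary of how one 1-bit bool value is consumed. */
struct bool_uses {
    unsigned sel32_uses;  /* uses as the condition of a SEL with 32-bit operands */
    unsigned other_uses;  /* all other uses: CMPS sources, AND/OR/XOR, 16-bit SEL, ... */
};

enum reg_size { REG_HALF, REG_FULL };

/*
 * Store the bool as a full reg only when every use is a 32-bit SEL
 * condition, so the cond already matches the SEL operand size.
 * Anything else (including mixed usage or no uses) stays half.
 */
static enum reg_size
choose_bool_size(const struct bool_uses *u)
{
    if (u->sel32_uses > 0 && u->other_uses == 0)
        return REG_FULL;
    return REG_HALF;
}

/*
 * Upper bound on the half->full conversions still needed for this bool,
 * assuming (for the sketch) one conversion per 32-bit SEL use and no
 * sharing between uses.
 */
static unsigned
max_upconverts(const struct bool_uses *u)
{
    return choose_bool_size(u) == REG_HALF ? u->sel32_uses : 0;
}
```

Under these assumptions, a bool with two 32-bit SEL uses and nothing else is stored full and needs no conversion, while a bool with one 32-bit SEL use and one AND use stays half and still pays for the conversion on the SEL.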
Significant changes to GL_TIME_ELAPSED on my set of traces:
gputest/pixmark-volplosion.trace : -14.10% (+/- 0.3%)
gputest/pixmark-piano.trace : -9.93% (+/- 0.2%)
glmark2/shading:shading=cel.trace : -0.70% (+/- 0.3%)
glmark2/terrain.trace : -0.48% (+/- 0.1%)
instructions in affected programs: 2884196 -> 2847751 (-1.26%)
nops in affected programs: 1067818 -> 1035885 (-2.99%)
non-nops in affected programs: 899507 -> 894995 (-0.50%)
mov in affected programs: 15945 -> 16050 (0.66%)
cov in affected programs: 11874 -> 13944 (17.43%)
dwords in affected programs: 510912 -> 497984 (-2.53%)
last-baryf in affected programs: 210540 -> 204577 (-2.83%)
full in affected programs: 8722 -> 8848 (1.44%)
sstall in affected programs: 308623 -> 308127 (-0.16%)
(ss) in affected programs: 20062 -> 19951 (-0.55%)
(sy) in affected programs: 1344 -> 1360 (1.19%)
LOST: 8
GAINED: 0
The lost shaders look like huge shaders that might fail RA.
Edited by Emma Anholt