"1-bit" bools on freedreno

    freedreno: Leave bools as 1-bit and (usually) store them in half regs.
    If we use NIR's 1-bit bool representation, we get exactly the bool behavior
    the hardware provides: CMPS produces true or false, AND/OR/XOR work as
    intended without extra absnegs, and we can pass those half values directly
    to other CMPS.  We emit an absneg for b2b1 ("turn a memory load into a
    1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the
    use of that value anyway.

    The awkward part is ir3_SEL requiring that the cond match the bit size of
    the selection operands.  If we store all bools as half, we end up with a
    lot of extra upconverts.  Optimize most of them out by storing as full
    when the bool is only used by a 32-bit SEL.  (But we still have to convert
    if the bool gets mixed SEL and non-SEL usage.)
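
    As a rough illustration of the heuristic described above, the sketch
    below picks a register size for a bool from the kinds of uses it has.
    It is a standalone model, not the ir3 code: the types bool_use,
    use_kind and reg_size and both functions are hypothetical names made
    up for this example, and defaulting an unused bool to half is an
    assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one use of a boolean value; not an ir3 or NIR type. */
enum use_kind {
    USE_SEL_COND, /* the bool is the condition operand of a select */
    USE_OTHER,    /* any other use: compare, logic op, branch, ... */
};

struct bool_use {
    enum use_kind kind;
    unsigned sel_bit_size; /* bit size of the select's operands; ignored for USE_OTHER */
};

enum reg_size { REG_HALF = 16, REG_FULL = 32 };

/* Store the bool as full only when every use is the condition of a 32-bit
 * select; otherwise keep it in a half register.  A bool with no uses stays
 * half (assumption made for this sketch). */
static enum reg_size
choose_bool_reg_size(const struct bool_use *uses, size_t num_uses)
{
    assert(uses != NULL || num_uses == 0);

    if (num_uses == 0)
        return REG_HALF;

    for (size_t i = 0; i < num_uses; i++) {
        if (uses[i].kind != USE_SEL_COND || uses[i].sel_bit_size != 32)
            return REG_HALF;
    }
    return REG_FULL;
}

/* A select needs its condition converted when the stored size of the bool
 * differs from the bit size of the select's operands. */
static bool
use_needs_conversion(enum reg_size stored, const struct bool_use *use)
{
    return use->kind == USE_SEL_COND && use->sel_bit_size != (unsigned)stored;
}
```

    In this model a bool used only by 32-bit selects is stored as full and
    needs no conversion, while a bool with mixed usage stays half and only
    its 32-bit select uses need an upconvert.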

    Significant changes to GL_TIME_ELAPSED on my set of traces:
    gputest/pixmark-volplosion.trace   :  -14.10% (+/-   0.3%)
    gputest/pixmark-piano.trace        :   -9.93% (+/-   0.2%)
    glmark2/shading:shading=cel.trace  :   -0.70% (+/-   0.3%)
    glmark2/terrain.trace              :   -0.48% (+/-   0.1%)

    instructions in affected programs: 2884196 -> 2847751 (-1.26%)
    nops in affected programs: 1067818 -> 1035885 (-2.99%)
    non-nops in affected programs: 899507 -> 894995 (-0.50%)
    mov in affected programs: 15945 -> 16050 (0.66%)
    cov in affected programs: 11874 -> 13944 (17.43%)
    dwords in affected programs: 510912 -> 497984 (-2.53%)
    last-baryf in affected programs: 210540 -> 204577 (-2.83%)
    full in affected programs: 8722 -> 8848 (1.44%)
    sstall in affected programs: 308623 -> 308127 (-0.16%)
    (ss) in affected programs: 20062 -> 19951 (-0.55%)
    (sy) in affected programs: 1344 -> 1360 (1.19%)

    LOST:   8
    GAINED: 0

    The lost shaders look like huge shaders that might fail RA.
