aco/gfx10+: set lateKill for sgprs used by wave64 VALU writing a mask
RDNA2 ISA doc, 6.2.4. Wave64 Destination Restrictions:
The first pass of a wave64 VALU instruction may not overwrite a scalar value used by the second half.
Foz-DB Navi31:
Totals from 5221 (6.58% of 79395) affected shaders:
Instrs: 9751484 -> 9752179 (+0.01%); split: -0.01%, +0.01%
CodeSize: 50624072 -> 50626088 (+0.00%); split: -0.00%, +0.01%
Latency: 85646450 -> 85647419 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 15039160 -> 15039277 (+0.00%); split: -0.00%, +0.00%
VClause: 200275 -> 200204 (-0.04%)
SClause: 248645 -> 248607 (-0.02%); split: -0.03%, +0.01%
Copies: 640802 -> 641413 (+0.10%); split: -0.01%, +0.11%
PreSGPRs: 236297 -> 236735 (+0.19%)
VALU: 5666449 -> 5666440 (-0.00%)
SALU: 967482 -> 968111 (+0.07%); split: -0.01%, +0.07%