We can't, in general, fold a conversion to 32-bit float into the 16-bit ALU op that generates its source, because we don't know whether the f2f16 from OpQuantize has already been folded into that op. Fixes most of the failing opquantize tests.
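To illustrate why the fold is unsafe, here is a minimal numeric sketch in Python. It only models the rounding behaviour and is not the compiler pass itself; the helper names (`to_f16`, `to_f32`, `widen_after_16bit_op`, `folded_32bit_op`) are hypothetical and exist only for this example. It assumes the 16-bit ALU op rounds its result to half precision, and that folding the widening conversion into it would make the op produce a 32-bit result directly.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def widen_after_16bit_op(a: float, b: float) -> float:
    # Models a 16-bit add (which may already contain a folded f2f16)
    # followed by a separate conversion back to 32 bits.
    return to_f32(to_f16(a + b))


def folded_32bit_op(a: float, b: float) -> float:
    # Models the same add after the widening conversion has been folded
    # into it: the result is never rounded to 16 bits.
    return to_f32(a + b)


if __name__ == "__main__":
    a, b = 1.0, 2.0 ** -12  # the sum is exact in fp32 but not in fp16
    print(widen_after_16bit_op(a, b))
    print(folded_32bit_op(a, b))
```

When the sum is not representable in half precision, the two variants disagree, so the quantization that OpQuantize asked for is silently lost once the conversion is folded.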
The shader-db stats look pretty trivial:
total instructions in shared programs: 11889317 -> 11889304 (<.01%)
instructions in affected programs: 13376 -> 13363 (-0.10%)
total nops in shared programs: 3877890 -> 3877877 (<.01%)
nops in affected programs: 4113 -> 4100 (-0.32%)
total dwords in shared programs: 17893496 -> 17893506 (<.01%)
dwords in affected programs: 18822 -> 18832 (0.05%)
total full in shared programs: 420421 -> 420422 (<.01%)
full in affected programs: 8 -> 9 (12.50%)
total sstall in shared programs: 928726 -> 928664 (<.01%)
sstall in affected programs: 495 -> 433 (-12.53%)
All of the changes seem to be noise in non-GLES (so no fp16) shaders. A quick skim of some optmsgs didn't give me a clue as to what changed, but it isn't about covs, just some sort of change in instruction scheduling.
Closes: #3208