aco: optimize fneg_lo / fneg_hi
Found in the FSR upscaling shaders.
As Radeon seems to be the only hardware that supports per-component fneg modifiers on packed fp16 instructions, the idea is to lower these to fmul(vec2(-1,1), x) instead of adding new opcodes for this rare case. We would use the same sequence in instruction selection anyway. Other drivers shouldn't really be affected, as the pattern is rare enough and, without per-component modifiers, there isn't really a better way either.
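As a minimal sketch of why this lowering works, the snippet below models a packed pair with plain `float` standing in for fp16. The `Vec2` type and the helper functions are hypothetical and are not ACO or NIR code. The `vec2(1,-1)` constant for fneg_hi is an assumption by symmetry; the description above only spells out the `vec2(-1,1)` case. Multiplication is not necessarily bit-identical to a sign flip for every input (NaN handling, for instance), which this sketch ignores.

```cpp
#include <cassert>

// Hypothetical stand-in for a packed fp16 vec2; float is used for simplicity.
struct Vec2 {
    float lo;  // component 0 (low half)
    float hi;  // component 1 (high half)
};

// Component-wise multiply, as a packed fmul would do.
inline Vec2 fmul(Vec2 a, Vec2 b) { return {a.lo * b.lo, a.hi * b.hi}; }

// fneg_lo(x) lowered to fmul(vec2(-1, 1), x): only the low component flips sign.
inline Vec2 fneg_lo(Vec2 x) { return fmul({-1.0f, 1.0f}, x); }

// Assumed symmetric lowering for fneg_hi: only the high component flips sign.
inline Vec2 fneg_hi(Vec2 x) { return fmul({1.0f, -1.0f}, x); }

inline void check_lowering() {
    const Vec2 x{2.0f, 3.0f};
    assert(fneg_lo(x).lo == -2.0f && fneg_lo(x).hi == 3.0f);
    assert(fneg_hi(x).lo == 2.0f && fneg_hi(x).hi == -3.0f);
}
```

On hardware with per-component modifiers, the backend can then turn the multiply by ±1 into a negate modifier on the affected half.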
A patch is added to fold fp16_vec2 literals in these cases:
- lo == hi
- lo == 0 (only hi gets emitted, and lo can use the upper zero-bits)
- lo == -hi (only the positive value is emitted, and the fneg modifier is used)
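
The three cases above can be sketched as a classification over the raw 32-bit pattern of an fp16_vec2 constant. This is only an illustration, assuming C++17: `FoldKind`, `Fold`, and `classify_literal` are hypothetical names, not ACO identifiers, and the actual patch may check the cases differently or in a different order.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical names; none of these are ACO identifiers.
enum class FoldKind { Broadcast, HiOnly, Negated };

struct Fold {
    FoldKind kind;
    uint16_t literal;  // the single 16-bit value that would be emitted
    bool neg_lo;       // fneg modifier on the low component
    bool neg_hi;       // fneg modifier on the high component
};

// Classify a packed fp16 pair (lo in bits 0..15, hi in bits 16..31).
inline std::optional<Fold> classify_literal(uint32_t packed) {
    const uint16_t lo = static_cast<uint16_t>(packed & 0xffffu);
    const uint16_t hi = static_cast<uint16_t>(packed >> 16);

    if (lo == hi)  // both halves read the same 16-bit value
        return Fold{FoldKind::Broadcast, lo, false, false};
    if (lo == 0)   // emit hi only; lo reads the zero upper bits of the literal
        return Fold{FoldKind::HiOnly, hi, false, false};
    if (lo == static_cast<uint16_t>(hi ^ 0x8000u)) {  // halves differ only in sign
        const uint16_t magnitude = static_cast<uint16_t>(lo & 0x7fffu);
        const bool lo_is_negative = (lo & 0x8000u) != 0;
        return Fold{FoldKind::Negated, magnitude, lo_is_negative, !lo_is_negative};
    }
    return std::nullopt;  // no folding opportunity among the three cases
}

inline void check_examples() {
    // fp16 encodings: 1.0 = 0x3C00, -1.0 = 0xBC00, 2.0 = 0x4000.
    assert(classify_literal(0x3C003C00u)->kind == FoldKind::Broadcast);
    assert(classify_literal(0x3C000000u)->kind == FoldKind::HiOnly);
    assert(classify_literal(0x3C00BC00u)->kind == FoldKind::Negated);
    assert(!classify_literal(0x40003C00u).has_value());
}
```

Comparing bit patterns rather than floating-point values keeps the check exact, so that +0 and -0 are treated as distinct halves.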