aco: optimize fneg_lo / fneg_hi (!13688) · Merge requests · Mesa / mesa

Daniel Schürmann requested to merge daniel-schuermann/mesa:aco_fneg into main Nov 05, 2021

Found in the FSR upscaling shaders.

Radeon seems to be the only hardware that supports per-component fneg modifiers on packed fp16 instructions. Instead of adding new opcodes for this rare case, the idea is to just lower to fmul(vec2(-1,1), x); we would use the same sequence in instruction selection anyway. Other drivers shouldn't really be affected by this pattern, as it's rare enough and, without per-component modifiers, there isn't really a better way either.
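To illustrate why the fmul form can stand in for a per-component negate, here is a minimal Python model of a packed fp16 pair. It is only a sketch: the helper names (`pack_half2`, `unpack_half2`, `fneg_lo_via_fmul`, `fneg_lo_via_sign_bit`) are hypothetical and not part of NIR or ACO, the low component is assumed to live in bits 0-15, and NaN and denormal-flushing behaviour of a real fmul is ignored.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two values as fp16 into one 32-bit word (lo assumed in bits 0-15)."""
    lo_bits, hi_bits = struct.unpack("<HH", struct.pack("<ee", lo, hi))
    return lo_bits | (hi_bits << 16)


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its (lo, hi) fp16 values."""
    return struct.unpack("<ee", struct.pack("<I", word))


def fneg_lo_via_fmul(word: int) -> int:
    """Model of the lowering: fmul(vec2(-1, 1), x) negates only the low half."""
    lo, hi = unpack_half2(word)
    return pack_half2(-1.0 * lo, 1.0 * hi)


def fneg_lo_via_sign_bit(word: int) -> int:
    """Model of a per-component fneg modifier: flip the low half's sign bit."""
    return word ^ 0x8000


x = pack_half2(1.5, -2.0)
# For ordinary finite values both forms produce the same bits.
assert fneg_lo_via_fmul(x) == fneg_lo_via_sign_bit(x)
```

In this model the multiply by vec2(-1,1) and the sign-bit flip agree, which is why a backend with per-component modifiers can match the fmul pattern and emit the modifier instead.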

A patch is added to fold fp16_vec2 literals in these cases:

lo == hi
lo == 0 (only hi gets emitted, and lo can use the upper zero-bits)
lo == -hi (only the positive value is emitted, and the fneg modifier is used)
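The three cases above can be sketched as a small classifier over the raw bits of a packed fp16_vec2 constant. This is an illustrative Python model, not the ACO code: the function name, the case labels, the order of the checks and the bitwise (rather than floating-point) comparison are all assumptions.

```python
def classify_fp16_vec2_literal(word: int):
    """Return (case, 16-bit value to emit) for a packed fp16 pair.

    Hypothetical helper: lo is assumed to be bits 0-15, hi bits 16-31.
    """
    lo = word & 0xFFFF
    hi = (word >> 16) & 0xFFFF
    if lo == hi:
        # Both halves are identical, so one 16-bit value is enough.
        return ("lo == hi", lo)
    if lo == 0:
        # Only hi is emitted; lo can read the zero upper bits of the literal.
        return ("lo == 0", hi)
    if (lo ^ 0x8000) == hi:
        # The halves differ only in sign: emit the positive value, negate one use.
        return ("lo == -hi", lo & 0x7FFF)
    return ("not foldable", None)


# fp16 bit patterns: 1.0 = 0x3C00, -1.0 = 0xBC00, 2.0 = 0x4000
print(classify_fp16_vec2_literal(0x3C003C00))  # ('lo == hi', 15360)
print(classify_fp16_vec2_literal(0x40000000))  # ('lo == 0', 16384)
print(classify_fp16_vec2_literal(0xBC003C00))  # ('lo == -hi', 15360)
```

Note that vec2(-1,1), the constant produced by the lowering, falls into the lo == -hi case, so it needs only the literal 1.0 plus a modifier in this model.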


