
aco: optimize fneg_lo / fneg_hi

Daniel Schürmann requested to merge daniel-schuermann/mesa:aco_fneg into main

Found in the FSR upscaling shaders.

As Radeon seems to be the only hardware that supports per-component fneg modifiers on packed fp16 instructions, the idea is to lower this rare case to fmul(vec2(-1,1), x) instead of adding new opcodes for it. We would use the same sequence in instruction selection anyway. Other drivers shouldn't really be affected, as the pattern is rare enough, and without per-component modifiers there isn't really a better way either.
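To illustrate why the lowering is equivalent, here is a minimal Python sketch that models a packed fp16 pair as a 32-bit integer. It assumes that the low component sits in bits 0-15 and that fneg_lo means "negate only the low component". The helper names are hypothetical and not taken from Mesa or ACO. It ignores NaN payloads and denormal flushing, which a real multiply may treat differently depending on the float mode.

```python
import struct

def pack_half2(lo, hi):
    """Pack two Python floats into one 32-bit word as fp16 (lo in bits 0-15)."""
    lo_bits, = struct.unpack('<H', struct.pack('<e', lo))
    hi_bits, = struct.unpack('<H', struct.pack('<e', hi))
    return lo_bits | (hi_bits << 16)

def unpack_half2(bits):
    """Inverse of pack_half2: return (lo, hi) as Python floats."""
    lo, = struct.unpack('<e', struct.pack('<H', bits & 0xFFFF))
    hi, = struct.unpack('<e', struct.pack('<H', (bits >> 16) & 0xFFFF))
    return lo, hi

def fneg_lo_reference(bits):
    """Assumed semantics of fneg_lo: flip the sign bit of the low half only."""
    return bits ^ 0x8000

def fneg_lo_lowered(bits):
    """The lowering from the description: fmul(vec2(-1, 1), x)."""
    lo, hi = unpack_half2(bits)
    return pack_half2(-1.0 * lo, 1.0 * hi)

x = pack_half2(1.5, -2.0)
print(unpack_half2(fneg_lo_reference(x)))
print(unpack_half2(fneg_lo_lowered(x)))
```

Both calls negate only the low component, so for ordinary finite inputs the multiply by vec2(-1,1) is a drop-in replacement for a dedicated opcode.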

A patch is added to fold fp16_vec2 literals in these cases:

  • lo == hi
  • lo == 0 (only hi gets emitted, and lo can use the upper zero-bits)
  • lo == -hi (only the positive value is emitted, and the fneg modifier is used)
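The three cases above can be sketched as a small classifier over the raw 16-bit patterns of the two halves. This is only an illustration under stated assumptions: the function name, the return tuples, and the choice to compare bit patterns rather than float values are hypothetical and do not reflect the actual ACO code.

```python
SIGN_BIT = 0x8000  # sign bit of an IEEE fp16 value

def classify_fp16_vec2_literal(lo_bits, hi_bits):
    """Return how a packed fp16 constant could be folded, or None.

    lo_bits and hi_bits are the raw 16-bit patterns of the two halves.
    """
    if lo_bits == hi_bits:
        # lo == hi: a single 16-bit value serves both components.
        return ("same", lo_bits)
    if lo_bits == 0:
        # lo == 0: only hi is emitted; lo reads the upper zero bits.
        return ("hi_only", hi_bits)
    if (lo_bits ^ SIGN_BIT) == hi_bits:
        # lo == -hi: emit the positive value, negate one half with fneg.
        positive = lo_bits & 0x7FFF
        negated = "lo" if lo_bits & SIGN_BIT else "hi"
        return ("negated", positive, negated)
    return None  # no folding applies in this sketch

# vec2(-1, 1) as fp16 bit patterns: -1.0 is 0xBC00, 1.0 is 0x3C00.
print(classify_fp16_vec2_literal(0xBC00, 0x3C00))
```

Note that the vec2(-1,1) constant produced by the fneg_lo lowering falls into the lo == -hi case.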
