Investigated for Bifrost, but this should be the same cost or cheaper on any reasonable architecture. For the compilers I maintain:
- Bifrost - fadd can be scheduled 2x as frequently as ffma; there is no fmul separate from ffma.
- Midgard - fadd x, x is used as a canonical form, again for easier scheduling.
- AGX - fmul and fadd are both native ops, but fmul is heavier-weight (it is unknown whether this affects performance or only power consumption). The rewrite also saves a move or uniform file slot for the constant.
Since floating-point multiplication is inherently more expensive than addition, this is presumably a win for everyone else too.
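The message does not spell out the rewrite itself. Assuming, from the Midgard note, that it turns `fmul x, 2.0` into `fadd x, x`, the sketch below checks that the two forms give bit-identical results. Under IEEE 754 semantics both operations compute the exact value 2x and then round it the same way, so no precision is lost. The helper names are hypothetical, Python floats are binary64 rather than the binary32 most shaders use, and GPU-specific modes such as denormal flushing are not modelled.

```python
import math
import struct


def bits(x: float) -> int:
    # Reinterpret a double as its 64-bit IEEE 754 pattern so that
    # signed zeros are compared exactly rather than by value.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fmul_by_two(x: float) -> float:
    # Hypothetical stand-in for the original `fmul x, 2.0`.
    return x * 2.0


def fadd_self(x: float) -> float:
    # Hypothetical stand-in for the rewritten `fadd x, x`.
    return x + x


# Ordinary values plus edge cases: signed zero, the smallest subnormal,
# the largest finite double (which overflows to inf), and infinities.
SAMPLES = [0.0, -0.0, 1.0, -1.5, 0.1, 5e-324,
           1.7976931348623157e308, math.inf, -math.inf]


def same_result(x: float) -> bool:
    # True when both forms produce the same bit pattern for input x.
    return bits(fmul_by_two(x)) == bits(fadd_self(x))


if __name__ == "__main__":
    for x in SAMPLES:
        print(repr(x), same_result(x))
    # NaN inputs: both forms still produce a NaN.
    print("nan", math.isnan(fmul_by_two(math.nan)) and math.isnan(fadd_self(math.nan)))
```

Bit patterns are compared instead of values so that a sign error on zero (0.0 == -0.0 in Python) would not slip through.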
Signed-off-by: Alyssa Rosenzweig email@example.com