nir: improve convert_yuv_to_rgb

Use a different arrangement of constants to allow more ffma.
A vec4 backend will now use 3 ffma for yuv_to_rgb.
On freedreno/ir3, the ALU count drops from 10 to 7 (4 fma, 3 mul, 3 add become 7 fma). Other backends shouldn't be hurt.
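
As a rough scalar illustration of the idea (not the actual NIR code), the sketch below folds the YUV offsets into one constant per channel so that every remaining operation is a multiply-add. The ffma() helper, the function names and the rounded BT.601 limited-range constants are assumptions made for this example. The "offset first" variant is just one arrangement that matches the 4 fma, 3 mul, 3 add count quoted above.

```c
/* Hypothetical stand-in for a GPU fused multiply-add: a * b + c. */
static float ffma(float a, float b, float c) { return a * b + c; }

/* Illustrative, rounded BT.601 limited-range constants (assumed). */
#define C_Y   1.164f
#define C_RV  1.596f
#define C_GU  (-0.392f)
#define C_GV  (-0.813f)
#define C_BU  2.017f
#define OFF_Y (16.0f / 255.0f)
#define OFF_C (128.0f / 255.0f)

/* Offsets folded into one constant per channel (compile-time values). */
#define K_R (-(C_Y * OFF_Y + C_RV * OFF_C))
#define K_G (-(C_Y * OFF_Y + C_GU * OFF_C + C_GV * OFF_C))
#define K_B (-(C_Y * OFF_Y + C_BU * OFF_C))

/* Offsets applied first: 3 add + 3 mul + 4 fma = 10 ALU ops. */
static void yuv_to_rgb_offset_first(float y, float u, float v, float rgb[3])
{
   float yy = y - OFF_Y;                               /* add */
   float uu = u - OFF_C;                               /* add */
   float vv = v - OFF_C;                               /* add */
   rgb[0] = ffma(C_Y, yy, C_RV * vv);                  /* mul + fma */
   rgb[1] = ffma(C_Y, yy, ffma(C_GU, uu, C_GV * vv));  /* mul + 2 fma */
   rgb[2] = ffma(C_Y, yy, C_BU * uu);                  /* mul + fma */
}

/* Offsets folded into constants: 2 + 3 + 2 = 7 fma, nothing else. */
static void yuv_to_rgb_folded(float y, float u, float v, float rgb[3])
{
   rgb[0] = ffma(C_RV, v, ffma(C_Y, y, K_R));
   rgb[1] = ffma(C_GV, v, ffma(C_GU, u, ffma(C_Y, y, K_G)));
   rgb[2] = ffma(C_BU, u, ffma(C_Y, y, K_B));
}
```

In vector form the folded variant is offset + y * col0 + u * col1 + v * col2, which is where the 3 ffma for a vec4 backend come from. If each ffma is split back into mul and add, the y * C_Y product can be shared between channels, which is consistent with 5 mul and 7 add.
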
I tested with lower_ffma on freedreno to get an idea of the effect on GPUs without fma:
- Number of constants is the same
- Number of alu instructions is the same (7 add and 5 mul)
- Lower register pressure
- Arguably worse scheduling (sampling instructions are placed less optimally)