panfrost: Blend shader optimizations
Now that we have stable infrastructure for emitting blend shaders, here are some optimizations:
- Use a typed load where possible (most UNORM formats)
- Fold the rounding mode into the conversion itself, emitting f2u_rte directly (saves an instruction and usually an ALU cycle)
- Allow changing both size and type in the same opcode (implements f2u8, usually saving an instruction)