aco: implement fp16 arithmetic operations
Currently, the plan is to emit plain 16-bit instructions (without SDWA, but with v2b operands/definitions) in isel and to convert them later, in RA or in the optimizer, if partial writes are needed.
All fp16 arithmetic CTS tests pass on GFX8, GFX9 and GFX10. Note that on GFX8-GFX9 the hardware doesn't do partial writes: a 16-bit instruction overwrites the upper half of the destination register. I'm not sure whether we want to expose it on GFX8 (or whether we need to extract the lower half).
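
To make the partial-write concern concrete, here is a minimal sketch that models a 32-bit register receiving a 16-bit result. It is not ACO code: `WriteMode`, `write_lo16` and `extract_lo16` are hypothetical names, and modeling "overwrites the upper half" as clearing it to zero is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical model, not ACO code: how a 16-bit result lands in a 32-bit register.
enum class WriteMode {
   Partial,   // only the lower 16 bits change, the upper half is preserved
   Overwrite, // the upper half is clobbered (assumed here to become zero)
};

// Returns the new 32-bit register value after writing a 16-bit result.
std::uint32_t write_lo16(std::uint32_t old_value, std::uint16_t result, WriteMode mode)
{
   if (mode == WriteMode::Partial)
      return (old_value & 0xFFFF0000u) | result;
   return result; // upper half lost
}

// Extracting the lower half gives the same 16-bit value in both modes.
std::uint16_t extract_lo16(std::uint32_t reg)
{
   return static_cast<std::uint16_t>(reg & 0xFFFFu);
}
```

In this model, anything packed in the upper half survives only in `Partial` mode, which is why code relying on partial writes would need the later conversion, while the lower half read back through `extract_lo16` is identical either way.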
This MR only implements fp16 arithmetic operations and doesn't enable any new Vulkan extensions.