aco: some extract() and other optimizations for 16-bit

Daniel Schürmann requested to merge daniel-schuermann/mesa:aco_extract into main

The general idea of this MR is to better reuse packed vec2 fp16 values as `v1` SSA values (a single 32-bit VGPR) and to access their 16-bit halves via swizzles and SDWA. This can avoid many copies, especially when more aggressive vectorization is in place.
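As a rough illustration of the data layout, here is a minimal sketch that models one 32-bit register holding two packed fp16 values. The low half is read with an SDWA-style `WORD_0` select and the high half with `WORD_1`, without first copying either half into its own register. `PackedV2F16`, `pack`, `select_word` and `swizzle_swap` are hypothetical helpers for illustration only, not ACO's API. The halves are treated as raw 16-bit bit patterns, and no fp16 arithmetic is modeled.

```cpp
#include <cstdint>

// Hypothetical model of a "v1" value: one 32-bit VGPR holding two packed fp16 halves.
struct PackedV2F16 {
  uint32_t dword;
};

// Pack two raw fp16 bit patterns: lo goes to bits [15:0], hi to bits [31:16].
constexpr PackedV2F16 pack(uint16_t lo, uint16_t hi) {
  return PackedV2F16{static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16)};
}

// Emulates an SDWA word select on a source operand:
// sel 0 = WORD_0 (low half), sel 1 = WORD_1 (high half).
// The mask keeps the shift amount at 0 or 16, so it never reaches 32.
constexpr uint16_t select_word(PackedV2F16 v, unsigned sel) {
  return static_cast<uint16_t>(v.dword >> (16u * (sel & 1u)));
}

// Emulates a swizzle of a packed operand that swaps its two halves.
constexpr PackedV2F16 swizzle_swap(PackedV2F16 v) {
  return pack(select_word(v, 1), select_word(v, 0));
}
```

For example, `pack(0x3C00, 0x4000)` (fp16 1.0 and 2.0) yields the dword `0x40003C00`. `select_word(..., 1)` then reads `0x4000` directly from that single register. Keeping the pair in one register and selecting halves at the use site is what makes the separate extract copies unnecessary.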

