pan/bi: Implement general 8-bit vector construction (for OpenCL)
Refactor the 16-bit vector construction code to separate collecting the individual 32-bit words of a vector from building each 32-bit word out of sub-32-bit parts. Then add a path that combines up to 4 bytes into a 32-bit word using Valhall's MKVEC.v2i8 instruction.
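As a rough illustration of the two-step split, the C sketch below models only the data layout. It is not the compiler code, and it does not model MKVEC.v2i8's actual operand semantics. It assumes bytes are packed in little-endian lane order and that unused lanes are zero. The helpers `pack_bytes_to_word` and `collect_words` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one 32-bit word from up to 4 byte lanes.
 * Assumes lane i lands in bits [8*i, 8*i+7]; missing lanes stay zero. */
static uint32_t pack_bytes_to_word(const uint8_t *bytes, size_t count)
{
    uint32_t word = 0;
    for (size_t i = 0; i < count && i < 4; ++i)
        word |= (uint32_t)bytes[i] << (8 * i);
    return word;
}

/* Hypothetical helper: the "collect" step. Splits an n-byte vector into
 * ceil(n / 4) words, packing each word independently. Returns the word count. */
static size_t collect_words(const uint8_t *bytes, size_t n, uint32_t *words)
{
    size_t nwords = (n + 3) / 4;
    for (size_t w = 0; w < nwords; ++w) {
        size_t remaining = n - 4 * w;
        words[w] = pack_bytes_to_word(bytes + 4 * w, remaining < 4 ? remaining : 4);
    }
    return nwords;
}
```

For example, a 5-byte vector {0x11, 0x22, 0x33, 0x44, 0x55} yields two words: 0x44332211 and 0x00000055. Keeping the per-word packing separate means that the 16-bit and 8-bit paths differ only in how a single word is assembled.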