nir/pack_bits: handle 8-bit vec8 -> 64-bit
What does this MR do and why?
This is a very silly case, but there's no reason not to handle it efficiently,
and this implementation is faster than the fallback. Noticed when playing with
scratch optimizations.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
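
For readers unfamiliar with the operation, here is a minimal sketch in plain C of what packing an 8-bit vec8 into a single 64-bit value means. It is not the NIR builder code from this MR: the helper names are hypothetical, component 0 is assumed to land in the lowest byte, and the two-step decomposition (two 4x8 packs, then one 2x32 pack) is only one plausible way to beat a generic per-component shift-and-OR fallback.

```c
#include <stdint.h>

/* Hypothetical helper: pack four bytes into one 32-bit word.
 * Assumes component 0 occupies the lowest 8 bits. */
static uint32_t pack_32_4x8(const uint8_t b[4])
{
   return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
          ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: combine two 32-bit halves into one 64-bit value. */
static uint64_t pack_64_2x32(uint32_t lo, uint32_t hi)
{
   return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Special case sketched here: 8 x 8-bit -> 64-bit via two 4x8 packs
 * followed by a single 2x32 pack. */
static uint64_t pack_64_8x8(const uint8_t b[8])
{
   return pack_64_2x32(pack_32_4x8(&b[0]), pack_32_4x8(&b[4]));
}

/* Generic stand-in for a fallback: widen, shift and OR each component. */
static uint64_t pack_64_8x8_generic(const uint8_t b[8])
{
   uint64_t v = 0;
   for (unsigned i = 0; i < 8; i++)
      v |= (uint64_t)b[i] << (8 * i);
   return v;
}
```

Both functions produce the same value; the difference is only in how many operations are needed, which matters on hardware that has native 4x8 and 2x32 pack instructions.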