ir3,tu: Enable subgroup shuffles and relative shuffles

We still don't use the fast path for relative shuffles; that's left for
future work.