ir3: Add bar.g to beginning of HS with tess_use_shared
This matches the blob. In theory, this is necessary only because the VS/HS workgroup can now span more than one wave and a patch may be assigned to different waves in the VS and HS. However, I've seen it fix tests where the entire draw should fit in one wave, so bar.g may do some other sort of waiting, or the HW dispatch may sometimes be inefficient.
Fixes dEQP-VK.tessellation.user_defined_io.per_patch.vertex_io_array_size_implicit.* when run immediately after dEQP-VK.tessellation.invariance.outer_triangle_set.quads_fractional_even_spacing or when all of dEQP-VK.tessellation.* is run in sequence on a650.