pan/va: Handle terminal barriers

If a shader ends with a workgroup barrier, it must wait for slot #7 at the end
to finish the barrier. After inserting flow control, we get:

   BARRIER
   NOP.wait
   NOP.end

Currently, the flow control pass assumes that .end implies all other control
flow, and will merge this down to

   BARRIER.end

However, this is incorrect. Slot #7 is no longer waited on. In theory, this
cannot affect the correctness of the shader. In practice, the hardware checks
that all barriers are reached. Terminating without waiting on slot #7 first
raises an INSTR_BARRIER_FAULT. We need to weaken the flow control merging
slightly to avoid this incorrect merge, instead emitting:

   BARRIER.wait
   NOP.end

Of course, all of these cases are inefficient: terminal barriers shouldn't be
emitted in the first place. I wrote out an optimization for this. We can merge
it if we find a workload that it actually helps.

Fixes test_half.vstore_half.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <!17264>
44 jobs for !17264 with bi/terminal-barriers in 15 minutes and 42 seconds (queued for 12 seconds)
latest merge request