winsys/amdgpu: use AMDGPU_IB_FLAG_PREAMBLE for the CS preamble

With this flag, the preamble is skipped for subsequent IBs when the queue
receives IBs from the same context back-to-back. In those cases, this
eliminates VGT_FLUSH (for tess and legacy GS) and PS_PARTIAL_FLUSH (for
gfx11) if the preamble contains them.
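
For illustration only, the skip rule described above can be modeled in a
few lines of C. This is a toy sketch, not the kernel or winsys code:
sketch_queue, sketch_submit and SKETCH_IB_FLAG_PREAMBLE are hypothetical
names, and the flag value is a stand-in rather than the real uapi value.
It assumes the queue only remembers which context submitted last.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AMDGPU_IB_FLAG_PREAMBLE (not the uapi value). */
#define SKETCH_IB_FLAG_PREAMBLE (1u << 0)

/* Toy model of a hardware queue: it only tracks the last context seen. */
struct sketch_queue {
   bool has_last_ctx;
   uint32_t last_ctx_id;
};

/* Submit num_ibs IBs from context ctx_id; ib_flags[i] holds the flags of
 * IB i. Returns how many IBs are actually executed: IBs flagged as
 * preamble are dropped when the previous submission on this queue came
 * from the same context. */
static unsigned
sketch_submit(struct sketch_queue *q, uint32_t ctx_id,
              const uint32_t *ib_flags, unsigned num_ibs)
{
   bool same_ctx = q->has_last_ctx && q->last_ctx_id == ctx_id;
   unsigned executed = 0;

   for (unsigned i = 0; i < num_ibs; i++) {
      if ((ib_flags[i] & SKETCH_IB_FLAG_PREAMBLE) && same_ctx)
         continue; /* state is still valid, so the preamble is skipped */
      executed++;
   }

   /* Remember the submitter for the next back-to-back check. */
   q->has_last_ctx = true;
   q->last_ctx_id = ctx_id;
   return executed;
}
```

In this model, two consecutive submissions of { preamble, main } from one
context execute 2 IBs and then 1 IB; a submission from a different context
in between makes the preamble run again.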