winsys/amdgpu: use AMDGPU_IB_FLAG_PREAMBLE for the CS preamble

With this flag, the preamble is skipped for subsequent IBs when the queue
receives IBs from the same context back-to-back. In those cases, this
eliminates VGT_FLUSH (for tess and legacy GS) and PS_PARTIAL_FLUSH (for
gfx11) if the preamble contains them.
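
For illustration only, the skip rule described above can be modeled
roughly as follows. This is a simplified sketch, not the kernel or winsys
code: model_queue and model_submit are hypothetical names, and only the
flag value is taken from the amdgpu UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as in the kernel UAPI header (amdgpu_drm.h). */
#define AMDGPU_IB_FLAG_PREAMBLE (1u << 1)

/* Hypothetical model of a hardware queue; not a real driver structure. */
struct model_queue {
    bool has_last_ctx;    /* false until the first submission */
    uint64_t last_ctx_id; /* context of the previous submission */
};

/*
 * Hypothetical helper: submit all IBs of one job from one context and
 * return how many of them would actually execute. Preamble IBs are
 * dropped when the previous job on the queue came from the same context.
 */
static unsigned model_submit(struct model_queue *q, uint64_t ctx_id,
                             const uint32_t *ib_flags, unsigned num_ibs)
{
    bool ctx_switch = !q->has_last_ctx || q->last_ctx_id != ctx_id;
    unsigned executed = 0;

    for (unsigned i = 0; i < num_ibs; i++) {
        /* Skip the preamble if no context switch happened. */
        if ((ib_flags[i] & AMDGPU_IB_FLAG_PREAMBLE) && !ctx_switch)
            continue;
        executed++;
    }

    q->has_last_ctx = true;
    q->last_ctx_id = ctx_id;
    return executed;
}
```

In this model, a job made of a preamble IB and a main IB executes both
IBs after a context switch, and only the main IB when it follows another
job from the same context.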
