winsys/amdgpu: use AMDGPU_IB_FLAG_PREAMBLE for the CS preamble
With this flag set, the preamble is skipped for subsequent IBs when the queue receives IBs from the same context back-to-back. In those cases this eliminates VGT_FLUSH (for tessellation and legacy GS) and PS_PARTIAL_FLUSH (on gfx11) if the preamble contains them.
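
For illustration only, the skip rule described above can be modeled roughly as follows. This is a toy sketch, not the winsys or kernel code: sketch_ib, sketch_queue, sketch_submit and SKETCH_IB_FLAG_PREAMBLE are hypothetical names, and the flag value is a local stand-in so the example builds without the amdgpu UAPI headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AMDGPU_IB_FLAG_PREAMBLE (assumption: only
 * used inside this model, not taken from the real header). */
#define SKETCH_IB_FLAG_PREAMBLE (1u << 1)

/* Hypothetical IB descriptor: only the flags matter for this model. */
struct sketch_ib {
   uint32_t flags;
};

/* Hypothetical queue state: remembers which context submitted last. */
struct sketch_queue {
   bool has_last_ctx;
   uint32_t last_ctx;
};

/* Returns how many IBs of this submission would actually execute.
 * IBs flagged as preamble are skipped when the previous submission on
 * the queue came from the same context. */
static unsigned
sketch_submit(struct sketch_queue *q, uint32_t ctx,
              const struct sketch_ib *ibs, unsigned num_ibs)
{
   bool same_ctx = q->has_last_ctx && q->last_ctx == ctx;
   unsigned executed = 0;

   for (unsigned i = 0; i < num_ibs; i++) {
      if (same_ctx && (ibs[i].flags & SKETCH_IB_FLAG_PREAMBLE))
         continue; /* state set up by the preamble is still valid */
      executed++;
   }

   q->has_last_ctx = true;
   q->last_ctx = ctx;
   return executed;
}
```

In this model, a submission of { preamble, main } executes both IBs the first time a context uses the queue and only the main IB on an immediately following submission from the same context, which is where any flushes in the preamble are avoided.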