anv: Optimize vkQueueWaitIdle() on Xe KMD
What does this MR do and why?
This optimization allows us to avoid emitting batch buffers that contain only the end instruction just to check when the queue is idle, freeing up the GPU to do more interesting work.
The first 2 patches are part of a WIP branch of mine that implements xe_sync for the OA metrics changes (https://patchwork.freedesktop.org/series/137058/), but this optimization is a good reason to send those patches for review now.