Implement `vkCmdCopyQueryPoolResults` with a compute shader
This is required on pre-Turing hardware, where the MME is simply not powerful enough to implement the copy on its own. Even on Turing, it's probably faster to fire off a compute shader once we have more than a handful of query results to copy.
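
For reference, here is a minimal CPU-side sketch of the per-query work such a shader would do, with one loop iteration standing in for one shader invocation. It follows the `vkCmdCopyQueryPoolResults` semantics from the Vulkan spec, but the `query_slot` layout, the function names, and the choice to truncate 64-bit values when writing 32-bit results are assumptions made for illustration, not the driver's actual pool layout or shader.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flag values match VkQueryResultFlagBits in the Vulkan headers. */
#define RESULT_64_BIT            0x1u
#define RESULT_WITH_AVAILABILITY 0x4u
#define RESULT_PARTIAL           0x8u

/* Hypothetical pool entry: one availability word and one value per query. */
struct query_slot {
    uint32_t available;
    uint64_t value;
};

/* Write element `idx` of a result record as a 32-bit or 64-bit integer. */
static void write_result(uint8_t *dst, uint32_t idx, uint64_t v, uint32_t flags)
{
    if (flags & RESULT_64_BIT) {
        memcpy(dst + idx * sizeof(uint64_t), &v, sizeof(v));
    } else {
        uint32_t v32 = (uint32_t)v; /* assumption: truncate on overflow */
        memcpy(dst + idx * sizeof(uint32_t), &v32, sizeof(v32));
    }
}

/* Body of one "invocation": copies query (first_query + i). */
static void copy_one_query(const struct query_slot *pool, uint32_t first_query,
                           uint32_t i, uint8_t *dst, uint64_t stride,
                           uint32_t flags)
{
    const struct query_slot *q = &pool[first_query + i];
    uint8_t *out = dst + i * stride;
    bool avail = q->available != 0;

    /* Unavailable results are left untouched unless PARTIAL is set. */
    if (avail || (flags & RESULT_PARTIAL))
        write_result(out, 0, q->value, flags);

    /* The availability word follows the result value. */
    if (flags & RESULT_WITH_AVAILABILITY)
        write_result(out, 1, avail ? 1 : 0, flags);
}

/* CPU stand-in for the dispatch: one iteration per query. */
static void copy_query_pool_results(const struct query_slot *pool,
                                    uint32_t first_query, uint32_t query_count,
                                    uint8_t *dst, uint64_t stride,
                                    uint32_t flags)
{
    for (uint32_t i = 0; i < query_count; i++)
        copy_one_query(pool, first_query, i, dst, stride, flags);
}
```

Each query is independent of the others, which is why the work maps naturally onto one compute invocation per query.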