dzn: Another set of CTS fixes from running on hardware
The biggest change here is for the Vulkan memory model. WARP runs on x86, where 32-bit loads and stores are already atomic across cores, so it never exposed that we were ignoring the coherent / acquire / release semantics. This change promotes acquire/release to coherent via the common lowering pass, then lowers coherent accesses to atomic loads/stores where possible, and otherwise sets the globally-coherent flag on the relevant resource declaration(s).
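As a rough illustration of the decision order described above, here is a minimal standalone sketch. It is not the actual NIR or DXIL code: the enum names, the `mem_access` struct, and the helper functions are all hypothetical, and the rule that only scalar 32-bit accesses can become atomics is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical semantics flags; these are NOT the NIR or DXIL definitions. */
enum mem_semantics {
   MEM_ACQUIRE  = 1u << 0,
   MEM_RELEASE  = 1u << 1,
   MEM_COHERENT = 1u << 2,
};

/* Hypothetical outcomes of lowering a single load/store. */
enum lowering {
   LOWER_NONE,              /* plain access, nothing to do */
   LOWER_ATOMIC,            /* emit an atomic load/store instead */
   LOWER_GLOBALLY_COHERENT, /* flag the resource declaration instead */
};

/* Hypothetical description of one memory access. */
struct mem_access {
   uint32_t semantics;
   unsigned bit_size;
   unsigned num_components;
};

/* Step 1: acquire/release implies coherent. */
static uint32_t promote_to_coherent(uint32_t semantics)
{
   if (semantics & (MEM_ACQUIRE | MEM_RELEASE))
      semantics |= MEM_COHERENT;
   return semantics;
}

/* Assumption for this sketch: only scalar 32-bit accesses map to an atomic. */
static bool can_use_atomic(const struct mem_access *access)
{
   return access->bit_size == 32 && access->num_components == 1;
}

/* Step 2: coherent becomes an atomic if possible, else a resource flag. */
static enum lowering lower_access(const struct mem_access *access)
{
   assert(access->num_components > 0);

   uint32_t semantics = promote_to_coherent(access->semantics);
   if (!(semantics & MEM_COHERENT))
      return LOWER_NONE;

   return can_use_atomic(access) ? LOWER_ATOMIC : LOWER_GLOBALLY_COHERENT;
}
```

In this sketch a scalar 32-bit acquire load ends up as an atomic, while a vec4 release store falls back to marking the whole resource as globally coherent.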
Beyond that:
- A couple of missing barriers
- Binding samplers as non-static for meta (blits) on hardware that can't mix static samplers with bindless
- A debug flag to force off native view instancing. It looks like NVIDIA has a bug there where they get the render target array indices wrong, which blows up a bunch of tests