nir: Separate control and memory barriers
This MR started off as an attempt to fix #2138, which led me to notice a discrepancy between GLSL's barrier() intrinsic and OpControlBarrier in SPIR-V. That discrepancy confuses Intel's back-ends and ultimately leads to a hang. In GLSL, the barrier() intrinsic is both a sort of semaphore wait operation which synchronizes execution and a memory barrier. For compute shaders, it's a shared memory barrier. For tessellation control shaders, it's a memory barrier on patch outputs. In SPIR-V, OpControlBarrier is a control barrier and only optionally a memory barrier. For compute shaders, if you want a shared memory barrier, you have to request it explicitly. Weirdly, it's always a TCS patch output memory barrier even if you don't explicitly request it.
This MR attempts to sort this all out by doing a few things:
- Fix the shared memory barrier issue on Intel in a back-portable way so that we get the gen11 hangs fixed.
- Add a new nir_intrinsic_memory_barrier_tcs_patch intrinsic type which is like the other memory_barrier_* intrinsics, only it acts on TCS patch outputs. For all drivers this is currently a no-op (though we may want it for real on Intel).
- Make both SPIR-V and GLSL emit separate memory and control barrier intrinsics. For compute, they emit a memory_barrier_shared and for TCS, they emit a memory_barrier_tcs_patch. (SPIR-V will emit scoped barriers instead when requested.) The memory barrier is emitted before the control barrier to ensure that all threads have flushed their caches or I/O queues before any post-barrier I/O occurs. For SPIR-V, the shared memory barrier technically may not be required, but older versions of GLSLang give us the wrong SPIR-V so I added a workaround for it.
- Rename nir_intrinsic_barrier to control_barrier and better document that it's only a control barrier and doesn't also require the driver to insert a memory barrier.
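To make the emission order described above concrete, here is a minimal sketch in C. It does not use the real NIR builder API: every name prefixed with sketch_ is a hypothetical stand-in made up for illustration, and the instruction stream is just an array. It only models the rule that a barrier() lowers to a stage-specific memory barrier followed by a control barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the intrinsics named in this MR.
 * These are NOT the real nir_intrinsic_* enum values. */
enum sketch_intrinsic {
    SKETCH_MEMORY_BARRIER_SHARED,
    SKETCH_MEMORY_BARRIER_TCS_PATCH,
    SKETCH_CONTROL_BARRIER,
};

/* Only the two stages discussed in the description. */
enum sketch_stage {
    SKETCH_STAGE_COMPUTE,
    SKETCH_STAGE_TESS_CTRL,
};

/* A tiny fixed-size instruction stream used in place of a NIR builder. */
struct sketch_stream {
    enum sketch_intrinsic instrs[8];
    size_t count;
};

static void
sketch_emit(struct sketch_stream *s, enum sketch_intrinsic op)
{
    assert(s->count < sizeof(s->instrs) / sizeof(s->instrs[0]));
    s->instrs[s->count++] = op;
}

/* Lower a GLSL-style barrier(): first the memory barrier that matches
 * the stage, then the pure control barrier. */
static void
sketch_lower_barrier(struct sketch_stream *s, enum sketch_stage stage)
{
    if (stage == SKETCH_STAGE_COMPUTE)
        sketch_emit(s, SKETCH_MEMORY_BARRIER_SHARED);
    else
        sketch_emit(s, SKETCH_MEMORY_BARRIER_TCS_PATCH);

    sketch_emit(s, SKETCH_CONTROL_BARRIER);
}
```

With the two intrinsics kept separate like this, a back-end can handle the control barrier without having to guess which memory barrier, if any, is implied by it.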
The big question here is whether or not others think that they get real value from having the shared memory barrier baked into nir_intrinsic_barrier. Another possible option would be to add a different control barrier intrinsic, similar to nir_intrinsic_scoped_barrier, which takes memory scope information. However, I think that this is still an overall improvement since it separates control barriers from memory barriers.