nir: Separate control and memory barriers

This MR started off as an attempt to fix #2138, which led me to notice a discrepancy between GLSL's barrier() intrinsic and SPIR-V's OpControlBarrier. That discrepancy confuses Intel's back-ends and ultimately leads to a hang. In GLSL, the barrier() intrinsic is both a sort of semaphore wait operation which synchronizes execution and a memory barrier. For compute shaders, it's a shared memory barrier. For tessellation control shaders, it's a memory barrier on patch outputs. In SPIR-V, OpControlBarrier is a control barrier and only optionally a memory barrier. For compute shaders, if you want a shared memory barrier, you have to request it explicitly. Weirdly, for tessellation control shaders it's always a patch output memory barrier, even if you don't explicitly request it.
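
To make the mismatch concrete, here is a rough sketch in C of what each construct implies as described above. It is a toy model, not Mesa code: every `toy_*` name is made up for illustration, and the only real constant is the `0x100` bit that the SPIR-V specification assigns to WorkgroupMemory in the Memory Semantics mask.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not Mesa or SPIR-V header
 * identifiers. */
enum toy_stage { TOY_STAGE_COMPUTE, TOY_STAGE_TESS_CTRL };

/* Bit value of WorkgroupMemory in the SPIR-V Memory Semantics mask. */
#define TOY_SEMANTICS_WORKGROUP_MEMORY 0x100u

/* What a single barrier implies for the shader. */
struct toy_barrier_effects {
   bool control;          /* execution synchronization */
   bool shared_memory;    /* compute shared memory barrier */
   bool tcs_patch_output; /* TCS patch output memory barrier */
};

/* GLSL barrier(): always a control barrier plus a stage-dependent
 * memory barrier. */
static struct toy_barrier_effects
toy_glsl_barrier(enum toy_stage stage)
{
   struct toy_barrier_effects fx = { .control = true };
   fx.shared_memory = (stage == TOY_STAGE_COMPUTE);
   fx.tcs_patch_output = (stage == TOY_STAGE_TESS_CTRL);
   return fx;
}

/* SPIR-V OpControlBarrier: always a control barrier. The shared memory
 * barrier appears only if requested through the semantics mask, while the
 * TCS patch output barrier is implied regardless. */
static struct toy_barrier_effects
toy_spirv_control_barrier(enum toy_stage stage, unsigned semantics)
{
   struct toy_barrier_effects fx = { .control = true };
   fx.shared_memory = (stage == TOY_STAGE_COMPUTE) &&
                      (semantics & TOY_SEMANTICS_WORKGROUP_MEMORY) != 0;
   fx.tcs_patch_output = (stage == TOY_STAGE_TESS_CTRL);
   return fx;
}
```

In this model, a compute-stage `toy_spirv_control_barrier()` with a zero semantics mask gives no shared memory barrier, whereas `toy_glsl_barrier()` always does. That is the gap a back-end falls into if it assumes the GLSL behavior.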

This MR attempts to sort this all out by doing a few things:

  1. Fix the shared memory barrier issue on Intel in a back-portable way so that we get the gen11 hangs fixed.
  2. Add a new nir_intrinsic_memory_barrier_tcs_patch intrinsic type which is like the other memory_barrier_* intrinsics only it acts on TCS patch outputs. For all drivers this is currently a no-op (though we may want it for real on Intel).
  3. Make both SPIR-V and GLSL emit separate memory and control barrier intrinsics. For compute, they emit a memory_barrier_shared and for TCS, they emit a memory_barrier_tcs_patch. (SPIR-V will emit scoped barriers instead when requested.) The memory barrier is emitted before the control barrier to ensure that all threads have flushed their caches or I/O queues before any post-barrier I/O occurs. For SPIR-V, the shared memory barrier technically may not be required, but older versions of GLSLang give us the wrong SPIR-V, so I added a workaround for it.
  4. Rename nir_intrinsic_barrier to control_barrier and better document that it's only a control barrier and doesn't also require the driver to insert a memory barrier.
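
The emission order from item 3 can be sketched as follows. This is a simplified stand-alone model in C, not the actual front-end code: the `sketch_*` enums and function are hypothetical stand-ins for the NIR intrinsics named in the list, and the scoped-barrier path is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the shader stage and the NIR intrinsics named
 * in the list above; none of these identifiers exist in Mesa. */
enum sketch_stage { SKETCH_COMPUTE, SKETCH_TESS_CTRL };

enum sketch_intrinsic {
   SKETCH_MEMORY_BARRIER_SHARED,
   SKETCH_MEMORY_BARRIER_TCS_PATCH,
   SKETCH_CONTROL_BARRIER,
};

/* Writes the intrinsics for one barrier() call into out and returns how
 * many were written. The memory barrier comes first so that every thread
 * has flushed its writes before any thread passes the control barrier. */
size_t
sketch_lower_barrier(enum sketch_stage stage, enum sketch_intrinsic out[2])
{
   size_t n = 0;

   out[n++] = (stage == SKETCH_COMPUTE) ? SKETCH_MEMORY_BARRIER_SHARED
                                        : SKETCH_MEMORY_BARRIER_TCS_PATCH;
   out[n++] = SKETCH_CONTROL_BARRIER;

   return n;
}
```

With this split, a back-end can handle the control barrier purely as execution synchronization and decide separately what, if anything, each memory barrier needs.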

The big question here is whether or not others think that they get real value from having the shared memory barrier baked into nir_intrinsic_barrier. Another possible option would be to add a different control barrier intrinsic, similar to nir_intrinsic_scoped_barrier, which takes memory scope information. However, I think that this is still an overall improvement since it separates control barriers from memory barriers.
