tu, ir3, nir: Support vulkan subgroup ops

Connor Abbott requested to merge cwabbott0/mesa:review/tu-ir3-subgroups into main

This MR implements the basic, vote, and ballot Vulkan subgroup capabilities. I believe the HW supports more, but I need to figure out how to update my SM8250 so that I can get traces of the driver actually using it. For now, this implements what I've figured out.

First off, the primitives that I know about include:

  • brany/brall. These are branch instructions that branch if any or all of the active threads have a true predicate. In other words, similar to subgroupVoteAny() and subgroupVoteAll(), but implemented as a branch.
  • getone. A branch that all threads take except the lowest active one. Again, similar to subgroupElect(), but implemented as a branch.
  • Shared regs (formerly high regs). It seems that writing them is only possible when exactly one thread is active, i.e. inside a getone block or something similar. So subgroupBroadcastFirst() is implemented something like:
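int subgroupBroadcastFirst(int val)
{
    int shared_reg; // one copy per wave, visible to every thread
    if (subgroupElect()) { // getone: only the lowest active thread gets here
        shared_reg = val;
    }
    return shared_reg;
}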
  • movmsk. This is equivalent to subgroupBallot(true) and returns its result in a shared reg. We need to use a trick similar to subgroupBroadcastFirst to implement ballot with an arbitrary condition.
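To make the semantics of these primitives concrete, here is a minimal scalar model of a 64-thread wave in C. It is only a sketch for reasoning about results, not ir3 or Mesa code: the `wave_t` type and every `wave_*` function name are hypothetical. In particular, `wave_ballot` assumes that a ballot with an arbitrary condition can be modelled as "branch on the condition, then take movmsk among the threads that entered the branch"; the description above only says that a trick similar to subgroupBroadcastFirst is needed, so treat that as one plausible reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 64-thread wave (not a Mesa/ir3 type). */
typedef struct {
    uint64_t active; /* bit i set: thread i is active */
    uint64_t pred;   /* bit i set: thread i's predicate is true */
} wave_t;

/* brany: the branch is taken if any active thread has a true predicate. */
static bool wave_any(wave_t w) { return (w.active & w.pred) != 0; }

/* brall: the branch is taken if every active thread has a true predicate. */
static bool wave_all(wave_t w) { return (w.active & ~w.pred) == 0; }

/* getone: index of the lowest active thread, the only one that does not
 * take the branch. Returns -1 if no thread is active. */
static int wave_elect(wave_t w)
{
    for (int i = 0; i < 64; i++)
        if (w.active & (UINT64_C(1) << i))
            return i;
    return -1;
}

/* movmsk: same result as subgroupBallot(true), i.e. the active mask. */
static uint64_t wave_movmsk(wave_t w) { return w.active; }

/* Ballot with an arbitrary condition. Assumption: modelled as narrowing
 * the active mask to threads whose predicate is true, then movmsk. */
static uint64_t wave_ballot(wave_t w)
{
    wave_t taken = { .active = w.active & w.pred, .pred = w.pred };
    return wave_movmsk(taken);
}

/* subgroupBroadcastFirst: the elected thread writes the shared value,
 * which every thread then reads back. */
static int wave_broadcast_first(wave_t w, const int vals[64])
{
    int first = wave_elect(w);
    assert(first >= 0); /* at least one thread must be active */
    return vals[first];
}
```

For example, with threads 2 and 3 active and only thread 2's predicate true, `wave_any` is true, `wave_all` is false, `wave_elect` returns 2, and `wave_ballot` returns a mask with only bit 2 set.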

Maddeningly, one missing primitive is something to get gl_SubgroupInvocationID. In the blob this is just gl_LocalInvocationIndex & 63 (they use 64-thread wavefronts whenever subgroup instructions are used), and subgroup operations aren't supported on anything except compute. The invocation ID is required even for the basic capability, so we have to follow suit and restrict subgroups to compute. I've decided to keep the 128-thread setting as it should be more performant, although we may have to revisit that if we have to support the legacy subgroup extensions.
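The masking trick above can be sketched in C as follows. This is an illustration rather than driver code: both helper names are hypothetical, and it assumes subgroups are formed from consecutive local invocation indices and that the subgroup size is a power of two. Under those assumptions, `& 63` is the 64-thread case of `index & (size - 1)`, which yields an invocation's index within its subgroup, while dividing by the size yields the index of the subgroup within the workgroup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of an invocation within its subgroup.
 * With subgroup_size == 64 this is gl_LocalInvocationIndex & 63. */
static uint32_t subgroup_invocation_id(uint32_t local_index,
                                       uint32_t subgroup_size)
{
    /* The mask only works for power-of-two sizes. */
    assert(subgroup_size != 0 && (subgroup_size & (subgroup_size - 1)) == 0);
    return local_index & (subgroup_size - 1);
}

/* Hypothetical helper: index of the subgroup within the workgroup. */
static uint32_t subgroup_id(uint32_t local_index, uint32_t subgroup_size)
{
    assert(subgroup_size != 0 && (subgroup_size & (subgroup_size - 1)) == 0);
    return local_index / subgroup_size;
}
```

With a 128-thread setting the same helpers would be called with a size of 128, so the mask becomes 127 instead of 63.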

I think it should be clear that the shared register stuff can't be done in NIR. But there are a bunch of lowerings that can be, so this series also includes a bunch of NIR patches to make nir_lower_subgroups more amenable to our needs. Next come ir3 patches that let us emit if-then statements when translating from NIR, beef up shared register handling, and handle the new branches. Finally, we enable the primitives.
