
ir3,turnip: Add gen4 new subgroup instructions and implement GL_KHR_shader_subgroup_quad

Added new instructions:

  • getlast.w8 #4
Performs a jump for the first (CLUSTER_SIZE-1) fibers in a subgroup.
All other (SUBGROUP_SIZE-CLUSTER_SIZE+1) fibers do NOT jump.
Although there is a separate field for CLUSTER_SIZE, its value does
not change the behaviour in any observable way: the instruction
behaves as if CLUSTER_SIZE is always 8.
  • brcst.active.w8
A subgroup can be divided into (subgroup_size / CLUSTER_SIZE)
clusters. For each cluster, brcst.active.w does the following:

Given a cluster of fibers f_0, f_1, ..., f_{CLUSTER_SIZE-1}, brcst
broadcasts the SRC value from the fiber f_{CLUSTER_SIZE/2-1}
to fibers f_{CLUSTER_SIZE/2}, ..., f_{CLUSTER_SIZE-1}. The DST reg
in the other fibers is unaffected. If fiber f_{CLUSTER_SIZE/2-1} is
inactive, the value to broadcast is taken from the next lower active
fiber: f_{CLUSTER_SIZE/2-2}, then f_{CLUSTER_SIZE/2-3}, ...
If all fibers f_0, f_1, ..., f_{CLUSTER_SIZE-1} are inactive,
the DST reg remains unchanged for all fibers.

This instruction is needed to implement arithmetic subgroup
operations with a prefix sum (https://en.wikipedia.org/wiki/Prefix_sum).

    For brcst.active.w8 without inactive fibers:
    	Fiber      | 0  1  2  3  4  5  6  7  | 8  9  10  11  12  13  14  15
    	SRC        | s0 s1 s2 s3 ...      s7 | s8  ...   s11 ...         s15
    	DST_before | d0 d1       ...      d7 | d8  ...                   d15
    	DST_after  | d0 d1 d2 d3 s3 s3 s3 s3 | d8  ...   d11 s11 s11 s11 s11

    If fibers 2 and 3 are inactive:
    	Fiber      | 0  1  X  X  4  5  6  7  | ...
    	SRC        | s0 s1 X  X  ...      s7 | ...
    	DST_before | d0 d1       ...      d7 | ...
    	DST_after  | d0 d1 X  X  s1 s1 s1 s1 | ...
  • quad_shuffle.brcst - subgroupQuadBroadcast
  • quad_shuffle.horiz - subgroupQuadSwapHorizontal
  • quad_shuffle.vert - subgroupQuadSwapVertical
  • quad_shuffle.diag - subgroupQuadSwapDiagonal
  • getfiberid - gl_SubgroupInvocationID
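
The brcst.active semantics described above can be modelled in a few lines of Python. This is only an illustrative sketch, not code from this MR: the function names `brcst_active` and `inclusive_scan` are hypothetical, inactive fibers are assumed to keep their DST value, and DST is assumed to stay untouched when no fiber in the lower half of a cluster is active. The scan loop shows one way such a broadcast could feed a prefix sum; it is not necessarily how ir3_lower_subgroups does it.

```python
def brcst_active(src, dst, active, cluster_size):
    """Software model of brcst.active.w<cluster_size> (hypothetical helper)."""
    out = list(dst)
    half = cluster_size // 2
    for base in range(0, len(src), cluster_size):
        # The highest active fiber in the lower half supplies the value.
        donor = next(
            (f for f in range(base + half - 1, base - 1, -1) if active[f]),
            None,
        )
        if donor is None:
            continue  # assumption: nothing to broadcast, DST stays as it was
        # Only active fibers in the upper half of the cluster are written.
        for f in range(base + half, base + cluster_size):
            if active[f]:
                out[f] = src[donor]
    return out


def inclusive_scan(values, active, max_cluster=8):
    """Inclusive prefix sum per cluster, built from repeated broadcasts."""
    acc = list(values)
    size = 2
    while size <= max_cluster:
        # DST starts at the additive identity, so fibers that the broadcast
        # leaves untouched add 0 and keep their partial sum.
        carry = brcst_active(acc, [0] * len(acc), active, size)
        acc = [a + c if on else a for a, c, on in zip(acc, carry, active)]
        size *= 2
    return acc
```

Calling `brcst_active` with 16 fibers and a cluster size of 8 reproduces both DST_after rows from the tables above, with the inactive fibers (shown as X) simply keeping their old DST value in this model.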

Implemented:

  • Enable subgroup ops in fragment shader
  • GL_KHR_shader_subgroup_quad (was the easiest one)
  • Use getfiberid for SubgroupInvocationID
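
For reference, the GLSL-level behaviour that the quad_shuffle variants are mapped to can be sketched as index arithmetic on the invocation ID within each group of four. This models the GL_KHR_shader_subgroup_quad semantics only; the helper names `quad_swap` and `quad_broadcast` are hypothetical, and the sketch makes no claim about how the hardware encodes or executes the instruction.

```python
# XOR masks on the invocation index within a quad:
# horizontal swaps 0<->1 and 2<->3, vertical swaps 0<->2 and 1<->3,
# diagonal swaps 0<->3 and 1<->2.
QUAD_SWAP_MASK = {"horiz": 1, "vert": 2, "diag": 3}


def quad_swap(values, direction):
    """Model of subgroupQuadSwap{Horizontal,Vertical,Diagonal}.

    Assumes len(values) is a multiple of 4.
    """
    mask = QUAD_SWAP_MASK[direction]
    return [values[i ^ mask] for i in range(len(values))]


def quad_broadcast(values, lane):
    """Model of subgroupQuadBroadcast: every invocation reads the value
    held by invocation `lane` (0..3) of its own quad."""
    return [values[(i & ~3) | lane] for i in range(len(values))]
```

For a single quad holding `["a", "b", "c", "d"]`, the horizontal swap yields `["b", "a", "d", "c"]`, the vertical swap `["c", "d", "a", "b"]`, and the diagonal swap `["d", "c", "b", "a"]`.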

One issue remains:

  • Infinite loop in lower_block of ir3_lower_subgroups:
dEQP-VK.subgroups.basic.graphics.subgroupbarrier

Possibly the instruction list gets corrupted after split_block?

Edited by Danylo Piliaiev
