Added new instructions:
getlast.w8 #4
Performs the jump for the first (CLUSTER_SIZE-1) fibers in a subgroup.
The remaining (SUBGROUP_SIZE-CLUSTER_SIZE+1) fibers do NOT jump.
Although there is a separate field for CLUSTER_SIZE, its value does
not change the behaviour in any observable way: the instruction
behaves as if CLUSTER_SIZE is always 8.
brcst.active.w8
A subgroup is divided into (SUBGROUP_SIZE / CLUSTER_SIZE) clusters,
and brcst.active.w operates on each cluster independently.
Given a cluster of fibers f_0, f_1, ..., f_{CLUSTER_SIZE-1}, brcst
broadcasts the SRC value from fiber f_{CLUSTER_SIZE/2-1} to fibers
f_{CLUSTER_SIZE/2}, ..., f_{CLUSTER_SIZE-1}. The DST reg in the
other fibers is unaffected. If fiber f_{CLUSTER_SIZE/2-1} is
inactive, the value to broadcast is taken from the next lower
active fiber: f_{CLUSTER_SIZE/2-2}, f_{CLUSTER_SIZE/2-3}, ...
If all fibers f_0, f_1, ..., f_{CLUSTER_SIZE-1} are inactive,
the DST reg remains unchanged for all fibers.
This instruction is needed to implement arithmetic subgroup
operations via prefix sum (https://en.wikipedia.org/wiki/Prefix_sum).
For brcst.active.w8 without inactive fibers:

Fiber      | 0  1  2  3  4  5  6  7  | 8   9   10  11  12  13  14  15
SRC        | s0 s1 s2 s3 ...      s7 | s8  ...     s11 ...         s15
DST_before | d0 d1 ...            d7 | d8  ...                     d15
DST_after  | d0 d1 d2 d3 s3 s3 s3 s3 | d8  ...     d11 s11 s11 s11 s11

If fibers 2 and 3 are inactive (marked X):

Fiber      | 0  1  X  X  4  5  6  7  | ...
SRC        | s0 s1 X  X  ...      s7 | ...
DST_before | d0 d1 ...            d7 | ...
DST_after  | d0 d1 X  X  s1 s1 s1 s1 | ...
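
The brcst.active behaviour described above, and its use for a prefix sum, can be modelled with a short Python sketch. This is a reference model only, not the driver code: the function names are hypothetical, and two points are assumptions that the description does not state explicitly: inactive fibers in the upper half of a cluster are never written, and DST stays unchanged when no fiber in the lower half is active. The scan is one plausible way to build a prefix sum on top of the instruction (widths 2, 4, 8, ... with DST preloaded with the identity value 0).

```python
def brcst_active(src, dst, active, cluster_size=8):
    """Model of brcst.active.w<cluster_size> over one subgroup (lists of equal length)."""
    out = list(dst)
    half = cluster_size // 2
    for base in range(0, len(src), cluster_size):
        # Search downward from f_{CLUSTER_SIZE/2-1} for the first active fiber.
        value = None
        for i in range(base + half - 1, base - 1, -1):
            if active[i]:
                value = src[i]
                break
        if value is None:
            continue  # assumption: no active source fiber, DST is left unchanged
        # Only the upper half of the cluster receives the broadcast value.
        for i in range(base + half, base + cluster_size):
            if active[i]:  # assumption: inactive fibers are never written
                out[i] = value
    return out


def inclusive_scan(values, active):
    """Inclusive prefix sum over the active fibers, built from brcst_active.

    Assumes len(values) is a power of two, as subgroup sizes are.
    Results in inactive fibers are meaningless.
    """
    acc = list(values)
    width = 2
    while width <= len(values):
        # DST starts as 0 (the identity for addition), so fibers that receive
        # no broadcast add nothing in the step below.
        partial = brcst_active(acc, [0] * len(values), active, width)
        acc = [a + p if on else a for a, p, on in zip(acc, partial, active)]
        width *= 2
    return acc
```

Each doubling of the cluster width hands the running total of the lower half to every fiber in the upper half, so a subgroup of N fibers needs log2(N) broadcast-and-add steps.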
quad_shuffle.brcst - subgroupQuadBroadcast
quad_shuffle.horiz - subgroupQuadSwapHorizontal
quad_shuffle.vert - subgroupQuadSwapVertical
quad_shuffle.diag - subgroupQuadSwapDiagonal
getfiberid - gl_SubgroupID
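
For reference, the GLSL quad operations that the quad_shuffle variants map to can be sketched as index arithmetic on the fiber id. This models the GL_KHR_shader_subgroup_quad semantics (a quad is four consecutive invocations), not the hardware instruction itself. The function name and the mode strings are hypothetical, and inactive fibers are ignored.

```python
def quad_shuffle(values, mode, lane=0):
    """Return what each fiber reads for a quad operation.

    values: one entry per fiber, length a multiple of 4.
    mode:   "brcst", "horiz", "vert" or "diag" (hypothetical labels).
    lane:   quad-local index 0..3, used only by "brcst".
    """
    swap_mask = {"horiz": 1, "vert": 2, "diag": 3}
    out = []
    for fiber in range(len(values)):
        if mode == "brcst":
            # Every fiber reads from the same lane of its own quad.
            source = (fiber & ~3) | lane
        else:
            # Swaps flip the x bit, the y bit, or both bits of the quad index.
            source = fiber ^ swap_mask[mode]
        out.append(values[source])
    return out
```

For example, quad_shuffle(list(range(8)), "horiz") pairs fibers 0 and 1, 2 and 3, and so on, while "diag" pairs fiber 0 with 3 and fiber 1 with 2 inside each quad.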
Implemented:
- Enable subgroup ops in fragment shader
- GL_KHR_shader_subgroup_quad (was the easiest one)
- Use getfiberid for SubgroupInvocationID
One issue left:
- Infinite loop in lower_block of ir3_lower_subgroups:
  dEQP-VK.subgroups.basic.graphics.subgroupbarrier
  The instruction list gets corrupted after split_block (???)