Fix: Properly calculate aggregate_sums in ploc shader
This patch fixes the miscalculation of aggregate_sums[] when SUBGROUP_SIZE is less than 32.
Using subgroupExclusiveAdd alone is not enough when the length of aggregate_sums[] is larger than SUBGROUP_SIZE, because the scan only covers the invocations of a single subgroup. An extra stage is needed on top of that result. For simplicity, this patch performs a trivial prefix sum again and shifts the sub-blocks aggregate_sums[0~7], [8~15], [16~23], ... by the proper values (assuming SUBGROUP_SIZE=8), so that aggregate_sums[] ends up monotonically increasing.
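The two-stage idea can be illustrated with a small CPU-side sketch. This is not the shader code from the patch: the helper names subgroup_exclusive_add and full_exclusive_prefix_sum are hypothetical, and the sketch assumes the intended final result is an exclusive prefix sum over the whole array.

```python
def subgroup_exclusive_add(values, subgroup_size):
    """Emulate a per-subgroup exclusive scan: each block of
    subgroup_size elements is scanned independently (assumption:
    this mirrors what subgroupExclusiveAdd yields per subgroup)."""
    out = []
    for start in range(0, len(values), subgroup_size):
        running = 0
        for v in values[start:start + subgroup_size]:
            out.append(running)
            running += v
    return out


def full_exclusive_prefix_sum(values, subgroup_size):
    """Second stage: shift every sub-block by the total of all
    preceding sub-blocks so the result is a scan of the full array."""
    partial = subgroup_exclusive_add(values, subgroup_size)
    result = []
    offset = 0  # sum of all elements in earlier sub-blocks
    for start in range(0, len(values), subgroup_size):
        end = start + subgroup_size
        result.extend(p + offset for p in partial[start:end])
        offset += sum(values[start:end])
    return result
```

With 32 inputs of value 1 and a subgroup size of 8, the first stage alone restarts from 0 at every sub-block boundary, while the second stage produces a single increasing sequence. With a subgroup size of 32 the two stages agree, which matches the observation that the problem only appears for smaller subgroup sizes.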
This patch has passed CTS testing with SUBGROUP_SIZE=8/16/32 on Intel hardware.