Fix: Properly calculate aggregate_sums in ploc shader

This patch fixes the miscalculation of aggregate_sums[] when SUBGROUP_SIZE is less than 32.

Using subgroupExclusiveAdd alone is not enough when aggregate_sums[] is longer than SUBGROUP_SIZE, because the scan only covers the lanes of a single subgroup. An extra stage is needed on top of that result. For simplicity, this patch performs a second, trivial prefix sum and shifts each sub-block (aggregate_sums[0~7], [8~15], [16~23], ..., assuming SUBGROUP_SIZE=8) by the proper value, so that aggregate_sums[] ends up monotonically increasing.
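
The two-stage idea can be illustrated with a small CPU-side emulation. This is only a sketch, not the shader code from the patch: the function names are hypothetical, and Python lists stand in for aggregate_sums[] and for the per-lane inputs. The first function mimics a scan that restarts at every subgroup boundary; the second adds a per-sub-block offset to turn those partial results into a scan over the whole array.

```python
def subgroup_exclusive_add(values, subgroup_size):
    """Emulate subgroupExclusiveAdd: the scan restarts in every subgroup."""
    out = []
    for start in range(0, len(values), subgroup_size):
        running = 0
        for v in values[start:start + subgroup_size]:
            out.append(running)
            running += v
    return out


def full_exclusive_prefix_sum(values, subgroup_size):
    """Fix up the per-subgroup scans so they form one scan over all values.

    Hypothetical helper: a second, trivial prefix sum over the sub-block
    totals gives the offset by which each sub-block must be shifted.
    """
    partial = subgroup_exclusive_add(values, subgroup_size)

    offsets = []
    carry = 0
    for start in range(0, len(values), subgroup_size):
        end = min(start + subgroup_size, len(values))
        offsets.append(carry)
        # Total of a sub-block = last exclusive sum + last input value.
        carry += partial[end - 1] + values[end - 1]

    # Shift every element by the offset of the sub-block it belongs to.
    return [p + offsets[i // subgroup_size] for i, p in enumerate(partial)]
```

With 32 inputs and a subgroup size of 8, the first function alone restarts at zero at indices 8, 16 and 24, which is the miscalculation described above. With a subgroup size of 32 there is a single sub-block whose offset is zero, so the second stage leaves the result unchanged.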

This patch has passed the CTS tests with SUBGROUP_SIZE=8/16/32 on Intel hardware.
