vulkan/bvh/ploc: Use parallel prefix scan on aggregate_sums
This patch speeds up the naive implementation that fixes the miscalculation of aggregate_sums when PLOC_WORKGROUP_SIZE > SUBGROUP_SIZE^2.
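
For readers unfamiliar with the technique, here is a minimal CPU-side sketch in C of a log-step (Hillis-Steele) exclusive prefix scan compared against a serial one. It is an illustration under assumptions, not the shader code from this patch: the function names, `MAX_SUBGROUPS`, and the choice of Hillis-Steele are all hypothetical, and the array simply stands in for the per-subgroup totals held in aggregate_sums. The likely reason the condition above matters is that a single subgroup operation can combine at most SUBGROUP_SIZE partial sums, so once a workgroup holds more subgroups than that, the totals need a scan of their own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SUBGROUPS 128 /* hypothetical upper bound for this sketch */

/* Serial reference: exclusive prefix sum in n dependent steps. */
static void exclusive_scan_serial(const uint32_t *in, uint32_t *out, uint32_t n)
{
   uint32_t running = 0;
   for (uint32_t i = 0; i < n; i++) {
      out[i] = running;
      running += in[i];
   }
}

/* Hillis-Steele scan: about log2(n) rounds. Each inner-loop iteration
 * models one invocation, and the two buffers model the barrier that would
 * separate rounds in a shader. */
static void exclusive_scan_parallel(const uint32_t *in, uint32_t *out, uint32_t n)
{
   assert(n <= MAX_SUBGROUPS);

   uint32_t a[MAX_SUBGROUPS], b[MAX_SUBGROUPS];
   uint32_t *src = a, *dst = b;

   /* Shift right by one so the inclusive scan below yields exclusive sums. */
   for (uint32_t i = 0; i < n; i++)
      src[i] = i == 0 ? 0 : in[i - 1];

   for (uint32_t stride = 1; stride < n; stride <<= 1) {
      for (uint32_t i = 0; i < n; i++)
         dst[i] = i >= stride ? src[i] + src[i - stride] : src[i];

      /* Swap buffers: this round's output is the next round's input. */
      uint32_t *tmp = src;
      src = dst;
      dst = tmp;
   }

   memcpy(out, src, n * sizeof(*out));
}
```

With 64 partial sums, as with a hypothetical 1024-wide workgroup and 16-wide subgroups, the serial loop takes 64 dependent steps while the log-step version takes 6 rounds, at the cost of more total additions.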
Also add a PLOC_SUBGROUPS_PER_WORKGROUP macro, as suggested by Konstantin.
Based on !28446 (closed).