v3d: use larger supergroups when possible

Iago Toral requested to merge itoral/mesa:v3d_compute_wgs_per_sg into master

The V3D hardware allows us to pack multiple workgroups together to avoid wasting execution lanes in shader cores.

For example, if we dispatch 16 workgroups with a local size of 1 element, we can pack all 16 workgroups in a single 16-wide dispatch where each lane executes a different workgroup, instead of 16 1-wide dispatches.

This improves the performance of the Sascha Willems computecloth demo, which has a 10x10x1 local size, by 13%. Without this change we have a single workgroup per supergroup, so each supergroup has 100 elements that we handle as 6x16-wide dispatches and 1x4-wide dispatch (wasting 12 lanes on the last dispatch of each workgroup). With this series, we pack 4 workgroups in a supergroup (4x100 elements total), leading to 25 full 16-wide dispatches for each supergroup and no wasted lanes.
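The lane arithmetic above can be checked with a small sketch. This is only an illustration of the counting, not the driver code: the function names and the `max_wgs_per_sg` limit of 16 are hypothetical, and the heuristic v3d/v3dv actually uses to pick the number of workgroups per supergroup may differ.

```python
SIMD_WIDTH = 16  # lanes per dispatch, as described in the text


def dispatch_stats(wg_size, wgs_per_sg, simd_width=SIMD_WIDTH):
    """Return (dispatches, wasted_lanes) for one supergroup."""
    elements = wg_size * wgs_per_sg
    dispatches = -(-elements // simd_width)  # ceiling division
    wasted = dispatches * simd_width - elements
    return dispatches, wasted


def smallest_full_packing(wg_size, max_wgs_per_sg=16, simd_width=SIMD_WIDTH):
    """Smallest workgroups-per-supergroup count that wastes no lanes.

    max_wgs_per_sg is an assumed limit for illustration only.
    Falls back to 1 if no count up to the limit fills every dispatch.
    """
    for n in range(1, max_wgs_per_sg + 1):
        if (wg_size * n) % simd_width == 0:
            return n
    return 1


# 10x10x1 local size, one workgroup per supergroup: 7 dispatches, 12 idle lanes
print(dispatch_stats(100, 1))
# Same local size with 4 workgroups per supergroup: 25 full dispatches
print(dispatch_stats(100, 4))
# Local size of 1: 16 workgroups fill a single 16-wide dispatch
print(smallest_full_packing(1), dispatch_stats(1, 16))
```

Under these assumptions, a local size of 100 needs 4 workgroups per supergroup before the element count becomes a multiple of 16, which matches the computecloth numbers quoted above.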

Since we are now packing multiple workgroups in each SIMD dispatch, the workgroup id is no longer uniform across the lanes of a dispatch, so we need to inform the divergence analysis pass about this.
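To see why the workgroup id stops being uniform, here is a toy model of which workgroup each lane belongs to. It assumes the elements of a supergroup are laid out linearly (all elements of workgroup 0, then workgroup 1, and so on); that layout and the helper names are assumptions made for illustration, not a description of how the hardware or the driver assigns lanes.

```python
def lane_workgroups(dispatch_index, wg_size, simd_width=16):
    """Workgroup index (within the supergroup) seen by each lane of one dispatch.

    Assumes a linear element layout inside the supergroup (hypothetical).
    """
    first = dispatch_index * simd_width
    return [(first + lane) // wg_size for lane in range(simd_width)]


def is_uniform(values):
    """True if every lane sees the same value."""
    return len(set(values)) == 1


# Local size 100: the first dispatch sits entirely inside workgroup 0 ...
print(is_uniform(lane_workgroups(0, 100)))
# ... but dispatch 6 covers elements 96..111 and straddles workgroups 0 and 1
print(lane_workgroups(6, 100))
# Local size 1: every lane of the dispatch is a different workgroup
print(lane_workgroups(0, 1))
```

In this model any dispatch that straddles a workgroup boundary has lanes with different workgroup ids, so a value derived from the workgroup id can no longer be treated as uniform.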

This series implements this optimization for both v3d and v3dv.

Edited by Iago Toral