aco: use s_clause breaks

Georg Lehmann requested to merge DadSchoorse/mesa:aco-clause-breaks into main

s_clause has an option to break the clause into multiple smaller clauses after every n instructions, where n is between 2 and 15.

We sometimes have instructions with the same clause type next to each other, but still want to split them into smaller clauses, for example when their descriptors mismatch or when global and buffer VMEM/SMEM loads are mixed.

So for shader code like this (4 clauses with 2 instructions each):

s_clause 0x1                                                ; bfa10001
image_sample v[30:31], v[34:35], s[56:63], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f0800308 00ae1e22
image_sample  v[36:37], [v14, v11], s[56:63], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f080030a 00ae240e 0000000b
s_clause 0x1                                                ; bfa10001
image_sample  v38, [v14, v11], s[64:71], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f080010a 00b0260e 0000000b
image_sample v39, v[34:35], s[64:71], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f0800108 00b02722
s_clause 0x1                                                ; bfa10001
image_sample v40, v[34:35], s[72:79], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f0800108 00b22822
image_sample  v41, [v14, v11], s[72:79], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f080010a 00b2290e 0000000b
s_clause 0x1                                                ; bfa10001
image_sample  v[42:43], [v14, v11], s[80:87], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f080030a 00b42a0e 0000000b
image_sample v[34:35], v[34:35], s[80:87], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f0800308 00b42222

we can instead emit this (one clause with 8 instructions and a break after each pair):

s_clause 0x207                                              ; bfa10207
image_sample v[30:31], v[34:35], s[56:63], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f0800308 00ae1e22
image_sample  v[36:37], [v14, v11], s[56:63], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f080030a 00ae240e 0000000b
image_sample  v38, [v14, v11], s[64:71], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f080010a 00b0260e 0000000b
image_sample v39, v[34:35], s[64:71], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f0800108 00b02722
image_sample v40, v[34:35], s[72:79], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f0800108 00b22822
image_sample  v41, [v14, v11], s[72:79], s[20:23] dmask:0x1 dim:SQ_RSRC_IMG_2D ; f080010a 00b2290e 0000000b
image_sample  v[42:43], [v14, v11], s[80:87], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f080030a 00b42a0e 0000000b
image_sample v[34:35], v[34:35], s[80:87], s[20:23] dmask:0x3 dim:SQ_RSRC_IMG_2D ; f0800308 00b42222
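
To make the two immediates in these listings easier to read, here is a minimal sketch of how they could be packed and unpacked. It assumes the s_clause immediate stores the clause length minus one in bits [5:0] and the break span n in bits [11:8], with 0 meaning no breaks. That layout matches 0x1 and 0x207 above but should be checked against the ISA docs. The function names and the 2..63 length range are illustrative assumptions, not ACO code.

```python
# Assumed s_clause immediate layout (consistent with 0x1 and 0x207 above):
#   bits [5:0]  = clause length - 1
#   bits [11:8] = break span n (0 = no breaks)
# encode_s_clause / decode_s_clause are hypothetical helper names.

def encode_s_clause(length, break_span=0):
    """Pack a clause length and an optional break span into the immediate."""
    if not 2 <= length <= 63:
        # Assumed legal range; confirm against the ISA documentation.
        raise ValueError("clause length assumed to be 2..63")
    if not 0 <= break_span <= 15:
        raise ValueError("break span must fit in 4 bits")
    return (break_span << 8) | (length - 1)


def decode_s_clause(imm):
    """Return (length, break_span) for an s_clause immediate."""
    return (imm & 0x3F) + 1, (imm >> 8) & 0xF


print(hex(encode_s_clause(2)))     # 0x1   (one 2-instruction clause, no breaks)
print(hex(encode_s_clause(8, 2)))  # 0x207 (8 instructions, break after each pair)
print(decode_s_clause(0x207))      # (8, 2)
```

Under this reading, the four `s_clause 0x1` instructions in the first listing collapse into the single `s_clause 0x207` in the second, which saves three instructions.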

Foz-DB Navi21:
Totals from 13446 (16.98% of 79206) affected shaders:
Instrs: 12030998 -> 12010454 (-0.17%)
CodeSize: 64397304 -> 64317308 (-0.12%); split: -0.12%, +0.00%
Latency: 91281709 -> 91273068 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 20795015 -> 20794752 (-0.00%); split: -0.00%, +0.00%
VClause: 252808 -> 252799 (-0.00%)
SClause: 323709 -> 323085 (-0.19%)

This probably requires some testing/confirmation that the behavior is as described in the ISA docs. LLVM doesn't use clause breaks at all.
