ir3: improve handling of WAR hazards

Job Noorman requested to merge jnoorman/mesa:ir3-war-improvements into main

This series improves the handling of WAR hazards in a few ways:

  • Fix a bug where WAR hazards were added for const/imm regs;
  • Don't add (ss) for WAR hazards already resolved using (sy): although all WAR hazards can be resolved using (ss), if the reader is a (sy)-producer and has already been synced using (sy), its sources have definitely been read, so an extra (ss) is not necessary;
  • Take WAR hazards into account in post-RA scheduling. Note that I also experimented with taking the (sy) syncs for WAR into account, but the results were mixed and it introduced a lot of complexity, so I've left this out for now.
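
The first two points can be sketched as a small decision function. This is only an illustrative model, not the code from this series: the types `src_file`, `war_sync`, `war_reader` and the function `war_sync_needed` are hypothetical names, and the sketch assumes the reader is an instruction that reads its sources late enough for a WAR hazard to exist at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the real ir3 types. */
enum src_file {
   SRC_GPR,   /* general-purpose register: may be overwritten by a later write */
   SRC_CONST, /* const register: no WAR hazard is tracked for it */
   SRC_IMMED, /* immediate: no WAR hazard is tracked for it */
};

enum war_sync {
   WAR_SYNC_NONE, /* the write needs no extra sync for this reader */
   WAR_SYNC_SS,   /* the write needs (ss) to wait for this reader */
};

struct war_reader {
   enum src_file file;  /* file of the source that a later write would clobber */
   bool is_sy_producer; /* the reader's result is waited on using (sy) */
   bool synced_by_sy;   /* a (sy) covering this reader has already been emitted */
};

/* Decide which sync a write needs because of an earlier reader of its dst. */
static enum war_sync
war_sync_needed(const struct war_reader *reader)
{
   assert(reader != NULL);

   /* Point 1: const/imm sources never create a WAR hazard. */
   if (reader->file != SRC_GPR)
      return WAR_SYNC_NONE;

   /* Point 2: once a (sy)-producer has been synced with (sy), it has
    * finished, so its sources have been read and no (ss) is needed. */
   if (reader->is_sy_producer && reader->synced_by_sy)
      return WAR_SYNC_NONE;

   /* Otherwise fall back to (ss), which can resolve any WAR hazard. */
   return WAR_SYNC_SS;
}
```

In this model, a GPR source read by a (sy)-producer that has not yet been synced still yields `WAR_SYNC_SS`; only the already-synced case drops the (ss).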

This shows some nice shader-db results, especially for (ss) and sstall:

shader-db results
total instructions in shared programs: 4105224 -> 4096025 (-0.22%)
instructions in affected programs: 1677124 -> 1667925 (-0.55%)
helped: 1637
HURT: 1842
helped stats (abs) min: 1 max: 1379 x̄: 11.42 x̃: 2
helped stats (rel) min: 0.02% max: 25.30% x̄: 1.62% x̃: 0.70%
HURT stats (abs)   min: 1 max: 55 x̄: 5.16 x̃: 4
HURT stats (rel)   min: 0.01% max: 11.97% x̄: 1.87% x̃: 1.40%
95% mean confidence interval for instructions value: -4.21 -1.08
95% mean confidence interval for instructions %-change: 0.13% 0.32%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total nops in shared programs: 888176 -> 878977 (-1.04%)
nops in affected programs: 436092 -> 426893 (-2.11%)
helped: 1637
HURT: 1842
helped stats (abs) min: 1 max: 1379 x̄: 11.42 x̃: 2
helped stats (rel) min: 0.04% max: 100.00% x̄: 12.91% x̃: 6.06%
HURT stats (abs)   min: 1 max: 55 x̄: 5.16 x̃: 4
HURT stats (rel)   min: 0.00% max: 800.00% x̄: 20.26% x̃: 9.30%
95% mean confidence interval for nops value: -4.21 -1.08
95% mean confidence interval for nops %-change: 3.26% 6.05%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total non-nops in shared programs: 3217048 -> 3217048 (0.00%)
non-nops in affected programs: 0 -> 0
helped: 0
HURT: 0

total mov in shared programs: 144229 -> 144229 (0.00%)
mov in affected programs: 0 -> 0
helped: 0
HURT: 0

total cov in shared programs: 88704 -> 88704 (0.00%)
cov in affected programs: 0 -> 0
helped: 0
HURT: 0

total dwords in shared programs: 8778888 -> 8776314 (-0.03%)
dwords in affected programs: 1307308 -> 1304734 (-0.20%)
helped: 437
HURT: 626
helped stats (abs) min: 2 max: 608 x̄: 17.46 x̃: 2
helped stats (rel) min: 0.03% max: 16.25% x̄: 1.28% x̃: 0.73%
HURT stats (abs)   min: 2 max: 32 x̄: 8.08 x̃: 2
HURT stats (rel)   min: 0.04% max: 17.07% x̄: 1.47% x̃: 0.62%
95% mean confidence interval for dwords value: -4.64 -0.20
95% mean confidence interval for dwords %-change: 0.19% 0.49%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total last-baryf in shared programs: 130915 -> 131207 (0.22%)
last-baryf in affected programs: 31866 -> 32158 (0.92%)
helped: 65
HURT: 107
helped stats (abs) min: 1 max: 25 x̄: 5.71 x̃: 4
helped stats (rel) min: 0.46% max: 12.73% x̄: 3.45% x̃: 2.19%
HURT stats (abs)   min: 1 max: 26 x̄: 6.20 x̃: 5
HURT stats (rel)   min: 0.39% max: 42.31% x̄: 3.93% x̃: 2.31%
95% mean confidence interval for last-baryf value: 0.47 2.93
95% mean confidence interval for last-baryf %-change: 0.26% 2.02%
Last-baryf are HURT.

total last-helper in shared programs: 1131900 -> 1129140 (-0.24%)
last-helper in affected programs: 622475 -> 619715 (-0.44%)
helped: 735
HURT: 621
helped stats (abs) min: 1 max: 230 x̄: 18.05 x̃: 5
helped stats (rel) min: 0.04% max: 84.36% x̄: 7.24% x̃: 1.82%
HURT stats (abs)   min: 1 max: 307 x̄: 16.92 x̃: 6
HURT stats (rel)   min: 0.03% max: 479.69% x̄: 13.23% x̃: 2.16%
95% mean confidence interval for last-helper value: -4.04 -0.04
95% mean confidence interval for last-helper %-change: 0.47% 3.80%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total half in shared programs: 0 -> 0
half in affected programs: 0 -> 0
helped: 0
HURT: 0

total full in shared programs: 254152 -> 254152 (0.00%)
full in affected programs: 0 -> 0
helped: 0
HURT: 0

total constlen in shared programs: 628940 -> 628940 (0.00%)
constlen in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat0 in shared programs: 986933 -> 977734 (-0.93%)
cat0 in affected programs: 473889 -> 464690 (-1.94%)
helped: 1637
HURT: 1842
helped stats (abs) min: 1 max: 1379 x̄: 11.42 x̃: 2
helped stats (rel) min: 0.04% max: 66.67% x̄: 10.76% x̃: 5.66%
HURT stats (abs)   min: 1 max: 55 x̄: 5.16 x̃: 4
HURT stats (rel)   min: 0.03% max: 600.00% x̄: 21.57% x̃: 9.57%
95% mean confidence interval for cat0 value: -4.21 -1.08
95% mean confidence interval for cat0 %-change: 5.00% 7.72%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total cat1 in shared programs: 234377 -> 234377 (0.00%)
cat1 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat2 in shared programs: 1501600 -> 1501600 (0.00%)
cat2 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat3 in shared programs: 1196828 -> 1196828 (0.00%)
cat3 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat4 in shared programs: 84089 -> 84089 (0.00%)
cat4 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat5 in shared programs: 48109 -> 48109 (0.00%)
cat5 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat6 in shared programs: 50156 -> 50156 (0.00%)
cat6 in affected programs: 0 -> 0
helped: 0
HURT: 0

total cat7 in shared programs: 3132 -> 3132 (0.00%)
cat7 in affected programs: 0 -> 0
helped: 0
HURT: 0

total stp in shared programs: 2448 -> 2448 (0.00%)
stp in affected programs: 0 -> 0
helped: 0
HURT: 0

total ldp in shared programs: 568 -> 568 (0.00%)
ldp in affected programs: 0 -> 0
helped: 0
HURT: 0

total sstall in shared programs: 335784 -> 315300 (-6.10%)
sstall in affected programs: 149340 -> 128856 (-13.72%)
helped: 2261
HURT: 685
helped stats (abs) min: 1 max: 270 x̄: 10.36 x̃: 7
helped stats (rel) min: 0.75% max: 100.00% x̄: 29.64% x̃: 22.22%
HURT stats (abs)   min: 1 max: 23 x̄: 4.28 x̃: 3
HURT stats (rel)   min: 0.00% max: 450.00% x̄: 34.25% x̃: 14.29%
95% mean confidence interval for sstall value: -7.47 -6.44
95% mean confidence interval for sstall %-change: -16.37% -13.20%
Sstall are helped.

total (ss) in shared programs: 85394 -> 76205 (-10.76%)
(ss) in affected programs: 45455 -> 36266 (-20.22%)
helped: 4688
HURT: 225
helped stats (abs) min: 1 max: 90 x̄: 2.01 x̃: 1
helped stats (rel) min: 0.92% max: 100.00% x̄: 43.31% x̃: 28.57%
HURT stats (abs)   min: 1 max: 4 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 1.85% max: 50.00% x̄: 19.26% x̃: 16.67%
95% mean confidence interval for (ss) value: -1.99 -1.75
95% mean confidence interval for (ss) %-change: -41.46% -39.43%
(ss) are helped.

total systall in shared programs: 650040 -> 652696 (0.41%)
systall in affected programs: 111692 -> 114348 (2.38%)
helped: 289
HURT: 668
helped stats (abs) min: 1 max: 39 x̄: 5.36 x̃: 3
helped stats (rel) min: 0.10% max: 100.00% x̄: 17.05% x̃: 8.15%
HURT stats (abs)   min: 1 max: 98 x̄: 6.29 x̃: 5
HURT stats (rel)   min: 0.00% max: 2100.00% x̄: 25.98% x̃: 7.89%
95% mean confidence interval for systall value: 2.17 3.38
95% mean confidence interval for systall %-change: 7.52% 18.45%
Systall are HURT.

total (sy) in shared programs: 35562 -> 35586 (0.07%)
(sy) in affected programs: 500 -> 524 (4.80%)
helped: 27
HURT: 51
helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
helped stats (rel) min: 4.26% max: 50.00% x̄: 21.94% x̃: 20.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 3.70% max: 50.00% x̄: 28.69% x̃: 25.00%
95% mean confidence interval for (sy) value: 0.07 0.54
95% mean confidence interval for (sy) %-change: 5.10% 17.23%
(sy) are HURT.

total waves in shared programs: 630114 -> 630114 (0.00%)
waves in affected programs: 0 -> 0
helped: 0
HURT: 0

total loops in shared programs: 1085 -> 1085 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST: 0 GAINED: 0

Total CPU time (seconds): 1194.68 -> 1181.02 (-1.14%)

I've also measured a modest 0.3% improvement in render pass time in some d3d11 traces on a750.
