intel/compiler: Reuse information between the various scheduling modes (!25841) · Merge requests · Mesa / mesa

Caio Oliveira requested to merge cmarcelo/mesa:intel-compiler-scheduler-reuse into main Oct 21, 2023

Makes the scheduling part of the backend compiler ~30% faster by changing how we store pass information and reusing it across the multiple pre-RA scheduling modes. Over whole fossil runs this amounts to a ~~2% to 4.5%~~ 1.8% to 2.5% reduction in run time (updated results below). The ~30% figure comes from perf/flamegraph data for Cyberpunk 2077, ROTTR and Total War Warhammer 3, and still holds in the updated measurements.
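To picture the reuse idea, here is a toy sketch and not the code in this MR: the mode-independent data is built once when the scheduler object is created, and each mode is then run against the same object. All names (`Scheduler`, `Mode`, `try_all_modes`) and the three ordering heuristics are hypothetical, and dependencies between instructions are ignored to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical heuristics standing in for the real pre-RA modes.
enum class Mode { LatencyFirst, ProgramOrder, ReverseOrder };

class Scheduler {
public:
    // Mode-independent information (here just latencies) is stored once.
    explicit Scheduler(std::vector<int> latencies)
        : latencies_(std::move(latencies))
    {
        ++analysis_builds_;
    }

    // Each run only creates per-run state; the stored data is reused.
    std::vector<std::size_t> run(Mode mode) const
    {
        std::vector<std::size_t> order(latencies_.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        switch (mode) {
        case Mode::ProgramOrder:
            break;
        case Mode::ReverseOrder:
            std::reverse(order.begin(), order.end());
            break;
        case Mode::LatencyFirst:
            std::stable_sort(order.begin(), order.end(),
                             [this](std::size_t a, std::size_t b) {
                                 return latencies_[a] > latencies_[b];
                             });
            break;
        }
        return order;
    }

    std::size_t size() const { return latencies_.size(); }
    int analysis_builds() const { return analysis_builds_; }

private:
    std::vector<int> latencies_;
    int analysis_builds_ = 0;
};

// Runs every mode on one scheduler object and returns the number of runs.
std::size_t try_all_modes(const Scheduler &s)
{
    std::size_t runs = 0;
    for (Mode m : {Mode::LatencyFirst, Mode::ProgramOrder, Mode::ReverseOrder}) {
        std::vector<std::size_t> order = s.run(m);
        assert(order.size() == s.size());
        ++runs;
    }
    return runs;
}
```

After `try_all_modes()` the scheduler has been exercised three times while `analysis_builds()` still reports a single build, which is the saving the description refers to.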

Summary of changes:

- Allocating memory in bulk
- Iterating through arrays instead of linked lists when possible
- Removing virtual functions while still sharing some code between Vec4 and FS
- Making smaller changes to how and what data is stored
- Allowing the scheduler to be reused for the pre-RA modes
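As a rough illustration of the first two items (bulk allocation and array iteration), the sketch below stores all nodes of a block in one contiguous array and walks it by index. It is not taken from the MR: `ScheduleNode`, `build_nodes` and `compute_delays` are hypothetical names, `std::vector` stands in for whatever allocator the compiler really uses, and the dependency model (each node depends on the previous one) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instruction scheduling record.
struct ScheduleNode {
    int latency = 0; // cycles this instruction takes
    int delay = 0;   // latency of this node plus everything after it
};

// Bulk allocation: a single reserve() for the whole block, instead of the
// one-allocation-per-node pattern a linked list would need.
std::vector<ScheduleNode> build_nodes(const std::vector<int> &latencies)
{
    std::vector<ScheduleNode> nodes;
    nodes.reserve(latencies.size());
    for (int latency : latencies) {
        assert(latency >= 0);
        ScheduleNode n;
        n.latency = latency;
        nodes.push_back(n);
    }
    return nodes;
}

// Array iteration: walk the contiguous storage backwards by index.
// Assumes, for simplicity, that each node depends on the one before it.
void compute_delays(std::vector<ScheduleNode> &nodes)
{
    int after = 0;
    for (std::size_t i = nodes.size(); i-- > 0;) {
        nodes[i].delay = nodes[i].latency + after;
        after = nodes[i].delay;
    }
}
```

For latencies `{2, 14, 2}` the accumulated delays come out as 18, 16 and 2. The point of the layout is that the loop touches memory sequentially rather than chasing `next` pointers.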

Fossil run measurements in TGL (compared originally against f54e06e206db3278d38e6dabe781d083594e889c)

RISE OF THE TOMB RAIDER (NATIVE)

    N           Min           Max        Median           Avg        Stddev
x  13         30.08         30.27         30.12     30.133077   0.047325929
+  13         28.69         29.56         28.72     28.785385    0.23358137
Difference at 95.0% confidence
        -1.34769 +/- 0.136431
        -4.47247% +/- 0.452761%
        (Student's t, pooled s = 0.168523)


ASSASSINS CREED ODYSSEY (DXVK)

    N           Min           Max        Median           Avg        Stddev
x  13         14.66         14.72         14.69     14.686154   0.014455945
+  13         14.18         14.23          14.2     14.199231   0.011875422
Difference at 95.0% confidence
	-0.486923 +/- 0.0107096
	-3.31552% +/- 0.0729229%
	(Student's t, pooled s = 0.0132288)


BATMAN ARKHAM CITY (DXVK)

    N           Min           Max        Median           Avg        Stddev
x  13        529.45        530.68         529.7     529.75538    0.29912693
+  13        506.25        506.91        506.67     506.64154    0.18165196
Difference at 95.0% confidence
	-23.1138 +/- 0.200337
	-4.36312% +/- 0.0378168%
	(Student's t, pooled s = 0.247461)


CYBERPUNK 2077 (DXVK)

    N           Min           Max        Median           Avg        Stddev
x  13        118.75        119.15        118.92     118.93615    0.12593242
+  13        116.12        116.42         116.3     116.27769   0.091209817
Difference at 95.0% confidence
	-2.65846 +/- 0.0890123
	-2.2352% +/- 0.0748404%
	(Student's t, pooled s = 0.10995)


TOTAL WAR WARHAMMER 3 (DXVK)

    N           Min           Max        Median           Avg        Stddev
x  13        115.27        115.38        115.31     115.31385    0.03228479
+  13        111.34        111.47        111.42     111.41846   0.035787836
Difference at 95.0% confidence
	-3.89538 +/- 0.0275912
	-3.37807% +/- 0.023927%
	(Student's t, pooled s = 0.0340814)
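
For readers who want to check the "Difference at 95.0% confidence" lines, the sketch below recomputes them from the N, Avg and Stddev columns. It uses the standard two-sample Student's t interval with pooled standard deviation, and takes the percentage relative to the baseline (`x`) average. The names `Sample`, `Difference` and `compare` are made up for this example, and the critical value is passed in by the caller. For 13 + 13 samples (24 degrees of freedom) the two-sided 95% table value is 2.064.

```cpp
#include <cassert>
#include <cmath>

// Summary statistics of one set of runs (one row of the tables above).
struct Sample {
    int n;
    double mean;
    double stddev;
};

struct Difference {
    double delta;   // test mean minus baseline mean, in seconds
    double margin;  // half-width of the confidence interval
    double percent; // delta relative to the baseline mean
    double pooled;  // pooled standard deviation
};

// t_crit is the two-sided Student's t critical value for
// (base.n + test.n - 2) degrees of freedom, looked up by the caller.
Difference compare(const Sample &base, const Sample &test, double t_crit)
{
    assert(base.n > 1 && test.n > 1);
    const int dof = base.n + test.n - 2;
    const double pooled =
        std::sqrt(((base.n - 1) * base.stddev * base.stddev +
                   (test.n - 1) * test.stddev * test.stddev) / dof);
    const double margin =
        t_crit * pooled * std::sqrt(1.0 / base.n + 1.0 / test.n);
    const double delta = test.mean - base.mean;
    return {delta, margin, 100.0 * delta / base.mean, pooled};
}
```

Calling `compare({13, 30.133077, 0.047325929}, {13, 28.785385, 0.23358137}, 2.064)` with the Rise of the Tomb Raider rows reproduces the figures reported for that run: a difference of about -1.3477 s, a margin of about 0.1364 s, and a pooled s of about 0.1685.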

Updated fossil run measurements in TGL, newer GCC/Fedora (compared against )

// Time in seconds
// Difference at 95.0% confidence

RISE OF THE TOMB RAIDER (NATIVE)  N=13
	-0.702308 +/- 0.0185155
	-2.32363% +/- 0.0612597%

ASSASSINS CREED ODYSSEY (DXVK)  N=13
	-0.294615 +/- 0.0129147
	-1.99095% +/- 0.0872754%

BATMAN ARKHAM CITY (DXVK)  N=7
	-13.5871 +/- 0.428562
	-2.545% +/- 0.0802737%

CYBERPUNK 2077 (DXVK)  N=13
	-2.74692 +/- 0.179814
	-2.22448% +/- 0.145615%

TOTAL WAR WARHAMMER 3 (DXVK)  N=13
	-2.08615 +/- 0.02064
	-1.80155% +/- 0.0178242%

I haven't measured vec4, but I would expect some improvement there too, though smaller, since there are no pre-RA modes there.

Edited Nov 11, 2023 by Caio Oliveira
