nir, radeonsi: improve compile times for radeonsi by optimizing less

Marek Olšák requested to merge mareko/mesa:nir-compile-time into master

Note: the first 2 commits are changes in NIR.

The compile times with the gallium noop driver are 33% lower. For radeonsi, the decrease is 17%.

That means the noop driver compiles only 2x slower with NIR than with TGSI. Previously, NIR compilation was 3x slower than TGSI.
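The 3x to 2x figure follows from the 33% reduction. A quick sketch of the arithmetic, assuming TGSI compile time is normalised to 1.0 and the 33% applies to the whole NIR compile time on the noop driver (`slowdown_after` is a hypothetical helper, not part of Mesa):

```python
def slowdown_after(previous_ratio, reduction):
    """Return the new NIR/TGSI compile-time ratio after a fractional reduction.

    previous_ratio: how many times slower NIR was than TGSI before the change.
    reduction: fractional decrease in NIR compile time (0.33 means 33% lower).
    """
    return previous_ratio * (1.0 - reduction)


# Assumed inputs taken from the description: 3x slower before, 33% lower now.
noop_ratio = slowdown_after(3.0, 0.33)
print(f"noop driver: NIR is now about {noop_ratio:.2f}x slower than TGSI")
```

The result is roughly 2x, which matches the claim above.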

This does lead to somewhat worse AMD code (total code size grows by 2.05 %), but the final result is still much better than what TGSI produces.

Turning off scalarization has the biggest negative effect on AMD code size.

Diff for NIR (before -> after this merge request):

48505 shaders in 30515 tests
Totals:
SGPRS: 2206584 -> 2213272 (0.30 %)
VGPRS: 1647892 -> 1649128 (0.08 %)
Spilled SGPRs: 6256 -> 6124 (-2.11 %)
Spilled VGPRs: 72 -> 96 (33.33 %)
Private memory VGPRs: 2176 -> 2176 (0.00 %)
Scratch size: 2240 -> 2252 (0.54 %) dwords per thread
Code Size: 49680804 -> 50701700 (2.05 %) bytes
LDS: 74 -> 74 (0.00 %) blocks
Max Waves: 371387 -> 371326 (-0.02 %)
Wait states: 0 -> 0 (0.00 %)

TGSI (as baseline) vs NIR:

48505 shaders in 30515 tests
Totals:
SGPRS: 2230480 -> 2213272 (-0.77 %)
VGPRS: 1644132 -> 1649128 (0.30 %)
Spilled SGPRs: 10730 -> 6124 (-42.93 %)
Spilled VGPRs: 139 -> 96 (-30.94 %)
Private memory VGPRs: 4688 -> 2176 (-53.58 %)
Scratch size: 5356 -> 2252 (-57.95 %) dwords per thread
Code Size: 51044200 -> 50701700 (-0.67 %) bytes
LDS: 74 -> 74 (0.00 %) blocks
Max Waves: 371585 -> 371326 (-0.07 %)
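
The percentages in both reports are relative changes against the left-hand value. A minimal sketch that reproduces a few of them, assuming a hypothetical helper `pct_change` (it is not taken from the tool that generated the reports, and treating a zero baseline as 0.00 % is an assumption based on the "Wait states" line):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        # Assumption: a zero baseline is reported as 0.00 %, as in "Wait states".
        return 0.0
    return (after - before) / before * 100.0


# Values copied from the two reports above.
rows = {
    "Code Size (NIR before -> after)": (49680804, 50701700),
    "Spilled VGPRs (NIR before -> after)": (72, 96),
    "Scratch size (TGSI -> NIR)": (5356, 2252),
}

for name, (before, after) in rows.items():
    print(f"{name}: {pct_change(before, after):.2f} %")
```

Relative numbers can look dramatic on small totals: the 33.33 % increase in spilled VGPRs is 24 registers across all 48505 shaders, while the 2.05 % code size increase is about 1 MB.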