nir_to_tgsi: Using NOLTIS for ffma fusing

Emma Anholt requested to merge anholt/mesa:noltis-ffma into main

This MR introduces NOLTIS (near-optimal linear-time instruction selection), which I've had kicking around in various branches for years now:

Until now, we have been doing most of the instruction selection job in
nir_opt_algebraic, both lowering and greedy matching.  Doing greedy
matching has downsides when interior nodes are shared, which we've tried
to mitigate using the "is_used_once" flag.  However, that flag can leave
instruction selection opportunities on the floor when an interior node
could have been elided by multiple tiles covering it.

Choosing an optimal tiling is Hard, but NOLTIS can approximate it quickly.
For now, introduce a tool for it that can be used either by drivers or by
a future rework of algebraic.

and then uses it for ffma fusing in nir-to-tgsi:

    nir_to_tgsi: Enable the new nir_opt_fuse_ffma pass.
    
    All the consumers seem to like this, even without hinting about what
    they would like from ffma (for example, i915 can only reference one
    uniform/imm in an instr).
    
    softpipe:
    total instructions in shared programs: 3557515 -> 3557023 (-0.01%)
    instructions in affected programs: 53370 -> 52878 (-0.92%)
    total temps in shared programs: 363339 -> 363621 (0.08%)
    temps in affected programs: 2243 -> 2525 (12.57%)
    total imm in shared programs: 130693 -> 130686 (<.01%)
    imm in affected programs: 21 -> 14 (-33.33%)
    
    i915g:
    total instructions in shared programs: 503670 -> 503961 (0.06%)
    instructions in affected programs: 7725 -> 8016 (3.77%)
    total temps in shared programs: 29934 -> 29914 (-0.07%)
    temps in affected programs: 223 -> 203 (-8.97%)
    total const in shared programs: 67413 -> 67408 (<.01%)
    const in affected programs: 39 -> 34 (-12.82%)
    LOST:   0
    GAINED: 52
    
    r300:
    total instructions in shared programs: 1414733 -> 1413685 (-0.07%)
    instructions in affected programs: 132485 -> 131437 (-0.79%)
    total vinst in shared programs: 497606 -> 496675 (-0.19%)
    vinst in affected programs: 95774 -> 94843 (-0.97%)
    total sinst in shared programs: 247838 -> 246314 (-0.61%)
    sinst in affected programs: 52176 -> 50652 (-2.92%)
    total presub in shared programs: 50518 -> 54351 (7.59%)
    presub in affected programs: 19579 -> 23412 (19.58%)
    total omod in shared programs: 883 -> 916 (3.74%)
    omod in affected programs: 0 -> 33
    total temps in shared programs: 210868 -> 210585 (-0.13%)
    temps in affected programs: 9778 -> 9495 (-2.89%)
    total consts in shared programs: 1177597 -> 1177547 (<.01%)
    consts in affected programs: 4340 -> 4290 (-1.15%)
    total lits in shared programs: 26897 -> 26865 (-0.12%)
    lits in affected programs: 231 -> 199 (-13.85%)

It's quite a bit of core infrastructure (~1k LOC), but I hope that putting it in place helps drivers cut through some of the "fix one thing, break another" churn in how we have been doing instruction selection in algebraic. By comparison, nir_opt_fuse_ffma itself is just 150 lines.
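
To make the shared-interior-node problem from the first commit message concrete, here is a minimal counting sketch. It is not NOLTIS and not NIR: the `toy_*` names and the tiny IR are hypothetical, made up for illustration. It only compares instruction counts when an fadd may absorb an fmul source solely if that fmul has a single use (the "is_used_once" style) against letting every fadd absorb a shared fmul.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not NIR: sources are indices into the node array,
 * or -1 when unused. */
enum toy_op { TOY_INPUT, TOY_FMUL, TOY_FADD };

struct toy_node {
   enum toy_op op;
   int src[2];
};

#define TOY_MAX_NODES 64

/* t = fmul(a, b); x = fadd(t, c); y = fadd(t, d) */
static const struct toy_node toy_shared_mul[] = {
   { TOY_INPUT, { -1, -1 } }, /* 0: a */
   { TOY_INPUT, { -1, -1 } }, /* 1: b */
   { TOY_INPUT, { -1, -1 } }, /* 2: c */
   { TOY_INPUT, { -1, -1 } }, /* 3: d */
   { TOY_FMUL,  {  0,  1 } }, /* 4: t */
   { TOY_FADD,  {  4,  2 } }, /* 5: x */
   { TOY_FADD,  {  4,  3 } }, /* 6: y */
};

/* Number of source slots in the whole program that read node idx. */
static int
toy_use_count(const struct toy_node *nodes, size_t count, int idx)
{
   int uses = 0;
   for (size_t i = 0; i < count; i++) {
      for (int s = 0; s < 2; s++) {
         if (nodes[i].src[s] == idx)
            uses++;
      }
   }
   return uses;
}

/* Count emitted instructions after fusing fadd(fmul(a, b), c) into ffma.
 * With allow_shared == false, an fmul is only absorbed when it has exactly
 * one use.  With allow_shared == true, every fadd absorbs one fmul source,
 * duplicating the multiply into each ffma. */
static int
toy_count_instrs(const struct toy_node *nodes, size_t count, bool allow_shared)
{
   int fused_uses[TOY_MAX_NODES] = { 0 };
   int instrs = 0;

   assert(count <= TOY_MAX_NODES);

   /* Pass 1: each fadd tries to absorb at most one fmul source. */
   for (size_t i = 0; i < count; i++) {
      if (nodes[i].op != TOY_FADD)
         continue;
      for (int s = 0; s < 2; s++) {
         int src = nodes[i].src[s];
         if (src < 0)
            continue;
         assert(src < (int)count);
         if (nodes[src].op != TOY_FMUL)
            continue;
         if (allow_shared || toy_use_count(nodes, count, src) == 1) {
            fused_uses[src]++;
            break;
         }
      }
   }

   /* Pass 2: inputs are free, and an fmul disappears only when every one
    * of its uses was absorbed into an ffma. */
   for (size_t i = 0; i < count; i++) {
      if (nodes[i].op == TOY_INPUT)
         continue;
      if (nodes[i].op == TOY_FMUL && fused_uses[i] > 0 &&
          fused_uses[i] == toy_use_count(nodes, count, (int)i))
         continue;
      instrs++;
   }
   return instrs;
}
```

For `toy_shared_mul`, `toy_count_instrs(toy_shared_mul, 7, false)` gives 3 (fmul, fadd, fadd) while `toy_count_instrs(toy_shared_mul, 7, true)` gives 2 (two ffmas, with the fmul elided). A real selector also has to weigh cases where the shared node survives anyway and duplication is not free, which is the part a cost-driven tiling handles and this sketch ignores.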