    nir/phi_builder: Use per-value hash table to store [block] -> def mapping · 8161a87b
    Ian Romanick authored
    
    
    Replace the old per-value array with a per-value hash table mapping
    blocks to defs.
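
    As a rough sketch of the new shape of the data structure (struct and
    function names here are illustrative, not the actual nir_phi_builder
    code; it assumes the pointer-keyed helpers from Mesa's util/hash_table.h):

        #include "nir.h"               /* nir_block, nir_ssa_def */
        #include "util/hash_table.h"

        /* Hypothetical stand-in for the per-value struct: each value owns a
         * pointer-keyed table mapping nir_block * -> nir_ssa_def *. */
        struct pb_value_sketch {
           struct hash_table *ht;
        };

        static void
        init_value(struct pb_value_sketch *val, void *mem_ctx)
        {
           /* Blocks are keyed directly by pointer, so no per-entry key
            * allocation is needed. */
           val->ht = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
                                             _mesa_key_pointer_equal);
        }

        static void
        set_block_def(struct pb_value_sketch *val, nir_block *block,
                      nir_ssa_def *def)
        {
           _mesa_hash_table_insert(val->ht, block, def);
        }

        static nir_ssa_def *
        get_block_def(struct pb_value_sketch *val, nir_block *block)
        {
           struct hash_entry *entry = _mesa_hash_table_search(val->ht, block);
           return entry != NULL ? entry->data : NULL;
        }

    Only the blocks that actually receive a def for a given value need an
    entry, which is presumably where the peak-memory savings below come from.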
    
    Changes in peak memory usage according to Valgrind massif:
    
    mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
    gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
    deus ex mankind divided 148:      62,887,728 =>    62,887,728
    deus ex mankind divided 2890:     72,402,222 =>    72,399,750
    dirt showdown 676:                74,466,431 =>    69,464,023
    dolphin ubershaders 210:         109,630,376 =>    78,359,728
    
    The run-time change for a full shader-db run on my Haswell desktop (with
    -march=native) is 1.22245% +/- 0.463879% (n=11).  That is about +2.9
    seconds on a 237-second run.  The first time I sent this version of the
    patch out, the run-time data was quite different: I had misconfigured the
    script that ran the tests, so none of the tests from higher GLSL versions
    were run.  Those are generally more complex shaders, and they are more
    affected by this change.
    
    The previous version of this patch used a single hash table for the
    whole phi builder.  The mapping was from [value, block] -> def, so a
    separate allocation was needed for each [value, block] tuple.  There was
    quite a bit of per-allocation overhead (due to ralloc), so that patch
    was followed by one that added the use of the slab allocator (a sketch
    of the old keying scheme appears after the numbers below).  The results
    of those two patches were not quite as good:
    
    mean soft fp64 using uint64:   5,499,875,082 => 1,343,991,403
    gfxbench5 aztec ruins high 11:    63,619,971 =>    63,619,971
    deus ex mankind divided 148:      62,887,728 =>    62,887,728
    deus ex mankind divided 2890:     72,402,222 =>    72,402,222 *
    dirt showdown 676:                74,466,431 =>    72,443,591 *
    dolphin ubershaders 210:         109,630,376 =>    81,034,320 *
    
    The * marks the tests that are better with this version.  For the tests
    whose totals are the same under both versions, the "after" peak memory
    usage occurred at a different location.  I did not check the local peaks.
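
    For contrast, here is a similar sketch of the earlier single-table
    scheme (again with illustrative names; it assumes util/hash_table.h,
    util/ralloc.h, and _mesa_hash_data).  Each entry needs its own
    heap-allocated [value, block] key, which is the per-allocation overhead
    mentioned above:

        #include <string.h>
        #include "nir.h"               /* nir_block, nir_ssa_def */
        #include "util/hash_table.h"
        #include "util/ralloc.h"

        /* Hypothetical stand-ins for the phi-builder structs. */
        struct pb_sketch { struct hash_table *ht; };
        struct pb_value_sketch { int dummy; };

        /* Key for the single, builder-wide table: a [value, block] pair. */
        struct pb_key_sketch {
           struct pb_value_sketch *value;
           nir_block *block;
        };

        static uint32_t
        hash_pb_key(const void *key)
        {
           return _mesa_hash_data(key, sizeof(struct pb_key_sketch));
        }

        static bool
        pb_key_equal(const void *a, const void *b)
        {
           return memcmp(a, b, sizeof(struct pb_key_sketch)) == 0;
        }

        static void
        set_block_def(struct pb_sketch *pb, struct pb_value_sketch *val,
                      nir_block *block, nir_ssa_def *def)
        {
           /* The table itself would be created once, e.g. with
            * _mesa_hash_table_create(pb, hash_pb_key, pb_key_equal).
            * Every insertion then allocates a fresh key; rzalloc zeroes the
            * struct so any padding bytes hash and compare consistently. */
           struct pb_key_sketch *key = rzalloc(pb, struct pb_key_sketch);
           key->value = val;
           key->block = block;
           _mesa_hash_table_insert(pb->ht, key, def);
        }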
    
    Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
    Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>