Skip to content

aco: Optimize spilling

Tony Wasserka requested to merge neobrain/mesa:opt_spiller into main

Doubles the speed of ACO's spiller, i.e. cutting time spent spilling by 40-50% depending on the shader. There is more potential to unlock here using C++17's std::monotonic_buffer_resource, which I'll submit in a separate MR.

Summary of the changes made:

  • Massively reduces overall number of allocations
    • std::vector<CONTAINER>-like objects now are never shrunk so that the contained containers don't need to re-allocate their object storage all the time
    • Two scratch containers have been moved to spill_ctx so they don't need to re-allocate object storage for each block
    • Copying containers is avoided unless necessary
  • Replaces std::map with more efficient containers
    • std::unordered_map for faster lookup
    • std::vector for efficient copying and faster iteration when slow element addition/removal is acceptable
  • Avoids redundant map lookup by combining element checking and access into a single map::find call
  • Applies 71aab960 to the spiller

Valgrind benchmark for one example shader (previously spending 80% of the total aco_compile_shader runtime in the spiller, now only 62%):

Commit aco::spill runtime (cycles) Difference to previous Relative change (normalized) Cumulative change
(baseline) 503861690
aco/spill: Avoid unneeded copies when iterating over maps 503057792 0,803,898 0.2% 0.2%
aco: Use std::vector for the underlying container of std::stack 502243562 0,814,230 0.2% 0.3%
aco/spill: Remove unused container 502209832 0,033,730 0.0% 0.3%
aco/spill: Replace map[] with map::insert 500808314 1,401,518 0.3% 0.6%
aco/spill: Avoid copying next_use maps more often than needed 486598328 14,209,986 2.8% 3.4%
aco/spill: Persist memory allocations of local next use maps 434403228 52,195,100 10.4% 13.8%
aco/spill: Avoid destroying local next use maps over-eagerly 408126850 26,276,378 5.2% 19.0%
aco/spill: Replace vector with vector for local_next_use 354467946 53,658,904 10.6% 29.6%
aco/spill: Prefer unordered_map over map for next use distances 327301820 27,166,126 5.4% 35.0%
aco/spill: Avoid copying current_spills when not needed 322136920 5,164,900 1.0% 36.1%
aco/spill: Reduce redundant std::map lookups 321065650 1,071,270 0.2% 36.3%
aco/spill: Replace an std::map to booleans with std::set 320742138 0,323,512 0.1% 36.3%
aco/spill: Store remat list in an std::unordered_map instead of std::map 318450040 2,292,098 0.5% 36.8%
aco/spill: Change worklist to a single integer 313510223 4,939,817 1.0% 37.8%
aco/spill: Reduce allocations in next_uses_per_block 276328251 37,181,972 7.4% 45.2%
aco/spill: Clarify use of long-lived references by adding const 276317478 0,010,773 0.0% 45.2%
aco/spill: Use unordered_map for spills_exit 269587489 6,729,989 1.3% 46.5%
aco/spill: Use std::unordered_map for spills_entry 251309451 18,278,038 3.6% 50.1%
Edited by Tony Wasserka

Merge request reports