aco: Optimize spilling
Doubles the speed of ACO's spiller, i.e. cutting time spent spilling by 40-50% depending on the shader. There is more potential to unlock here using C++17's std::monotonic_buffer_resource
, which I'll submit in a separate MR.
Summary of the changes made:
- Massively reduces overall number of allocations
-
std::vector<CONTAINER>
-like objects now are never shrunk so that the contained containers don't need to re-allocate their object storage all the time - Two scratch containers have been moved to
spill_ctx
so they don't need to re-allocate object storage for each block - Copying containers is avoided unless necessary
-
- Replaces
std::map
with more efficient containers-
std::unordered_map
for faster lookup -
std::vector
for efficient copying and faster iteration when slow element addition/removal is acceptable
-
- Avoids redundant map lookup by combining element checking and access into a single
map::find
call - Applies 71aab960 to the spiller
Valgrind benchmark for one example shader (previously spending 80% of the total aco_compile_shader runtime in the spiller, now only 62%):
Commit | aco::spill runtime (cycles) | Difference to previous | Relative change (normalized) | Cumulative change |
---|---|---|---|---|
(baseline) | 503861690 | |||
aco/spill: Avoid unneeded copies when iterating over maps | 503057792 | 0,803,898 | 0.2% | 0.2% |
aco: Use std::vector for the underlying container of std::stack | 502243562 | 0,814,230 | 0.2% | 0.3% |
aco/spill: Remove unused container | 502209832 | 0,033,730 | 0.0% | 0.3% |
aco/spill: Replace map[] with map::insert | 500808314 | 1,401,518 | 0.3% | 0.6% |
aco/spill: Avoid copying next_use maps more often than needed | 486598328 | 14,209,986 | 2.8% | 3.4% |
aco/spill: Persist memory allocations of local next use maps | 434403228 | 52,195,100 | 10.4% | 13.8% |
aco/spill: Avoid destroying local next use maps over-eagerly | 408126850 | 26,276,378 | 5.2% | 19.0% |
aco/spill: Replace vector with vector for local_next_use | 354467946 | 53,658,904 | 10.6% | 29.6% |
aco/spill: Prefer unordered_map over map for next use distances | 327301820 | 27,166,126 | 5.4% | 35.0% |
aco/spill: Avoid copying current_spills when not needed | 322136920 | 5,164,900 | 1.0% | 36.1% |
aco/spill: Reduce redundant std::map lookups | 321065650 | 1,071,270 | 0.2% | 36.3% |
aco/spill: Replace an std::map to booleans with std::set | 320742138 | 0,323,512 | 0.1% | 36.3% |
aco/spill: Store remat list in an std::unordered_map instead of std::map | 318450040 | 2,292,098 | 0.5% | 36.8% |
aco/spill: Change worklist to a single integer | 313510223 | 4,939,817 | 1.0% | 37.8% |
aco/spill: Reduce allocations in next_uses_per_block | 276328251 | 37,181,972 | 7.4% | 45.2% |
aco/spill: Clarify use of long-lived references by adding const | 276317478 | 0,010,773 | 0.0% | 45.2% |
aco/spill: Use unordered_map for spills_exit | 269587489 | 6,729,989 | 1.3% | 46.5% |
aco/spill: Use std::unordered_map for spills_entry | 251309451 | 18,278,038 | 3.6% | 50.1% |
Edited by Tony Wasserka