gallium/u_threaded: buffer subdata merging (v2)
In a scenario where a sequence of calls like the following happens:
- subdata(buffer_a, offset=0, size=64)
- subdata(buffer_a, offset=64, size=64)
- subdata(buffer_a, offset=128, size=64)
- subdata(buffer_a, offset=192, size=64)
and the buffer can't be directly mapped (e.g., because it has bindings), the subdata calls will now be merged together into one larger subdata call.
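For illustration only, here is a minimal sketch of the merging idea. It is not the actual u_threaded_context code: `pending_subdata`, `try_merge_subdata`, `queue_subdata`, `demo_upload_count`, and the 256-byte `MERGE_CAPACITY` are all hypothetical names and values chosen for this example. It assumes a write is merged only when it targets the same buffer and starts exactly where the pending write ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MERGE_CAPACITY 256 /* hypothetical cap on the merged payload */

/* Hypothetical record of one not-yet-submitted subdata call. */
struct pending_subdata {
   const void *buffer; /* identity of the target buffer */
   size_t offset;      /* start of the pending write */
   size_t size;        /* 0 means nothing is pending */
   unsigned char data[MERGE_CAPACITY];
};

/* Append a write to the pending one if it targets the same buffer and
 * starts exactly at the end of the pending range. Returns true if merged. */
static bool
try_merge_subdata(struct pending_subdata *p, const void *buffer,
                  size_t offset, size_t size, const void *data)
{
   assert(p && data);
   if (p->size == 0 || p->buffer != buffer)
      return false;
   if (offset != p->offset + p->size)
      return false;
   if (size > MERGE_CAPACITY - p->size)
      return false;
   memcpy(p->data + p->size, data, size);
   p->size += size;
   return true;
}

/* Queue a write. Returns how many uploads had to be flushed (0 or 1)
 * because the new write could not be merged into the pending one. */
static unsigned
queue_subdata(struct pending_subdata *p, const void *buffer,
              size_t offset, size_t size, const void *data)
{
   if (try_merge_subdata(p, buffer, offset, size, data))
      return 0;
   unsigned flushed = p->size ? 1 : 0;
   assert(size <= MERGE_CAPACITY);
   p->buffer = buffer;
   p->offset = offset;
   p->size = size;
   memcpy(p->data, data, size);
   return flushed;
}

/* Replays the four 64-byte writes from the list above and returns the
 * total number of uploads, including the final flush. */
static unsigned
demo_upload_count(void)
{
   static int buffer_a; /* stand-in for a real buffer resource */
   unsigned char chunk[64] = {0};
   struct pending_subdata p = {0};
   unsigned uploads = 0;
   for (size_t i = 0; i < 4; i++)
      uploads += queue_subdata(&p, &buffer_a, i * 64, 64, chunk);
   uploads += p.size ? 1 : 0; /* flush whatever is still pending */
   return uploads;
}
```

In this sketch the four adjacent 64-byte writes collapse into a single 256-byte upload. A write to a different buffer or a non-adjacent offset would flush the pending one first.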
This is a replacement for !17597 (closed).
This still lacks the ability to do invalidations + unsynchronized uploads when a merged subdata call overwrites an entire buffer, which turned out to be much harder than I thought. Even without that optimization, it achieves sizeable performance gains: wall-clock time in the test below drops by roughly 2.8x.
Test: KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps on radeonsi
Before:
real 0m1,923s
user 0m1,017s
sys 0m0,051s
After:
real 0m0,686s
user 0m0,502s
sys 0m0,071s