aux/tc: implement buffer subdata merging
when the following sequence of calls occurs:
- subdata(buffer_a, offset=0, size=64)
- subdata(buffer_a, offset=64, size=64)
- subdata(buffer_a, offset=128, size=64)
- subdata(buffer_a, offset=192, size=64)
and the buffer can't be directly mapped (e.g., because it has bindings), the subdata calls will now be merged into a single map -> 4x memcpy -> unmap sequence, which should reduce the number of transfer operations in drivers
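a minimal sketch of the merging idea, for illustration only. This is not the actual threaded-context code: fake_buffer, subdata_call, fake_map, fake_unmap and replay_merged are all hypothetical names, and the buffer is plain host memory rather than a driver resource. It assumes a run of consecutive subdata calls on the same buffer can share one mapping that covers the union of their ranges:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a GPU buffer; map_count records how many
 * times the buffer was mapped. */
struct fake_buffer {
   unsigned char storage[256];
   unsigned map_count;
};

/* One recorded subdata call (hypothetical layout). */
struct subdata_call {
   struct fake_buffer *buf;
   size_t offset;
   size_t size;
   const void *data;
};

/* Pretend to map [offset, offset + size) of the buffer. */
static unsigned char *
fake_map(struct fake_buffer *buf, size_t offset, size_t size)
{
   assert(offset + size <= sizeof(buf->storage));
   buf->map_count++;
   return buf->storage + offset;
}

/* Pretend to unmap; nothing to do for host memory. */
static void
fake_unmap(struct fake_buffer *buf)
{
   (void)buf;
}

/* Replay the recorded calls. Each run of consecutive calls that target the
 * same buffer is handled with one map, one memcpy per call, and one unmap.
 * Returns the number of mappings performed. */
static unsigned
replay_merged(const struct subdata_call *calls, size_t count)
{
   unsigned maps = 0;
   size_t i = 0;

   while (i < count) {
      struct fake_buffer *buf = calls[i].buf;
      size_t start = calls[i].offset;
      size_t end = calls[i].offset + calls[i].size;
      size_t j = i + 1;

      /* Extend the run and grow the mapped range to the union of all calls. */
      while (j < count && calls[j].buf == buf) {
         if (calls[j].offset < start)
            start = calls[j].offset;
         if (calls[j].offset + calls[j].size > end)
            end = calls[j].offset + calls[j].size;
         j++;
      }

      unsigned char *ptr = fake_map(buf, start, end - start);
      /* Copy in recording order so later writes win on overlap. */
      for (size_t k = i; k < j; k++)
         memcpy(ptr + (calls[k].offset - start), calls[k].data, calls[k].size);
      fake_unmap(buf);

      maps++;
      i = j;
   }
   return maps;
}
```

with the four 64-byte calls listed above, replay_merged() in this sketch maps the buffer once instead of four times.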
this roughly halves the runtime of tests that make many subdata calls, such as KHR-GL46.CommonBugs.CommonBug_SparseBuffersWithCopyOps:
- before -
1.55s user 0.32s system 16% cpu 11.613 total
- after -
0.68s user 0.10s system 14% cpu 5.339 total