
iris: Suballocate BOs for greater memory efficiency

Kenneth Graunke requested to merge kwg/mesa:iris-slab-busy into main

Note that the first 3 patches in this series are from Paulo's MR !12363 (merged), which would need to land first. Those patches should be reviewed there.

This patch series updates iris to suballocate BOs: it uses large buffers as slabs and carves small allocations out of them. This reduces the allocation granularity to 256B (down from 4K on current GPUs, or the painfully large 64K on future GPUs). Even API-facing resources can now be packed together into the same GEM object. We also match the largest slab allocation size to the PTE fragment size on discrete GPUs, which should let the kernel use a more optimal page table layout and could speed up memory access. Additionally, because many small buffers now share a single GEM object, this should lead to fewer exec_object2 entries, which could slightly reduce CPU overhead.
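The core idea can be sketched as a minimal fixed-size slab. This is not iris's or pb_slab's actual code: `struct slab`, `slab_init`, `slab_alloc`, `slab_free`, and `slab_destroy` are hypothetical names, and a real driver must also make sure the GPU is done with an entry before handing it out again.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical slab: one large backing buffer (e.g. a single GEM object)
 * carved into equal-size entries. Offsets are relative to that buffer. */
struct slab {
   uint64_t entry_size;   /* bytes per suballocated entry */
   unsigned num_entries;  /* entries that fit in the backing buffer */
   unsigned num_free;     /* number of valid indices in free_list */
   unsigned *free_list;   /* stack of free entry indices */
};

static int slab_init(struct slab *s, uint64_t slab_size, uint64_t entry_size)
{
   s->entry_size = entry_size;
   s->num_entries = (unsigned)(slab_size / entry_size);
   s->free_list = malloc(s->num_entries * sizeof(*s->free_list));
   if (!s->free_list)
      return -1;
   /* Push indices in reverse so the first allocation gets offset 0. */
   for (unsigned i = 0; i < s->num_entries; i++)
      s->free_list[i] = s->num_entries - 1 - i;
   s->num_free = s->num_entries;
   return 0;
}

/* Returns the byte offset of a free entry, or -1 if the slab is full. */
static int64_t slab_alloc(struct slab *s)
{
   if (s->num_free == 0)
      return -1;
   unsigned idx = s->free_list[--s->num_free];
   return (int64_t)(idx * s->entry_size);
}

/* Returns an entry to the slab; the backing buffer itself stays allocated. */
static void slab_free(struct slab *s, int64_t offset)
{
   assert(s->num_free < s->num_entries);
   s->free_list[s->num_free++] = (unsigned)((uint64_t)offset / s->entry_size);
}

static void slab_destroy(struct slab *s)
{
   free(s->free_list);
   s->free_list = NULL;
}
```

With a 64 KiB backing buffer and 256 B entries, `slab_init(&s, 65536, 256)` yields 256 entries, so 256 small allocations share one buffer object (and one execbuf validation entry) instead of needing 256 separate ones.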

I've verified that suballocation is happening, CI is passing, and several benchmarks and games are working. The few benchmarks I've run indicate that performance is on par or slightly better, though not substantially so. The main motivation here is to avoid wasting VRAM on discrete cards, where padding tiny allocations out to the new 64K allocation granularity is extremely wasteful.
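To make the padding cost concrete, here is the rounding arithmetic as a small C helper. `align_up` and `padding_waste` are hypothetical names, and the 300-byte buffer below is an illustrative size, not a measurement from this MR.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of a power-of-two granularity. */
static uint64_t align_up(uint64_t size, uint64_t granularity)
{
   assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
   return (size + granularity - 1) & ~(granularity - 1);
}

/* Bytes lost to padding when an allocation is rounded up. */
static uint64_t padding_waste(uint64_t size, uint64_t granularity)
{
   return align_up(size, granularity) - size;
}
```

For a 300-byte buffer, `align_up(300, 65536)` is 65536 (65236 bytes of padding), `align_up(300, 4096)` is 4096 (3796 bytes), and `align_up(300, 256)` is 512 (212 bytes). A thousand such buffers would waste about 62 MiB at 64K granularity versus about 0.2 MiB at 256B.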

This code is substantially inspired by radeonsi and the amdgpu winsys, which have already done this for a few years. We use the same Gallium pb_slab infrastructure they use (which zink recently adopted as well). For now, we avoid using pb_buffer and pb_cache, as we already have a working cache. We should likely clean up the cache in a future series, as most of the cache buckets are now useless.
