gallium,winsys/amdgpu: refactor pb_buffer/cache/slab, radically optimize winsys/amdgpu

Marek Olšák requested to merge mareko/mesa:winsys-bos-rewrite2 into main

Summary of the common code changes:

  • pb_buffer_lean is added, which is just pb_buffer without vtbl. pb_buffer becomes pb_buffer_lean + vtbl.
  • pb_cache_entry and pb_slab_entry are refactored to decrease their size, touching a bunch of drivers.
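
To illustrate the pb_buffer_lean / pb_buffer split, here is a minimal sketch of the layout idea. All struct, field, and function names below are hypothetical stand-ins and not the actual Mesa definitions; the only thing taken from the description above is that the full buffer is the lean buffer plus a vtbl pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "lean" buffer: plain data only, no function table.
 * The field names are assumptions made for illustration. */
struct sketch_buffer_lean {
   int32_t  refcount;
   uint32_t size;
   uint8_t  alignment_log2;
   uint8_t  usage;
};

struct sketch_buffer;

/* Hypothetical function table for buffers that need virtual dispatch. */
struct sketch_vtbl {
   void (*destroy)(struct sketch_buffer *buf);
};

/* Full buffer = lean buffer + vtbl pointer. The lean part is the first
 * member, so a pointer to the full buffer also points at its lean part. */
struct sketch_buffer {
   struct sketch_buffer_lean base;
   const struct sketch_vtbl *vtbl;
};

/* Get the lean view of a full buffer. */
static inline struct sketch_buffer_lean *
sketch_to_lean(struct sketch_buffer *buf)
{
   return &buf->base;
}

/* Recover the full buffer from its lean part. Only valid if the lean
 * struct really is embedded at offset 0 of a sketch_buffer. */
static inline struct sketch_buffer *
sketch_from_lean(struct sketch_buffer_lean *lean)
{
   assert(offsetof(struct sketch_buffer, base) == 0);
   return (struct sketch_buffer *)lean;
}
```

In this sketch, a winsys that never calls through the vtbl could store only the lean struct and save one pointer per buffer, while code that still needs the vtbl keeps using the full struct.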

Summary of amdgpu changes:

  • Complete rewrite of BO fence tracking. The new queue fence system decreases the CS thread overhead by 46% and massively decreases the CPU cache footprint of BO fences and their processing. The best FPS improvement seen in one CPU-bound benchmark is 12%.
  • The slab allocator with 3 levels is replaced by a slab allocator with only 1 level. While I can't explain why this improves performance so much, one CPU-bound benchmark gets 10-18% higher FPS (the results are random/noisy).
  • Lots of refactoring to allow some of the size decreases.

r300 and r600 also have a lot of changes to accommodate the winsys changes.

This depends on !26547 (merged), whose commits are included here, separated by an empty commit.

Edited by Marek Olšák

