radv: Allow pushing dynamic buffer descriptors through user SGPRs.
Pretty similar to inline push constants.
As far as benchmark numbers go:
Vega:
- qrenderdoc timing goes from 27.5k us to 25.3k us on a Bayonetta trace. (Assuming that this also improves the game itself is dangerous, though.)
- The actual game runs at a fixed 60 fps and the usage is too noisy to compare.
- A renderdoccmd replay run of 50 frames consistently improved from 1.193G "GUI_ACTIVE" cycles to 1.189G "GUI_ACTIVE" cycles. Not sure how much of that is setup and other work that follows a totally different pattern.
- SotTR was benchmarked, but saw no diff.
Raven:
- Both the game and renderdoc were very noisy, and there was no statistically significant difference.
On my Vega there is the alternative pattern of moving the cmdbuffer upload buffer to VRAM.
- In isolation this gives similar savings in the renderdoc trace.
- In combination it reduces the time to 24.6k us, likely due to still improving the situation for vertex buffers.
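
To put the figures above in proportion, here is a small sketch that turns the quoted before/after values into relative savings. The helper name `relative_saving` is made up for illustration and is not part of radv or RenderDoc; the inputs are only the numbers quoted in this description. It works out to roughly 8% for the qrenderdoc timing, about 10.5% with the VRAM upload buffer change combined, and about 0.3% for the GUI_ACTIVE cycle count.

```python
def relative_saving(before: float, after: float) -> float:
    """Return the fraction saved when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Values quoted in this description (Vega only).
measurements = {
    "qrenderdoc timing, this change (us)": (27.5e3, 25.3e3),
    "qrenderdoc timing, combined with VRAM upload buffer (us)": (27.5e3, 24.6e3),
    "renderdoccmd GUI_ACTIVE cycles, 50 frames": (1.193e9, 1.189e9),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {relative_saving(before, after):.2%} saved")
```

The cycle-count saving is more than an order of magnitude smaller than the timing saving, which fits the caveat that the replay run may be dominated by setup work.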
Overall no great source of numbers, but please review while I'm trying to gather more.
Edited by Daniel Schürmann