anv: chain batch buffers within a given VkSubmitInfo
VkSubmitInfo contains an array of VkCommandBuffer to execute. When those were not recorded with VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, vkQueueSubmit can modify them to chain them all together, so that we only do a single execbuffer ioctl for all the batches.
This has 2 advantages:
- Reduce the size of the generated aub files
- Speed up vkQueueSubmit() (by decreasing the number of ioctls required)
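As a rough illustration of the chaining idea (not the actual anv code), the toy model below treats each batch as a small array of commands ending in a terminator. Chaining overwrites the terminator of every batch but the last with a jump to the next batch, so one submission walks through all of them. All names here (toy_batch, chain_batches, execute, the CMD_* values) are hypothetical and do not correspond to real hardware encodings or driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for this sketch; not real hardware encodings. */
enum { CMD_WORK = 1, CMD_BATCH_END = 2, CMD_BATCH_START = 3 };

#define MAX_CMDS 8

struct toy_batch {
    uint32_t cmds[MAX_CMDS];   /* the last used entry is the terminator */
    uint32_t target[MAX_CMDS]; /* batch index to jump to for CMD_BATCH_START */
    size_t len;                /* number of used entries in cmds[] */
};

/* Replace the terminator of every batch except the last one with a jump
 * to the following batch. This modifies the batches in place, which is
 * why it is only safe when a batch is not in use by another submission. */
static void chain_batches(struct toy_batch *b, size_t count)
{
    for (size_t i = 0; i + 1 < count; i++) {
        size_t last = b[i].len - 1;
        assert(b[i].len > 0 && b[i].cmds[last] == CMD_BATCH_END);
        b[i].cmds[last] = CMD_BATCH_START;
        b[i].target[last] = (uint32_t)(i + 1);
    }
}

/* Toy "GPU": run one submission starting at batch `first`, following
 * jumps, and return how many CMD_WORK commands were executed. */
static unsigned execute(const struct toy_batch *b, size_t first)
{
    unsigned work = 0;
    size_t cur = first, pc = 0;

    for (;;) {
        uint32_t cmd = b[cur].cmds[pc];
        if (cmd == CMD_BATCH_END)
            return work;
        if (cmd == CMD_BATCH_START) {
            cur = b[cur].target[pc];
            pc = 0;
            continue;
        }
        work++;
        pc++;
    }
}
```

Before chaining, a call to execute() only covers the batch it starts in, so N batches need N submissions. After chain_batches(), a single execute() on the first batch runs the work of all of them, which is the effect the single execbuffer ioctl relies on.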
Using a crucible micro benchmark on KBL GT2:
Before:
bench.queue-submit.q0: Called vkQueueSubmit with 256 buffers 4096 times, took 1702689us (415us each)
bench.queue-submit-many-cmd-buffers.q0: Called vkQueueSubmit with 256 command buffers 4096 times, took 12679754us (3095us each)
After:
bench.queue-submit.q0: Called vkQueueSubmit with 256 buffers 4096 times, took 1741659us (425us each)
bench.queue-submit-many-cmd-buffers.q0: Called vkQueueSubmit with 256 command buffers 4096 times, took 2259032us (551us each)
Aztec Ruins:
On KBL GT2 +0.1fps
On SKL GT4 +0.6fps
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>