venus: optimize template based descriptor set update and push (!28686) · Merge requests · Mesa / mesa

Yiwei Zhang requested to merge zzyiwei/mesa:vn-set-templ-update-push-opt into main Apr 11, 2024


Summary:

commit 1~2: tiny issue fixes
commit 3: simplify push descriptor tracking
commit 4: optimize template data calculation (also make ptr math legit)
commit 5: optimize descriptor image info fixup
commit 6: use STACK_ARRAY for template based set update and push to get rid of locking
commit 7: clean up the prior template set update bits (split from commit 6 to ease review, can squash if preferred)
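
For context on the "ptr math" note in commit 4, here is a minimal sketch of what legitimate pointer math over template data can look like. It assumes the issue is arithmetic on `void *`, which is a GNU extension rather than standard C; the MR text does not say exactly what was changed, and the helper name `template_entry_data` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the MR: returns the address of the
 * j-th element of a template entry inside the caller's raw data blob.
 * The arithmetic is done on a byte pointer because arithmetic on `void *`
 * is a GNU extension rather than standard C. */
static inline const void *
template_entry_data(const void *data, size_t offset, size_t stride, uint32_t j)
{
   assert(data != NULL);
   const uint8_t *bytes = (const uint8_t *)data;
   return bytes + offset + stride * (size_t)j;
}
```

The `offset` and `stride` parameters mirror the fields of the same name in `VkDescriptorUpdateTemplateEntry`.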

At the bare minimum, commit 6 causes no regression. The new vn_descriptor_set_fill_update_with_template has slightly more overhead than the prior vn_update_descriptor_set_with_template_locked when only commit 6 is reverted; compared with the prior call before this MR, it is faster. Overall, though, I consistently see a reduction in CPU overhead, because the lock itself has non-trivial overhead.
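
To illustrate why per-call scratch storage removes the need for a lock, here is a rough sketch of the stack-array pattern. The `SKETCH_*` macros, the inline capacity of 8, the struct, and the function are all assumptions made for this example; they are modeled on the idea behind STACK_ARRAY and are not copied from the venus code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-ins (assumed names and capacity): small requests use
 * an inline stack buffer, larger ones fall back to the heap. */
#define SKETCH_STACK_ARRAY_SIZE 8
#define SKETCH_STACK_ARRAY(type, name, count)                               \
   type _stack_##name[SKETCH_STACK_ARRAY_SIZE] = { 0 };                     \
   type *const name = (count) <= SKETCH_STACK_ARRAY_SIZE                    \
                         ? _stack_##name                                    \
                         : (type *)calloc((count), sizeof(type))
#define SKETCH_STACK_ARRAY_FINISH(name)                                     \
   do {                                                                     \
      if (name != _stack_##name)                                            \
         free(name);                                                        \
   } while (0)

/* Hypothetical stand-in for a descriptor image info record. */
struct sketch_image_info {
   uint64_t sampler;
   uint64_t view;
   uint32_t layout;
};

/* Fills call-local scratch storage for `count` infos and writes the sum of
 * the view handles to *out_sum. Returns 1 if the storage came from the
 * heap, 0 if it fit in the inline buffer. No shared state is touched, so
 * concurrent callers need no lock. */
static int
sketch_fill_update(uint32_t count, uint64_t *out_sum)
{
   SKETCH_STACK_ARRAY(struct sketch_image_info, infos, count);
   assert(infos != NULL);

   uint64_t sum = 0;
   for (uint32_t i = 0; i < count; i++) {
      infos[i].view = (uint64_t)i + 1;
      sum += infos[i].view;
   }

   const int used_heap = (infos != _stack_infos);
   *out_sum = sum;
   SKETCH_STACK_ARRAY_FINISH(infos);
   return used_heap;
}
```

With the assumed capacity of 8, a call with 4 infos stays on the stack, while a call with 16 infos takes the heap fallback, which is the slower of the two paths.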

Attaching flamegraphs for vkoverhead test 56 descriptor_template_16combined_sampler (to hit the suboptimal path of STACK_ARRAY):

before: (flamegraph attachment)
after: (flamegraph attachment)

The above were collected with:

debug venus build with asserts disabled
release builds of anv and vkr (short-circuiting the real set update call so that the test is CPU-bound on the driver side)

An easy way to compare is to check the percentage of an untouched major call in both graphs: vn_async_vkUpdateDescriptorSets. Before this MR it is 92.56%; after this MR it is 95.26%. The lock overhead would stand out more if the engine only does tiny updates. For engines that update descriptors with templates and record commands in multiple threads, we also hit lock contention, which makes this worse.
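
As a rough worked reading of those two percentages, assume the absolute cost $A$ of vn_async_vkUpdateDescriptorSets is the same in both runs (this is an assumption, not something measured here), and let $T_b$, $T_a$ be the total sampled cost before and after, with $R_b$, $R_a$ the remaining cost outside that call:

```latex
\begin{align*}
A &= 0.9256\,T_b = 0.9526\,T_a
  \;\Rightarrow\; \frac{T_a}{T_b} = \frac{0.9256}{0.9526} \approx 0.9717 \\
R_b &= (1 - 0.9256)\,T_b = 0.0744\,T_b \\
R_a &= (1 - 0.9526)\,T_a = 0.0474\,T_a \approx 0.0461\,T_b \\
\frac{R_a}{R_b} &\approx \frac{0.0461}{0.0744} \approx 0.62
\end{align*}
```

Under that assumption, the total sampled cost drops by roughly 2.8%, and the cost outside the untouched call drops by roughly 38%.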

Edited Apr 11, 2024 by Yiwei Zhang
