venus: optimize device memory alloc and suballoc
We can make vkAllocateMemory asynchronous. For mappable allocations, the VM round trip between memory allocation and BO mapping can be optimized using the approach from !21716 (merged).
In addition, we should keep the existing device memory pool for better layering performance, but optimize it to reduce memory waste and to suballocate more often. When an allocation qualifies for suballocation, the real allocation can be deferred until the memory is bound to a buffer or image, so that the accurate alignment requirement is known. If the bind happens after the mapping has been set up, the default alignment is used; that default should also be tuned per implementation instead of using the superset derived from anv implicit CCS.
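
As a rough illustration of the deferred suballocation idea, here is a minimal sketch in C. It is not venus code: every name (`sketch_pool`, `sketch_mem`, `sketch_alloc`, `sketch_place`) is hypothetical, the pool is a simple bump allocator, and freeing, threading and memory type handling are left out. It only shows that placement can wait until an alignment is known, either from the bound resource or from a default when mapping comes first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not venus driver symbols. */
struct sketch_pool {
   uint64_t size; /* total bytes backing the pool */
   uint64_t used; /* bump pointer: end of the last placed suballocation */
};

struct sketch_mem {
   uint64_t size;   /* size requested at allocation time */
   uint64_t offset; /* offset inside the pool, valid only once placed */
   bool placed;     /* false while the real placement is still deferred */
};

/* Round v up to a power-of-two alignment a. */
static uint64_t
sketch_align(uint64_t v, uint64_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Allocation time: only record the request, do not pick an offset yet. */
static void
sketch_alloc(struct sketch_mem *mem, uint64_t size)
{
   mem->size = size;
   mem->offset = 0;
   mem->placed = false;
}

/* Placement: called at bind time with the resource's alignment, or at
 * map time with a default alignment if mapping happens before binding.
 * Returns false when the pool cannot satisfy the request.
 */
static bool
sketch_place(struct sketch_pool *pool, struct sketch_mem *mem,
             uint64_t alignment)
{
   if (mem->placed)
      return true; /* already placed, e.g. by an earlier map */

   const uint64_t offset = sketch_align(pool->used, alignment);
   if (offset > pool->size || mem->size > pool->size - offset)
      return false;

   mem->offset = offset;
   mem->placed = true;
   pool->used = offset + mem->size;
   return true;
}
```

With two 100-byte allocations placed at an assumed bind-time alignment of 256, the second lands at offset 256. Had both been placed with a larger default such as 4096, the second would land at offset 4096, which is the kind of waste a tuned per-implementation default is meant to reduce.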