venus: performance with QEMU on Xen
Hi AMD friends,
Copied over the question from !1068 (merged).
We just saw https://static.sched.com/hosted_files/xen2023/41/VirtIO_Passthrough_GPU_on_Xen_Summit_2023.pdf, and the Venus numbers look really odd. Since you are all here, could you share some insights into why Venus is so slow on your integration? If possible, could you also share a few perfetto traces from both the guest VM and the host OS?
The 1st thing to check is dri config:
- have you enabled venus implicit fencing in the VM guest?
  - set DRI_CONF_VENUS_IMPLICIT_FENCING to true to activate
- have you used guest compositor to do the fence wait instead of mesa vk xwayland backend?
  - set DRI_CONF_VK_XWAYLAND_WAIT_READY to false to activate; this further requires the guest compositor to be able to wait for the dma-fence attached to the winsys dma-buf (DRM_IOCTL_VIRTGPU_WAIT)
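For reference, here is a rough sketch of how the two options above might be set from a drirc file in the guest (e.g. ~/.drirc or /etc/drirc). This is not copied from any integration. It assumes the driver name Venus registers with driconf is "venus", and that the option names are the lowercase forms of the macros (venus_implicit_fencing, vk_xwayland_wait_ready). Please double check both against your Mesa tree before relying on it.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- Sketch only: driver name "venus" and the option names are assumptions,
     taken as the lowercase forms of the DRI_CONF_* macros. -->
<driconf>
    <device driver="venus">
        <application name="Default">
            <!-- DRI_CONF_VENUS_IMPLICIT_FENCING: rely on virtio-gpu implicit fencing -->
            <option name="venus_implicit_fencing" value="true" />
            <!-- DRI_CONF_VK_XWAYLAND_WAIT_READY: leave the fence wait to the guest compositor -->
            <option name="vk_xwayland_wait_ready" value="false" />
        </application>
    </device>
</driconf>
```

Setting vk_xwayland_wait_ready to false only helps if the guest compositor can itself wait on the dma-fence attached to the winsys dma-buf, as noted above.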
The next question from me is how mappable device memory is managed on your integration. Do you use VIRTGPU_BLOB_MEM_HOST3D with a hypervisor able to do runtime page injections, or a dedicated heap with VIRTGPU_BLOB_MEM_GUEST_VRAM?
-yiwei
=====================================
EDITED: tl;dr latest data (#365 (comment 2001002)) shows the delta for Venus comes from not enabling venus_implicit_fencing. Without it, the implicit submission previously ended up in a blocking wait until the wsi bo was ready.
=====================================