venus: performance with QEMU on Xen

Hi AMD friends,

Copied over the question from !1068 (merged).

We just saw https://static.sched.com/hosted_files/xen2023/41/VirtIO_Passthrough_GPU_on_Xen_Summit_2023.pdf, and the Venus numbers look really odd. Since you are all here, could you share some insight into why Venus is so slow on your integration? If possible, could you also share a few perfetto traces from both the guest VM and the host OS?

The first thing to check is the dri config:

- Have you enabled venus implicit fencing in the VM guest?
  - set DRI_CONF_VENUS_IMPLICIT_FENCING to true to activate
- Have you used the guest compositor to do the fence wait instead of the mesa vk xwayland backend?
  - set DRI_CONF_VK_XWAYLAND_WAIT_READY to false to activate
  - this further requires the guest compositor to be able to wait for the dma-fence attached to the winsys dma-buf (DRM_IOCTL_VIRTGPU_WAIT)
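
For a quick experiment, the two settings above can be sketched as environment overrides for a single guest application. This is only a sketch under assumptions: it assumes the driconf option names behind the two macros are `venus_implicit_fencing` and `vk_xwayland_wait_ready`, and that the guest's Mesa build lets an environment variable named after a driconf option override its default. The helper name `venus_env` is made up for illustration.

```python
import os

# Assumed driconf option names behind DRI_CONF_VENUS_IMPLICIT_FENCING and
# DRI_CONF_VK_XWAYLAND_WAIT_READY; please verify against your Mesa tree.
VENUS_OVERRIDES = {
    "venus_implicit_fencing": "true",   # enable venus implicit fencing
    "vk_xwayland_wait_ready": "false",  # leave the fence wait to the guest compositor
}


def venus_env(base=None):
    """Return a copy of the base environment with the overrides applied.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env.update(VENUS_OVERRIDES)
    return env


if __name__ == "__main__":
    # Print the overrides in NAME=value form so they can be pasted into a shell.
    for name, value in VENUS_OVERRIDES.items():
        print(f"{name}={value}")
```

The returned dict can be passed as `env=` to `subprocess.run` when launching the Vulkan test app in the guest (for example `vkcube`, which is only an assumed example app here). A drirc entry would be the persistent alternative.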

The next question from me is how mappable device memory is managed on your integration. Do you use VIRTGPU_BLOB_MEM_HOST3D with a hypervisor able to do runtime page injections, or a dedicated heap with VIRTGPU_BLOB_MEM_GUEST_VRAM?

-yiwei

=====================================

EDITED: tl;dr the latest data (#365 (comment 2001002)) shows that the delta for Venus comes from not enabling venus_implicit_fencing. Without it, the driver previously ended up in a blocking wait until the wsi bo was ready for the implicit submission.

=====================================

/cc @rui @HongleiHuang @Julia @yuq825 @boyzhang @flynnjiang

Edited Jul 14, 2023 by Yiwei Zhang
