Draft: Add amdgpu hsakmt native context support to enable OpenCL based on the AMD ROCm stack

Background:

This idea comes from the virtgpu native context work for Mesa graphics (virtgpu-native-context: Add msm native context).

hsakmt plays the same role in the amdgpu ROCm compute stack that drm plays for graphics; the goal is to let the guest use the native libhsakmt driver and thereby enable the amdgpu ROCm compute stack.

OpenCL support is currently in progress. libhsakmt is quite different from libdrm, so more modifications are needed, and this draft MR is still at a very early stage.

Implementation details:

  • Add a libhsakmt backend, an rbtree for memory management, a new blob flag, and a create function.
  • libhsakmt needs the userptr feature, which uses user-space memory directly (generally called SVA/SVM), so guest system memory must be accessible to the host libhsakmt directly; this is the first challenge in the implementation. The WSL GPADL (Guest Physical Address Descriptor Lists) implementation (WSL-GPADL) is used as a reference: guest user memory used in the hsakmt native context is not movable, so the backend driver and GPU hardware can access it without data errors. We also plan to forward MMU notifier messages to the backend so that guest user memory no longer needs to be pinned (see the memory-registration sketch after this list).
  • An hsakmt BO is address based rather than handle based, unlike libdrm, and the ROCm runtime submits commands using the guest-side BO address directly. So the guest address and host address have to be mirrored; this is the second challenge. The rbtree is used to manage the hsakmt BO addresses and keep every BO address returned by libhsakmt identical to the guest address within a reserved memory range (see the BO address tree sketch after this list).
  • The biggest difference between libhsakmt and libdrm is that libhsakmt does not return a file handle when the device is opened; it is tied to the process, which is the third challenge, so different guest processes currently share one real libhsakmt backend. We are trying to modify libhsakmt to support multiple handles in one process, or it may be better to build a multi-process backend stack instead (see the open-call comparison after this list).
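
For the userptr challenge, here is a minimal sketch (not the code in this MR) of how a host-side backend could hand an already-pinned guest buffer to libhsakmt so the GPU can access it directly. It assumes hsaKmtOpenKFD() has already been called; the function name expose_guest_buffer and the error handling are illustrative only.

```c
/* Minimal sketch, not MR code: register pinned guest pages with libhsakmt
 * (userptr / SVM style) and map them for GPU access.
 * Assumes hsaKmtOpenKFD() has already succeeded. */
#include <stdio.h>
#include "hsakmt.h"

int expose_guest_buffer(void *guest_ptr, HSAuint64 size)
{
    HSAuint64 gpu_va = 0;

    /* Register the already-pinned user pages with the KFD. */
    if (hsaKmtRegisterMemory(guest_ptr, size) != HSAKMT_STATUS_SUCCESS)
        return -1;

    /* Make the pages visible to the GPU; gpu_va receives the GPU VA. */
    if (hsaKmtMapMemoryToGPU(guest_ptr, size, &gpu_va) != HSAKMT_STATUS_SUCCESS) {
        hsaKmtDeregisterMemory(guest_ptr);
        return -1;
    }

    printf("guest buffer mapped at GPU VA 0x%llx\n",
           (unsigned long long)gpu_va);
    return 0;
}
```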
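
For the address-mirroring challenge, the sketch below tracks BOs keyed by their guest virtual address so the host can resolve addresses found in submitted commands. A plain binary search tree stands in for the rbtree used in this MR, and the structure and field names (hsakmt_bo, guest_va, host_ptr) are illustrative only.

```c
/* Sketch: look up an hsakmt BO by the guest VA that appears in a command. */
#include <stdint.h>
#include <stdlib.h>

struct hsakmt_bo {
    uint64_t guest_va;   /* key: address the guest runtime uses */
    uint64_t size;
    void    *host_ptr;   /* host mapping mirrored in the reserved range */
    struct hsakmt_bo *left, *right;
};

static struct hsakmt_bo *bo_insert(struct hsakmt_bo *root, struct hsakmt_bo *bo)
{
    if (!root)
        return bo;
    if (bo->guest_va < root->guest_va)
        root->left = bo_insert(root->left, bo);
    else
        root->right = bo_insert(root->right, bo);
    return root;
}

/* Find the BO whose [guest_va, guest_va + size) range contains addr. */
static struct hsakmt_bo *bo_lookup(struct hsakmt_bo *root, uint64_t addr)
{
    while (root) {
        if (addr < root->guest_va)
            root = root->left;
        else if (addr >= root->guest_va + root->size)
            root = root->right;
        else
            return root;
    }
    return NULL;
}

int main(void)
{
    struct hsakmt_bo a = { .guest_va = 0x100000, .size = 0x1000 };
    struct hsakmt_bo *root = bo_insert(NULL, &a);
    return bo_lookup(root, 0x100800) == &a ? 0 : 1;
}
```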
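
For the third challenge, this short comparison illustrates the API difference that makes handle management hard: libdrm returns a file descriptor per open, while libhsakmt opens a single implicit per-process KFD connection and only returns a status and version, so there is no per-open handle the backend can use to separate guest processes.

```c
/* Illustration only, not MR code. */
#include <xf86drm.h>
#include "hsakmt.h"

void open_both(void)
{
    /* libdrm: every open yields an independent fd. */
    int fd = drmOpenWithType("amdgpu", NULL, DRM_NODE_RENDER);

    /* libhsakmt: one implicit, process-wide connection; no handle
     * comes back to distinguish callers. */
    HsaVersionInfo version;
    HSAKMT_STATUS st = hsaKmtOpenKFD(&version);

    (void)fd;
    (void)st;
}
```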

Performance:

Achieved 97% (13000/13300) of bare-metal performance in Geekbench 6 using the OpenCL API under the Xen hypervisor. All OpenCL CTS basic tests currently pass.

