
rusticl: implement SVM and buffer_device_address

Karol Herbst requested to merge karolherbst/mesa:rusticl/svm/coarse into main

Depends on !32581 (merged)

Upstream PR adding the buffer_device_address extension: https://github.com/KhronosGroup/OpenCL-Docs/pull/1159

This is a combined MR adding two extensions and laying the groundwork for a third one.

cl_ext_buffer_device_address

This extension is basically Shared Virtual Memory (SVM) lite. It's intended to be implemented on drivers not capable of supporting SVM (e.g. Vulkan-based CL implementations, zink in our case). It allows a client to ask for the GPU address of a buffer and use it directly inside kernels. It also guarantees that addresses remain the same between kernel invocations, which the OpenCL core spec does not guarantee unless you use SVM allocations.

This basically just requires a frontend to tell the driver that the virtual address of a pipe_resource never changes, plus a new interface to ask for the address.

  • PIPE_RESOURCE_FLAG_FIXED_ADDRESS to indicate on allocation that the resource's address must not change
  • pipe_screen::resource_get_address to fetch the address of a resource
  • pipe_grid::num_globals and pipe_grid::globals to pass in a list of buffers which might be accessed indirectly in a kernel invocation. I chose not to use set_global_bindings for it, because that interface requires the driver to write the address into some buffer, which would be pointless here, and no binding actually needs to happen. I wanted to move to a new interface for global memory anyway, so maybe that's what we can also rely on moving forward? Don't know...
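
To make the three pieces above concrete, here is a minimal sketch of how a frontend might use them together. Everything prefixed with mock_ or MOCK_ is a hypothetical stand-in, not the real gallium definition: the actual flag value, the exact resource_get_address signature and the pipe_grid field types are assumptions, and returning 0 for a resource without the flag is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PIPE_RESOURCE_FLAG_FIXED_ADDRESS. */
#define MOCK_RESOURCE_FLAG_FIXED_ADDRESS (1u << 0)

/* Mock of a pipe_resource, reduced to what this sketch needs. */
struct mock_resource {
    unsigned flags;
    uint64_t gpu_address;
};

/* Mock of pipe_screen carrying only the new address query hook. */
struct mock_screen {
    uint64_t (*resource_get_address)(struct mock_screen *screen,
                                     struct mock_resource *res);
};

/* Mock of pipe_grid carrying only the two new fields. */
struct mock_grid {
    struct mock_resource **globals;
    unsigned num_globals;
};

/* Driver side (assumed behaviour): only resources allocated with the
 * fixed-address flag report an address, everything else reports 0. */
uint64_t mock_get_address(struct mock_screen *screen,
                          struct mock_resource *res)
{
    (void)screen;
    if (!(res->flags & MOCK_RESOURCE_FLAG_FIXED_ADDRESS))
        return 0;
    return res->gpu_address;
}

/* Frontend side: query the address once, then list the buffer in the grid
 * so the driver knows the kernel may reach it through a raw pointer.
 * `slots` is caller-provided storage for the globals array. */
uint64_t frontend_prepare_launch(struct mock_screen *screen,
                                 struct mock_resource *buf,
                                 struct mock_resource **slots,
                                 struct mock_grid *grid)
{
    uint64_t addr = screen->resource_get_address(screen, buf);
    assert(addr != 0);

    slots[0] = buf;
    grid->globals = slots;
    grid->num_globals = 1;
    return addr; /* value the client may pass to kernels as a pointer */
}
```

Note that nothing is written into a binding table here: the buffer is only listed so the driver can keep it resident, which is the difference from set_global_bindings described above.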

Shared Virtual Memory

This optional OpenCL 2.0 feature basically allows one to allocate memory that has the same virtual memory address on the host and on all devices. The runtime is responsible for implicitly migrating content based on usage.

Because making drivers agree on a common address for a new pipe_resource allocation would be a lot of pain, this MR adds a simple VM management interface frontends can use to allocate a VM range from drivers and assign addresses to pipe_resources directly. With those new interfaces added, a runtime can query valid VM ranges to allocate, choose a common sub-range across devices and allocate it. Once that has succeeded, a frontend can use e.g. util_vma_heap to manage VM addresses itself and guarantee the same address across a set of devices, even across vendors. No other OpenCL implementation supports this across vendors as of today!
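
The "choose a common sub-range" step is just an interval intersection, followed by sub-allocation inside the result. A rough sketch, under a few assumptions: each device's range is treated as [min, max) with an exclusive upper bound (this description doesn't say whether max_vma is inclusive), and a trivial bump allocator stands in for util_vma_heap so the example stays self-contained. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One device's usable VM window, assumed to be [min, max). */
struct vm_range {
    uint64_t min;
    uint64_t max;
};

/* Intersect the windows of all devices. Returns false if there is no
 * device or the windows do not overlap. */
bool common_vm_range(const struct vm_range *devs, size_t n,
                     struct vm_range *out)
{
    if (n == 0)
        return false;

    uint64_t lo = devs[0].min;
    uint64_t hi = devs[0].max;
    for (size_t i = 1; i < n; i++) {
        if (devs[i].min > lo)
            lo = devs[i].min;
        if (devs[i].max < hi)
            hi = devs[i].max;
    }
    if (lo >= hi)
        return false;

    out->min = lo;
    out->max = hi;
    return true;
}

/* Stand-in for util_vma_heap: hands out addresses front to back and
 * never reuses freed space. */
struct bump_heap {
    uint64_t next; /* first unused address */
    uint64_t end;  /* exclusive end of the managed range */
};

/* `align` must be a power of two. Writes the address to *addr on success. */
bool bump_alloc(struct bump_heap *heap, uint64_t size, uint64_t align,
                uint64_t *addr)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uint64_t start = (heap->next + align - 1) & ~(align - 1);
    if (size == 0 || start < heap->next /* wrapped around */ ||
        start > heap->end || size > heap->end - start)
        return false;

    heap->next = start + size;
    *addr = start;
    return true;
}
```

Every address handed out this way lies inside every device's window, so the same value can be assigned to the per-device resources backing one SVM allocation.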

The annoying part of SVM is that pointers are the handle. They can be accessed on the host between synchronization points, passed directly to GPU memory via data uploads, and accessed inside kernels directly, while the runtime is responsible for guaranteeing the content is up to date.

  • pipe_screen::alloc_vm/pipe_screen::free_vm to allocate and free VM ranges.
  • pipe_screen::resource_assign_vma to assign an address to a resource; passing 0 removes it.
  • PIPE_RESOURCE_FLAG_FRONTEND_VM to create such a managed pipe_resource. This also implicitly tells drivers not to sub-allocate and not to automatically allocate a virtual address.
  • pipe_caps::min_vma and pipe_caps::max_vma to tell the frontends in which area they can freely allocate a VM range. This can also be used to exclude ranges where addresses would need to be canonicalized.
  • pipe_grid::num_globals and pipe_grid::globals, same as above. It's expected that SVM resources are passed through these instead of set_global_bindings.
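
The order in which a frontend would call these hooks can be sketched as follows. This is only an illustration with mocked types: the real alloc_vm, free_vm and resource_assign_vma signatures are not shown in this description, so the parameters below are assumptions, as is the rule that an assigned address must lie inside the previously reserved range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock resource created with the frontend-VM flag: no address until the
 * frontend assigns one (0 means unassigned). */
struct mock_resource {
    uint64_t address;
};

/* Mock screen: min_vma/max_vma stand in for the pipe_caps fields, and at
 * most one reserved range is tracked (vm_size == 0 means none). */
struct mock_screen {
    uint64_t min_vma, max_vma;
    uint64_t vm_start, vm_size;
};

bool mock_alloc_vm(struct mock_screen *s, uint64_t start, uint64_t size)
{
    if (s->vm_size != 0 || size == 0)
        return false;
    if (start < s->min_vma || start > s->max_vma ||
        size > s->max_vma - start)
        return false;
    s->vm_start = start;
    s->vm_size = size;
    return true;
}

void mock_free_vm(struct mock_screen *s)
{
    s->vm_start = 0;
    s->vm_size = 0;
}

/* address == 0 removes the assignment, as in the list above. */
bool mock_resource_assign_vma(struct mock_screen *s,
                              struct mock_resource *res, uint64_t address)
{
    if (address != 0 &&
        (address < s->vm_start || address - s->vm_start >= s->vm_size))
        return false;
    res->address = address;
    return true;
}

/* Frontend: reserve the same range on every device, then give the
 * per-device resources of one SVM allocation the same address.
 * Everything is rolled back if any device refuses. */
bool svm_bind_all(struct mock_screen *screens, struct mock_resource *res,
                  size_t n, uint64_t start, uint64_t size, uint64_t address)
{
    size_t reserved = 0;
    while (reserved < n && mock_alloc_vm(&screens[reserved], start, size))
        reserved++;

    bool ok = reserved == n;
    for (size_t i = 0; ok && i < n; i++)
        ok = mock_resource_assign_vma(&screens[i], &res[i], address);

    if (!ok) {
        for (size_t i = 0; i < n; i++)
            mock_resource_assign_vma(&screens[i], &res[i], 0);
        for (size_t i = 0; i < reserved; i++)
            mock_free_vm(&screens[i]);
    }
    return ok;
}
```

In a real frontend the reservation would presumably happen once per context and the assignment once per SVM allocation; they are combined here only to keep the sketch short.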

cl_intel_unified_shared_memory

This extension is required by a few SYCL implementations, so I'll probably implement it sooner or later, but it's not part of this Merge Request. It's a lot like SVM as it uses pointers as its interface, but with explicit data placement and migration. The same interfaces added to gallium for SVM will be used for this as well.

Program Scope Global Variables (__opencl_c_program_scope_global_variables)

This feature doesn't directly need any of this. However, at the SPIR-V level you can see spec constant ops operating on the address of such a global variable. Either we add code to spill spec constant operation chains into shader code, or a frontend can pick the address itself (or get the address of a resource) and simply use it as a constant. That makes it all very simple to compile, as we can just continue to treat those as operations on constant values.
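
To illustrate why a known address helps: once the base address is a plain integer, a chain of operations on it can be folded at compile time like any other constant expression. The sketch below is not SPIR-V parsing and not the compiler's actual code; the op set and the struct are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of operations a spec constant chain might apply
 * to the address of a program scope variable. */
enum spec_op {
    SPEC_OP_ADD,
    SPEC_OP_SHIFT_RIGHT,
    SPEC_OP_AND,
};

struct spec_step {
    enum spec_op op;
    uint64_t operand;
};

/* Fold the whole chain into one constant, starting from the address the
 * frontend picked (or queried) for the variable. */
uint64_t fold_spec_chain(uint64_t base, const struct spec_step *steps,
                         size_t n)
{
    uint64_t value = base;
    for (size_t i = 0; i < n; i++) {
        switch (steps[i].op) {
        case SPEC_OP_ADD:
            value += steps[i].operand;
            break;
        case SPEC_OP_SHIFT_RIGHT:
            assert(steps[i].operand < 64); /* larger shifts are undefined in C */
            value >>= steps[i].operand;
            break;
        case SPEC_OP_AND:
            value &= steps[i].operand;
            break;
        }
    }
    return value;
}
```

Without a known base address, the same chain would have to be emitted as instructions in the shader, which is the "spill into shader code" alternative mentioned above.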

Edited by Karol Herbst
