rusticl: implement SVM and buffer_device_address
Depends on !32581 (merged)
Upstream PR adding the buffer_device_address extension: https://github.com/KhronosGroup/OpenCL-Docs/pull/1159
This is a combined MR adding two extensions and laying the groundwork for a third one.
cl_ext_buffer_device_address
This extension is basically Shared Virtual Memory (SVM) lite. It's intended to be implemented on drivers not capable of supporting SVM (e.g. Vulkan-based CL implementations, zink in our case). It allows a client to ask for the GPU address of a buffer and use it directly inside kernels. It also guarantees that addresses remain the same between kernel invocations, which the OpenCL core spec does not guarantee unless you use SVM allocations.
This basically just requires a way for a frontend to tell the driver that the virtual address of a pipe_resource must never change, plus a new interface to ask for that address:
- PIPE_RESOURCE_FLAG_FIXED_ADDRESS to indicate on allocation that the resource's address must not change
- pipe_screen::resource_get_address to fetch the address of a resource
- pipe_grid::num_globals and pipe_grid::globals to pass in a list of buffers which might be accessed indirectly in a kernel invocation. I chose not to use set_global_bindings for it, because that interface requires the driver to write the address into some buffer, which would be pointless here, and no binding actually needs to happen. I wanted to move to a new interface for global memory anyway, so maybe that's what we can also rely on moving forward? Don't know...
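To make the flow above more concrete, here is a minimal self-contained mock. Everything prefixed with mock_ or MOCK_ is a hypothetical stand-in, not the actual gallium definition; the real struct layouts and function signatures in this MR may differ. It only illustrates the idea: a resource flagged as fixed-address can be queried for its address, and the list passed through the grid tells the driver which buffers a kernel might reach through raw pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PIPE_RESOURCE_FLAG_FIXED_ADDRESS. */
#define MOCK_RESOURCE_FLAG_FIXED_ADDRESS (1u << 0)

/* Hypothetical stand-in for pipe_resource. */
struct mock_resource {
   uint32_t flags;
   uint64_t gpu_address; /* picked by the mock driver at allocation time */
   size_t size;          /* size of the buffer in bytes */
};

/* Hypothetical stand-in for the num_globals/globals part of pipe_grid. */
struct mock_grid {
   uint32_t num_globals;
   struct mock_resource **globals;
};

/* Mock address query: only resources allocated with the fixed-address
 * flag report an address, everything else returns 0. */
static uint64_t
mock_resource_get_address(const struct mock_resource *res)
{
   if (!(res->flags & MOCK_RESOURCE_FLAG_FIXED_ADDRESS))
      return 0;
   return res->gpu_address;
}

/* Returns true if addr points into one of the buffers listed for this
 * launch, i.e. a buffer the kernel may legally access indirectly. */
static bool
mock_grid_covers_address(const struct mock_grid *grid, uint64_t addr)
{
   for (uint32_t i = 0; i < grid->num_globals; i++) {
      uint64_t base = mock_resource_get_address(grid->globals[i]);
      if (base && addr >= base && addr - base < grid->globals[i]->size)
         return true;
   }
   return false;
}
```

Note that nothing in this sketch writes the address into a buffer on behalf of the client, which is the difference from set_global_bindings described above.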
Shared Virtual Memory
This optional OpenCL 2.0 feature basically allows one to allocate memory that has the same virtual address on the host and on all devices it is shared with. The runtime is responsible for implicitly migrating content based on usage.
Because making drivers agree on a common address for a new pipe_resource allocation would be a lot of pain, this MR adds a simple VM management interface frontends can use to allocate a VM range from drivers and assign addresses to pipe_resources directly. With those new interfaces added, a runtime can query valid VM ranges to allocate, choose a common sub-range across devices and allocate it. Once that has succeeded, a frontend can use e.g. util_vma_heap to manage VM addresses itself and guarantee the same address across a set of devices, even across vendors. No other OpenCL implementation supports this across vendors as of today!
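The "choose a common sub-range across devices" step can be sketched as a plain range intersection. This is only an illustration under stated assumptions: struct vma_range and both helpers are hypothetical names, and the sketch assumes each device reports an inclusive [min, max] range, which may not match how min_vma and max_vma are actually defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device VM range; both bounds are assumed inclusive. */
struct vma_range {
   uint64_t min;
   uint64_t max;
};

/* Intersects the ranges of all devices. Returns false if there are no
 * devices or if the devices have no address in common. */
static bool
common_vma_range(const struct vma_range *devs, size_t num_devs,
                 struct vma_range *out)
{
   if (num_devs == 0)
      return false;

   struct vma_range r = devs[0];
   for (size_t i = 1; i < num_devs; i++) {
      if (devs[i].min > r.min)
         r.min = devs[i].min;
      if (devs[i].max < r.max)
         r.max = devs[i].max;
   }

   if (r.min > r.max)
      return false;

   *out = r;
   return true;
}

/* Picks the lowest sub-range of `size` bytes inside `common` whose start
 * is aligned to `align` (which must be a power of two). */
static bool
pick_sub_range(const struct vma_range *common, uint64_t size, uint64_t align,
               uint64_t *start)
{
   if (size == 0 || align == 0 || (align & (align - 1)))
      return false;

   uint64_t base = (common->min + align - 1) & ~(align - 1);
   if (base < common->min || base > common->max) /* overflow or out of range */
      return false;
   if (common->max - base < size - 1)
      return false;

   *start = base;
   return true;
}
```

The resulting sub-range is what a frontend would then try to allocate on every device before handing it to its own address allocator.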
The annoying part of SVM is that pointers are the handle. They can be accessed on the host between synchronization points, passed directly to GPU memory via data uploads, and accessed inside kernels directly, while the runtime is responsible for guaranteeing the content is up to date.
- pipe_screen::alloc_vm / pipe_screen::free_vm to allocate and free VM ranges
- pipe_screen::resource_assign_vma to assign an address to a resource, or 0 to remove it
- PIPE_RESOURCE_FLAG_FRONTEND_VM to create such a managed pipe_resource. This also implicitly tells drivers not to sub-allocate and not to automatically allocate a virtual address.
- pipe_caps::min_vma and pipe_caps::max_vma to tell frontends in which area they can freely allocate a VM range. This can also be used to exclude ranges where addresses need to be canonicalized.
- pipe_grid::num_globals and pipe_grid::globals, same as above. It's expected that SVM resources are passed through this instead of set_global_bindings.
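As a rough illustration of how these pieces could fit together once a common VM range has been reserved, the sketch below hands out one address from a frontend-owned heap and assigns it to the per-device backing of a single SVM allocation. All names here are hypothetical: the bump allocator is a deliberately trivial stand-in for something like util_vma_heap (it never reuses freed space), and mock_resource_assign_vma only mimics the "assign an address, 0 to remove it" behaviour described above, not the real signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_DEVICES 4

/* Trivial bump allocator over the reserved range [next, end). */
struct bump_heap {
   uint64_t next;
   uint64_t end; /* exclusive */
};

/* Returns an aligned address or 0 on failure; align must be a power of two. */
static uint64_t
bump_alloc(struct bump_heap *heap, uint64_t size, uint64_t align)
{
   uint64_t base = (heap->next + align - 1) & ~(align - 1);
   if (base < heap->next || base >= heap->end || heap->end - base < size)
      return 0;
   heap->next = base + size;
   return base;
}

/* One logical SVM allocation with one mock resource address per device. */
struct mock_svm_alloc {
   uint64_t address[MOCK_MAX_DEVICES]; /* 0 means no address assigned */
   unsigned num_devices;
};

/* Mock of assigning a frontend-chosen address to one device's resource. */
static void
mock_resource_assign_vma(struct mock_svm_alloc *alloc, unsigned dev,
                         uint64_t address)
{
   alloc->address[dev] = address;
}

/* Picks one address and assigns it on every device. */
static bool
svm_alloc_bind(struct bump_heap *heap, struct mock_svm_alloc *alloc,
               uint64_t size, uint64_t align)
{
   uint64_t address = bump_alloc(heap, size, align);
   if (!address)
      return false;
   for (unsigned dev = 0; dev < alloc->num_devices; dev++)
      mock_resource_assign_vma(alloc, dev, address);
   return true;
}

/* Removes the address from every device by assigning 0. */
static void
svm_alloc_unbind(struct mock_svm_alloc *alloc)
{
   for (unsigned dev = 0; dev < alloc->num_devices; dev++)
      mock_resource_assign_vma(alloc, dev, 0);
}
```

Because the frontend owns the heap, the same pointer value is valid on every device without the drivers ever having to negotiate with each other.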
cl_intel_unified_shared_memory
This extension is required by a few SYCL implementations, so I'll probably implement it sooner or later, but it's not part of this merge request. It's a lot like SVM in that it uses pointers as its interface, but with explicit data placement and migration. The same gallium interfaces added for SVM will be used for this as well.
Program Scope Global Variables (__opencl_c_program_scope_global_variables)
This feature doesn't directly need any of this. However, at the SPIR-V level you can see spec constant ops operating on the address of such a global variable. Either we add code to spill spec constant operation chains into shader code, or a frontend picks the address itself (or gets the address of a resource) and simply uses it as a constant. The latter makes it all very simple to compile, as we can just continue to treat those as operations on constant values.
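The second option can be pictured as ordinary constant folding: once the variable's address is a known number, a chain of operations on it collapses into a single constant at compile time. The sketch below is purely illustrative; the fold_op enum and fold_address_chain are hypothetical and do not correspond to any existing compiler pass or to the actual set of operations SPIR-V allows in spec constant ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny subset of operations applied to an address. */
enum fold_op {
   FOLD_ADD,
   FOLD_AND,
   FOLD_SHIFT_RIGHT,
};

struct fold_step {
   enum fold_op op;
   uint64_t operand;
};

/* Evaluates a chain of operations starting from a known base address.
 * With the address fixed up front, the result is a plain constant and
 * nothing has to be spilled into shader code. */
static uint64_t
fold_address_chain(uint64_t base_address, const struct fold_step *steps,
                   size_t num_steps)
{
   uint64_t value = base_address;
   for (size_t i = 0; i < num_steps; i++) {
      switch (steps[i].op) {
      case FOLD_ADD:
         value += steps[i].operand;
         break;
      case FOLD_AND:
         value &= steps[i].operand;
         break;
      case FOLD_SHIFT_RIGHT:
         value >>= (steps[i].operand & 63); /* keep the shift well-defined */
         break;
      }
   }
   return value;
}
```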