CL: Support async copy and group wait functions
This series adds support for the async copy/wait functions. Specifically, `async_work_group_copy`, `async_work_group_strided_copy`, and `wait_group_events`. For whatever reason, these are core SPIR-V opcodes rather than OpenCL extension opcodes. Both strided and non-strided copies are collapsed to a single opcode (non-strided just has a stride of 1).
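To illustrate how one operation can cover both flavors, here is a minimal C reference model. It is a sketch only, not code from this series: `model_group_copy`, `model_copy`, and `enum copy_dir` are hypothetical names, and the model runs sequentially on plain arrays rather than across a work-group. It assumes the stride applies to whichever side is in global memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model; none of these names exist in Mesa or libclc. */
enum copy_dir {
   TO_LOCAL,  /* global -> local: the source is strided */
   TO_GLOBAL, /* local -> global: the destination is strided */
};

/* Single "opcode": always takes a stride, counted in elements. */
static void
model_group_copy(float *dst, const float *src, size_t num_elems,
                 size_t stride, enum copy_dir dir)
{
   assert(stride >= 1);
   for (size_t i = 0; i < num_elems; i++) {
      if (dir == TO_LOCAL)
         dst[i] = src[i * stride];
      else
         dst[i * stride] = src[i];
   }
}

/* The non-strided form is the same operation with a stride of 1. */
static void
model_copy(float *dst, const float *src, size_t num_elems, enum copy_dir dir)
{
   model_group_copy(dst, src, num_elems, 1, dir);
}
```

With a stride of 2 and `TO_LOCAL`, the model gathers every other element of the source; with `model_copy` it degenerates to a plain element-wise copy.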
Things that are messy here:
- The event type is an OpenCL/SPIR-V built-in type. I added a corresponding NIR type.
- The stride parameter for strided copies applies to global memory only. As a result, the implementations are not generically overloaded on address space like the rest of libclc; instead, there are two distinct overload sets, depending on whether you're copying to global memory or from global memory. That means we need to correctly include the address space in the name mangling.
- The name mangler that we were using (from the LLVM-SPIRV-Translator) can't handle mangling the OpenCL `event_t` built-in type. From what I can tell, only clang, not LLVM, is aware of that type, and only within the AST. So this change instead uses a locally-implemented mangler. The set of functions we need from libclc is well-scoped, so the amount of mangling we have to do is limited enough that this is feasible.
- Because these opcodes are core and have no OpenCL extension opcodes, plumbing them through to get to the name mangler and intrinsic-to-call behavior was a bit ugly.
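To make the mangling points above concrete, here is a rough C sketch of what a narrowly scoped mangler for these functions could look like. It is not the mangler added in this series. The helper names are hypothetical, and the encodings are assumptions based on what clang is believed to emit for SPIR targets: Itanium-style names, `U3AS1` for global and `U3AS3` for local pointers, and `9ocl_event` for `event_t`. It handles only scalar element types and ignores Itanium substitutions, which these particular signatures should not need.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed SPIR address-space numbers; hypothetical enum for this sketch. */
enum addr_space { AS_GLOBAL = 1, AS_LOCAL = 3 };

/* Append one pointer parameter, e.g. "PU3AS1Kf" for `const __global float *`. */
static size_t
append_ptr(char *buf, size_t len, size_t cap, enum addr_space as,
           int is_const, char elem)
{
   int n = snprintf(buf + len, cap - len, "PU3AS%d%s%c",
                    (int)as, is_const ? "K" : "", elem);
   assert(n > 0 && (size_t)n < cap - len);
   return len + (size_t)n;
}

/* elem: Itanium code of the element type ('f' = float, 'i' = int, ...).
 * size_code: code for size_t ('j' = 32-bit uint, 'm' = 64-bit ulong). */
static void
mangle_async_copy(char *buf, size_t cap, int strided, int to_local,
                  char elem, char size_code)
{
   const char *name = strided ? "async_work_group_strided_copy"
                              : "async_work_group_copy";
   int n = snprintf(buf, cap, "_Z%zu%s", strlen(name), name);
   assert(n > 0 && (size_t)n < cap);
   size_t len = (size_t)n;

   /* dst is non-const, src is const; the address spaces swap with direction. */
   len = append_ptr(buf, len, cap, to_local ? AS_LOCAL : AS_GLOBAL, 0, elem);
   len = append_ptr(buf, len, cap, to_local ? AS_GLOBAL : AS_LOCAL, 1, elem);

   /* num_elements, plus the stride for strided copies, then event_t. */
   char sizes[3] = { size_code, strided ? size_code : '\0', '\0' };
   n = snprintf(buf + len, cap - len, "%s9ocl_event", sizes);
   assert(n > 0 && (size_t)n < cap - len);
}
```

Under these assumptions, a global-to-local `float` copy with a 32-bit `size_t` comes out as `_Z21async_work_group_copyPU3AS3fPU3AS1Kfj9ocl_event`, and the local-to-global variant differs only in the address-space digits, which is why the two overload sets need distinct mangled names.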
Edited by Jesse Natalie