Add a standard library for driver OpenCL, including `assert` on device
Now that we have common code for precompiling driver CL C to hardware binaries, the fundamentals are in place for implementing driver code in CL. However, that infrastructure is still pretty barebones. This MR adds common code providing a standard library for driver CL C, including many standard C constructs that are available in host C but not out of the box in application CL:
- abort
- static_assert
- assert
- assorted routines ported from util/macros.h, math.h, etc
The combination of printf, abort, and assert facilitates debugging driver CL. If correctly integrated (as done here for Honeykrisp), these all work anywhere in the driver CL library, in both precompiled shaders and library functions injected into application shaders.
Here's an example of a precompiled kernel:
KERNEL(1)
libagx_increment_cs_invocations(global uint *grid, global uint32_t *statistic,
                                uint32_t local_size_threads)
{
   printf("local size = %u\n", local_size_threads);
   assert(local_size_threads >= 1 && local_size_threads <= 32);
   *statistic += local_size_threads * grid[0] * grid[1] * grid[2];
}
This produces output:
Test case 'dEQP-VK.query_pool.statistics_query.compute_shader_invocations.32bits_primary'..
local size = 8
local size = 1
local size = 63
Shader assertion fail at src/asahi/libagx/query.cl:107
Expected local_size_threads >= 1 && local_size_threads <= 32
DeviceLost (vk.waitForFences(device, 1u, &fence, VK_TRUE, timeoutNanos): VK_ERROR_DEVICE_LOST at vkCmdUtil.cpp:292)
..and as a refresher from last time, this is dispatched on the host as
libagx_increment_cs_invocations(cs, agx_1d(1), grid, stat,
                                agx_workgroup_threads(local_size));
with the glue code, data layouts, etc. all autogenerated.
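To illustrate how an assert like the one above can be layered on printf and abort, here is a minimal sketch in plain host C. It is not the macro from this MR: `SKETCH_ASSERT`, `sketch_abort`, `sketch_aborted`, and `check_local_size` are hypothetical names, and the stand-in abort only records a flag instead of stopping GPU work. The message format is copied from the output shown above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device-side abort(): it only records that an
 * abort was requested so the sketch can run on the host. */
static bool sketch_aborted = false;

static void
sketch_abort(void)
{
   sketch_aborted = true;
}

/* Hypothetical assert: report the failing expression via printf, then abort.
 * The do/while wrapper makes the macro behave like a single statement. */
#define SKETCH_ASSERT(cond)                                                  \
   do {                                                                      \
      if (!(cond)) {                                                         \
         printf("Shader assertion fail at %s:%d\nExpected %s\n", __FILE__,   \
                __LINE__, #cond);                                            \
         sketch_abort();                                                     \
      }                                                                      \
   } while (0)

/* Same condition as the example kernel. */
static void
check_local_size(unsigned local_size_threads)
{
   SKETCH_ASSERT(local_size_threads >= 1 && local_size_threads <= 32);
}
```

Calling `check_local_size(8)` prints nothing and leaves the flag clear, while `check_local_size(63)` prints the two-line failure message and sets `sketch_aborted`.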
The underlying abort mechanism is inspired by https://github.com/KhronosGroup/OpenCL-Docs/pull/808 and could maybe be used by Rusticl to implement that EXT if it's ever merged. Aborts are implemented as an augmentation to the existing OpenCL printf support, using an extra "has aborted?" flag in the print buffer. The idea is that the abort message is passed with the existing printf infrastructure, making a full 16KiB abort buffer unnecessary. (There are other ways we could implement abort, but I believe this is the right way for driver CL, at least for now. DM me for details.)
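The idea of an extra flag in the print buffer can be sketched in plain host C as follows. This is a simplified model under stated assumptions, not the layout used by this MR: the struct, its field names, and the `sketch_*` functions are all hypothetical, messages are stored as raw text rather than as encoded printf records, and a real device implementation would need atomic updates because many invocations write concurrently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: a small header followed by message bytes. */
struct sketch_print_buffer {
   uint32_t used;    /* bytes of data[] written so far */
   uint32_t aborted; /* the extra "has aborted?" flag */
   char data[256];
};

/* "Device" side: append a message, dropping it if it does not fit. */
static void
sketch_print(struct sketch_print_buffer *buf, const char *msg)
{
   size_t len = strlen(msg);

   if (len > sizeof(buf->data) - buf->used)
      return;

   memcpy(buf->data + buf->used, msg, len);
   buf->used += (uint32_t)len;
}

/* "Device" side: the abort message travels through the normal print path,
 * so only one extra flag is needed. */
static void
sketch_abort(struct sketch_print_buffer *buf, const char *msg)
{
   sketch_print(buf, msg);
   buf->aborted = 1;
}

/* Host side: flush pending messages and report whether an abort happened,
 * so the caller can decide how to react (for example, by marking the
 * device lost). The buffer is reset for reuse. */
static bool
sketch_flush(struct sketch_print_buffer *buf, FILE *out)
{
   bool aborted = buf->aborted != 0;

   fwrite(buf->data, 1, buf->used, out);
   buf->used = 0;
   buf->aborted = 0;
   return aborted;
}
```

In this model the host calls `sketch_flush` after the GPU work completes. A `true` return corresponds to the assertion failure followed by `VK_ERROR_DEVICE_LOST` in the example output.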
Lots of common code is added to minimize the code needed per-driver. The Honeykrisp patch is about 80 lines of code for both printf and abort support.
The common code also provides lots of `#define`s that let you write headers that build on both the host and the device with consistent layouts. For example:
struct foo {
   GLOBAL(float) values;
   uint32_t count;
};
This will expand on the host to
struct foo {
   uint64_t values;
   uint32_t count;
};
and expand on the device to
struct foo {
   global float *values;
   uint count;
};
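A macro with this behaviour could be written roughly as below. This is a sketch, not the definition from this MR: it assumes 64-bit device pointers and keys off `__OPENCL_VERSION__`, which OpenCL C compilers predefine. The includes and the layout checks are only there for the host build of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __OPENCL_VERSION__
/* Device: a real pointer into global memory. */
#define GLOBAL(type) global type *
#else
/* Host: a 64-bit integer holding the GPU address, matching the size of the
 * device pointer (assumed to be 64-bit). */
#define GLOBAL(type) uint64_t
#endif

struct foo {
   GLOBAL(float) values;
   uint32_t count;
};

/* Host-side layout checks: the address field occupies 8 bytes at offset 0,
 * so count lands at offset 8, as it would after a 64-bit device pointer. */
static_assert(sizeof(((struct foo *)0)->values) == 8, "values must be 64-bit");
static_assert(offsetof(struct foo, values) == 0, "values must come first");
static_assert(offsetof(struct foo, count) == 8, "count must follow values");
```

On the host, the driver fills `values` with a GPU address as a plain integer, and device code reading the same bytes sees a usable `global float *`.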
Depends on and contains !32564 (merged) and !32513 (closed)