microsoft/clc: Optimizations and improvements
This MR contains:
- One NIR patch: I don't see any reason why you couldn't build a parallel deref chain that involves a pointer-as-array deref, so handle that case in the follower helper.
- A couple of patches to the DXIL backend.
- One bugfix for cases where static indexing handles are dynamically emitted; we shouldn't store the result.
- Minor cleanups to the CLOn12 compiler frontend.
- One optimization to the CL compiler: when the deref chain from kernel arg to memory access is simple and unbroken, we can just use a hardcoded buffer index in the shader instead of having to look the index up.
- A CL compiler bugfix: when images were mixed with printf or dynamically-accessed global consts (where the pointer value is used in a complex way), we would've assigned the same UAV ID to an image and to one of these other buffers. The nir-to-DXIL backend groups all buffers together into a single contiguous range, but the CL frontend put the images after the globals and before the other buffers.
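
To make the UAV ID collision from the last item concrete, here is a minimal sketch of the two numbering schemes. The struct and function names are hypothetical and are not taken from the CLOn12 or DXIL backend code. The "consistent" layout shown, with images placed after every buffer, is only one assumed way to make the two views agree, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-kernel resource counts; not the real CLOn12 structures. */
struct resource_counts {
   unsigned globals; /* global buffers reached directly from kernel args */
   unsigned images;  /* images bound as UAVs */
   unsigned others;  /* e.g. printf buffer, dynamically-accessed global consts */
};

/* Backend view: every buffer lives in one contiguous range starting at 0,
 * so the i-th "other" buffer directly follows the globals. */
static unsigned
backend_other_buffer_id(const struct resource_counts *c, unsigned i)
{
   assert(i < c->others);
   return c->globals + i;
}

/* Old frontend view: images were numbered right after the globals. */
static unsigned
old_frontend_image_id(const struct resource_counts *c, unsigned i)
{
   assert(i < c->images);
   return c->globals + i;
}

/* One consistent alternative (an assumption): images start after all buffers. */
static unsigned
consistent_image_id(const struct resource_counts *c, unsigned i)
{
   assert(i < c->images);
   return c->globals + c->others + i;
}

/* Returns true if any image ID equals any "other" buffer ID. */
static bool
image_ids_collide(const struct resource_counts *c,
                  unsigned (*image_id)(const struct resource_counts *, unsigned))
{
   for (unsigned img = 0; img < c->images; img++) {
      for (unsigned buf = 0; buf < c->others; buf++) {
         if (image_id(c, img) == backend_other_buffer_id(c, buf))
            return true;
      }
   }
   return false;
}
```

With two globals, one image and one printf-style buffer, both the old image numbering and the backend's buffer numbering hand out ID 2, which is the overlap described above. A kernel with no "other" buffers never hits it, which matches the bug only showing up when images are mixed with those buffers.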