CL: Integer ops fixes
This (long) series is the set of patches needed to pass OpenCL's test_integer_ops tests. Some notes:
First, core nir or vtn:
- Several places in core nir, vtn, and one in our compiler, needed to support vector sizes of 8 and 16.
- A few nir lowering passes needed improvements:
  - `nir_lower_bit_size` for unary ops which don't have matching source/dest sizes.
  - `nir_lower_int64` for `bit_count` ops (trivial split + add). This currently conflicts with the bit values used by Boris's int64 <-> float lowering, so we'll just need to make sure we resolve that.
  - `nir_lower_alu` for `mul_high` for non-32-bit.
  - `nir_opt_algebraic` for `uadd_carry` on 64-bit values.
- Fixed or added support for a few SPIR-V opcodes.
- Fixed mangling for `SMad_sat` CL opcode to libclc.
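
To make two of the lowerings above concrete, here is a minimal C sketch of the arithmetic they rely on: a 64-bit `bit_count` computed as the sum of two 32-bit counts (the "trivial split + add"), and a 64-bit `uadd_carry` computed by checking whether the sum wrapped. This is illustration only, not code from the series; the helper names `bit_count32`, `bit_count64`, and `uadd_carry64` are made up for this example, and the actual passes operate on NIR instructions rather than C integers.

```c
#include <stdint.h>

/* Hypothetical stand-in for a native 32-bit bit_count op. */
static uint32_t bit_count32(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 64-bit bit_count: split into low and high halves, count each, add. */
static uint32_t bit_count64(uint64_t x)
{
    return bit_count32((uint32_t)x) + bit_count32((uint32_t)(x >> 32));
}

/* 64-bit uadd_carry: the unsigned sum wrapped iff it is smaller than
 * one of its operands. Returns 1 on carry, 0 otherwise. */
static uint64_t uadd_carry64(uint64_t a, uint64_t b)
{
    return (uint64_t)((a + b) < a);
}
```

Neither helper needs an integer type wider than 64 bits, which is presumably what makes lowerings of this shape attractive when the wider operation is unavailable.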
For our backend:
- Our compiler didn't support i16 overloads for intrinsics, which it should when we're using native int16s.
- A few places in `nir_to_dxil` didn't deal with unaryBits intrinsics having mismatching source/dest sizes.
- Bit size lowering is moved to the optimization loop so we can do 8 -> 16 and then 16 -> 32 if necessary, since that's what comes out of the algebraic pass we're using. I considered trying to rework it to support 8 -> 32, but that would've been much more complex.
- We need to handle `imul24`, though the CL spec says it's undefined if inputs are out of range. Just mapping it to imul passes the CTS for that intrinsic, so this should be fine?
- One place needed to be updated to handle phis with more than 4 components.
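
As a sanity check on the `imul24` point, the sketch below compares a plain 32-bit multiply against a reference that first sign-extends the low 24 bits of each input. For inputs already in the signed 24-bit range the two agree, which is the only case the CL spec defines. The names `sext24`, `imul24_ref`, and `imul24_as_imul` are hypothetical and exist only for this example; none of this is taken from the backend.

```c
#include <stdint.h>

/* Sign-extend the low 24 bits of x to a full 32-bit value. */
static int32_t sext24(int32_t x)
{
    return (int32_t)((((uint32_t)x) & 0xFFFFFFu) ^ 0x800000u) - 0x800000;
}

/* Low 32 bits of a * b. The multiply is done on unsigned values to avoid
 * signed-overflow UB in C; the cast back to int32_t assumes the usual
 * two's-complement conversion. */
static int32_t mul_lo32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Reference: only the low 24 bits of each input participate. */
static int32_t imul24_ref(int32_t a, int32_t b)
{
    return mul_lo32(sext24(a), sext24(b));
}

/* The mapping described above: treat imul24 as an ordinary imul. */
static int32_t imul24_as_imul(int32_t a, int32_t b)
{
    return mul_lo32(a, b);
}
```

The two functions can differ only when an input falls outside the signed 24-bit range, where the result is undefined anyway.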
Edited by Jesse Natalie