DXIL spring cleaning

The goal of this series is to:

  • Remove the custom DXIL intrinsics from NIR. They're not really necessary and optimization passes don't know about them.
  • Make generated DXIL more idiomatic / like what DXC would produce.
  • Make generated DXIL more optimal.
  • Remove duplicated code.

To that end, the high-level changes on the DXIL side are:

  • Shared memory, function-private memory, and constants are now modeled as NIR variables with deref loads and stores. This matches how the DXIL will eventually look.
    • For source languages that can do pointer casting / unions (CL), these are lowered to explicit I/O and then un-lowered to array accesses, which effectively match how the old intrinsics used to work. Note that constants have a pass which attempts to avoid lowering them if they're never used in a complex way - a future change could do the same for shared/temp.
    • For Vulkan, these are no longer lowered to explicit I/O. Instead, they're lowered in the same way that DXC lowers them, from AoS to SoA via nir_split_struct_vars and then some custom array/vector flattening. For mesh/raytracing, this can be avoided, but for all current shader stages, DXC requires them to be flat like this. One upside here is that emulation of small types becomes cheaper, because there's no explicit layout info that says 8-bit types need to be packed tightly.
    • For GL, I'd like to try adding a pipe cap to avoid explicit I/O lowering.
  • UBOs are remapped from load_ubo_dxil to load_ubo_vec4 which has nearly identical semantics, but is actually well thought-out. The component constant value is used to remove some of the extractValue calls, and 16-bit and 64-bit overloads are now supported.
  • SSBOs lose their custom lowering and use nir_lower_mem_access_bit_sizes, which is extended to support the same masking stores that we did previously.
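
To illustrate what a "masking store" means for SSBOs, here is a minimal sketch in plain C rather than NIR. It assumes the buffer can only be accessed as 32-bit words and that byte lanes are little-endian. The helper name `store_small_masked` is hypothetical and is not part of Mesa; the real lowering emits NIR instructions and may use atomics instead of a plain read-modify-write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa API): store an 8-bit or 16-bit value
 * into a buffer that is only addressable as 32-bit words, by masking
 * the containing word. Assumes little-endian byte lanes. */
static void
store_small_masked(uint32_t *words, size_t num_words, size_t byte_offset,
                   uint32_t value, unsigned bit_size)
{
   size_t word_index = byte_offset / 4;
   unsigned shift = (unsigned)(byte_offset % 4) * 8;
   uint32_t lane_mask = (bit_size == 8) ? 0xffu : 0xffffu;
   uint32_t mask = lane_mask << shift;

   assert(bit_size == 8 || bit_size == 16);
   /* Naturally aligned stores never straddle two 32-bit words. */
   assert(byte_offset % (bit_size / 8) == 0);
   assert(word_index < num_words);

   /* Clear the target lane, then OR in the new bits. */
   words[word_index] =
      (words[word_index] & ~mask) | ((value & lane_mask) << shift);
}
```

The neighbouring bytes of the word are preserved, which is the property the store lowering has to keep when the hardware-facing access is wider than the source-level store.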

Minor changes to NIR are:

  • Add a notion of a null constant. This is used mainly in constant folding to allow load_deref of casts coming from a null constant variable to just return 0. This improves OpenCL codegen when a zero-initialized struct is involved. In that case, clang generates a memset, which llvm-spirv turns into a memcpy from a null-constant byte-array variable, which means that nir_opt_memcpy can end up inserting a cast which prevents the constant folding. After this change, I'm able to get the memcpy deleted entirely, and the result looks just like a lowered constant initializer of 0s. The test was clc_compiler_test.cpp's runtime_memcpy test.
  • Const initializer support is added to nir_split_struct_vars, so that nir_var_mem_constant variables can be split.
  • nir_lower_mem_access_bit_sizes now passes along the input bit size. When no lowering is needed, I'd really prefer not to guess at a bit size and end up inserting pack/unpack logic that should've just been movs. However, any time the returned bit size doesn't match the original, the op gets lowered, so the callback needs to see the input bit size in order to return it unchanged.
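
As a rough illustration of why the input bit size matters, here is a self-contained sketch in plain C. Everything in it (`access_shape`, `choose_access_shape`, `needs_lowering`, and the min/max parameters) is hypothetical and does not reflect the real callback signature in NIR. It only models the rule described above: keep the incoming bit size when the backend supports it, and treat any mismatch as a request to lower.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the shape a memory access should have. */
struct access_shape {
   uint8_t bit_size;
   uint8_t num_components;
};

/* Hypothetical callback: return the input bit size unchanged when it is
 * within the supported range, otherwise clamp to the nearest supported
 * size. */
static struct access_shape
choose_access_shape(uint8_t bytes, uint8_t input_bit_size,
                    uint8_t min_bit_size, uint8_t max_bit_size)
{
   struct access_shape shape;
   unsigned comp_bytes;

   assert(bytes > 0);
   assert(input_bit_size >= 8 && input_bit_size % 8 == 0);
   assert(min_bit_size <= max_bit_size);

   if (input_bit_size < min_bit_size)
      shape.bit_size = min_bit_size;
   else if (input_bit_size > max_bit_size)
      shape.bit_size = max_bit_size;
   else
      shape.bit_size = input_bit_size; /* nothing to lower */

   /* Round up so the access still covers every requested byte. */
   comp_bytes = shape.bit_size / 8u;
   shape.num_components = (uint8_t)((bytes + comp_bytes - 1) / comp_bytes);
   return shape;
}

/* Models the rule from the text: a mismatch means the op gets lowered. */
static int
needs_lowering(struct access_shape shape, uint8_t input_bit_size)
{
   return shape.bit_size != input_bit_size;
}
```

Without the input bit size, a callback like this would have to guess, and a wrong guess would turn a plain 32-bit access into pack/unpack code for no benefit.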
