
ir3: lower 64b registers

Job Noorman requested to merge jnoorman/mesa:ir3-lower-64b-regs into main

After all int64/double lowerings, there might still be 64b registers left, which ir3 currently doesn't handle. This only happens in a small number of Piglit tests where those registers (or the variables they come from) did not get DCE'd.

This patch handles 64b registers in ir3 by adding a NIR pass that does the following:

  • @decl_reg -> split into two 32b ones
  • @store_reg -> unpack_64_2x32_split_x/y and two separate stores
  • @load_reg -> two separate loads and pack_64_2x32_split
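The three rules above can be modelled outside of NIR. The following is only a rough Python sketch, not the pass itself: the `SplitReg64` class and its `store`/`load` methods are hypothetical names made up for illustration, it ignores `wrmask` and `base`, and it treats values as plain unsigned integers. The three helper functions mirror the NIR ALU ops of the same name (x is the low 32 bits, y the high 32 bits).

```python
MASK32 = 0xFFFFFFFF


def unpack_64_2x32_split_x(value):
    """Low 32 bits of a 64-bit value."""
    return value & MASK32


def unpack_64_2x32_split_y(value):
    """High 32 bits of a 64-bit value."""
    return (value >> 32) & MASK32


def pack_64_2x32_split(lo, hi):
    """Rebuild a 64-bit value from its low and high 32-bit halves."""
    return (lo & MASK32) | ((hi & MASK32) << 32)


class SplitReg64:
    """Hypothetical model of one 64b register array split into two 32b ones."""

    def __init__(self, num_components, num_array_elems):
        elems = max(num_array_elems, 1)
        # Corresponds to the two 32b @decl_reg replacing the 64b one.
        self.lo = [[0] * num_components for _ in range(elems)]
        self.hi = [[0] * num_components for _ in range(elems)]

    def store(self, vec, index=0):
        # @store_reg: unpack each component, then do two separate stores.
        self.lo[index] = [unpack_64_2x32_split_x(c) for c in vec]
        self.hi[index] = [unpack_64_2x32_split_y(c) for c in vec]

    def load(self, index=0):
        # @load_reg: two separate loads, then pack the halves back together.
        return [
            pack_64_2x32_split(lo, hi)
            for lo, hi in zip(self.lo[index], self.hi[index])
        ]


# Round trip through a 3-component, 2-element register array.
reg = SplitReg64(num_components=3, num_array_elems=2)
vec = [0x0123456789ABCDEF, 0x0, 0xFFFFFFFFFFFFFFFF]
reg.store(vec, index=1)
assert reg.load(index=1) == vec
```

The point of the sketch is that a store followed by a load at the same index reproduces the original 64b vector, so splitting the register does not change what the shader observes.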

For example, the relevant part of the fragment shader from spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-dvec3 currently ends up as the following NIR, which still contains a 64b register:

32    %1444 = @decl_reg () (num_components=3, num_array_elems=2, bit_size=64, divergent=1)
...
64x3     %9 = vec3 %5, %6, %8
              @store_reg (%9, %1444) (base=0, wrmask=xyz, legacy_fsat=0)
...
64x3  %1446 = @load_reg_indirect (%1444, %30) (base=0, legacy_fabs=0, legacy_fneg=0)

The ir3_nir_lower_64b_regs pass proposed by this patch lowers this to:

32    %1448 = @decl_reg () (num_components=3, num_array_elems=2, bit_size=32, divergent=1)
32    %1447 = @decl_reg () (num_components=3, num_array_elems=2, bit_size=32, divergent=1)
...
64x3     %9 = vec3 %5, %6, %8
32x3  %1449 = unpack_64_2x32_split_x %9
32x3  %1450 = unpack_64_2x32_split_y %9
              @store_reg (%1449, %1448) (base=0, wrmask=xyz, legacy_fsat=0)
              @store_reg (%1450, %1447) (base=0, wrmask=xyz, legacy_fsat=0)
...
32x3  %1453 = @load_reg_indirect (%1448, %30) (base=0, legacy_fabs=0, legacy_fneg=0)
32x3  %1454 = @load_reg_indirect (%1447, %30) (base=0, legacy_fabs=0, legacy_fneg=0)
64x3  %1455 = pack_64_2x32_split %1453, %1454

After this pass, the 64b vecs used for the original loads/stores are still present, and ir3 does not handle those yet either. This patch removes them by running nir_lower_alu_to_scalar and nir_copy_prop.

Fixes the following Piglit tests:

  • spec@arb_enhanced_layouts@execution@component-layout@vs-fs-array-dvec3
  • spec@arb_gpu_shader_fp64@uniform_buffers@fs-array-copy
  • spec@arb_gpu_shader_fp64@uniform_buffers@gs-array-copy
  • spec@arb_gpu_shader_fp64@uniform_buffers@vs-array-copy
  • spec@arb_tessellation_shader@execution@variable-indexing@vs-output-array-dvec4-index-wr-before-tcs

This patch has no impact on shader-db.

Edited by Job Noorman
