nir, r300, intel-vec4: nir_move_vec_src_uses_to_dest improvements

What does this MR do and why?

I want to enable nir_move_vec_src_uses_to_dest for the r300 driver. However, as the pass currently stands, the results are neutral at best; see the stats in the first commit.

There are two reasons:

  1. If the vector is only used in a store_output, then reusing it prevents us from storing the result directly in the output. So skip the vector when it has only one reader and that reader is a store_output. This is done in the second commit and is enabled for everyone (but only intel vec4 shows any change).
  2. r300 hardware can load simple constants (0, 1, -1, 0.5, -0.5) for free using a constant swizzle (and the r500 fs can also inline other constants quite well). If we reuse the vectors instead of using the constants directly, we lose some optimization opportunities, so add an option to skip sources that are constant. This is also enabled for the intel-vec4 backend, where it helps a few Dolphin ubershaders.
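The two skip conditions above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code from the commits: the types `use_kind`, `vec_source` and `vec_instr`, the helper names, and the `skip_const_srcs` flag are all hypothetical simplified stand-ins and are not the real NIR data structures or the actual option name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NIR concepts (not real NIR types). */
enum use_kind { USE_ALU, USE_STORE_OUTPUT, USE_OTHER };

struct vec_source {
    bool is_const;               /* source is produced by a constant load */
};

struct vec_instr {
    const struct vec_source *srcs;
    size_t num_srcs;
    const enum use_kind *uses;   /* readers of the vector's result */
    size_t num_uses;
};

/* Reason 1: the vector's only reader is a store_output, so reusing the
 * vector would block writing the result straight into the output. */
static bool vec_only_feeds_store_output(const struct vec_instr *vec)
{
    return vec->num_uses == 1 && vec->uses[0] == USE_STORE_OUTPUT;
}

/* Decide whether uses of source src_idx should be left alone.
 * skip_const_srcs models the new opt-in option (assumed name). */
static bool should_skip_source(const struct vec_instr *vec, size_t src_idx,
                               bool skip_const_srcs)
{
    if (vec_only_feeds_store_output(vec))
        return true;

    /* Reason 2: keep constants as constants so the backend can still use
     * a free constant swizzle or inline them. */
    if (skip_const_srcs && vec->srcs[src_idx].is_const)
        return true;

    return false;
}
```

In this sketch the first check applies unconditionally, matching the second commit being enabled for everyone, while the constant check only fires when the backend opts in.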

Shader-db results for the whole series are below. Of all the current users, I shader-db tested lima, etnaviv, panfrost and crocus. I do not know how to test llvmpipe with shader-db, and I was not able to make freedreno work: LD_PRELOAD=/path/to/libfreedreno_noop_drm_shim.so MESA_LOADER_DRIVER_OVERRIDE=freedreno ./run claims it can't find freedreno_dri. For the other drivers, I only ever tested the default drm-shim GPU.

Additionally, my shader-db is mostly just the stock one with a few old games on top, so if whoever maintains the old vec4 intel GPUs could test this with a more extensive shader-db, it would be much appreciated.

Tagging all users of nir_move_vec_src_uses_to_dest, even the ones where I did not measure any changes.

lima, etnaviv and panfrost show no change. The new option is not enabled there, so only the second patch is relevant for them.

crocus HSW:
total instructions in shared programs: 1576762 -> 1576589 (-0.01%)
instructions in affected programs: 38720 -> 38547 (-0.45%)
helped: 40
HURT: 1
total cycles in shared programs: 111025898 -> 110944796 (-0.07%)
cycles in affected programs: 5647830 -> 5566728 (-1.44%)
helped: 44
HURT: 6
total spills in shared programs: 447 -> 432 (-3.36%)
spills in affected programs: 186 -> 171 (-8.06%)
helped: 12
HURT: 0
total fills in shared programs: 792 -> 774 (-2.27%)
fills in affected programs: 291 -> 273 (-6.19%)
helped: 12
HURT: 0

RV530:
total instructions in shared programs: 96949 -> 96304 (-0.67%)
instructions in affected programs: 33328 -> 32683 (-1.94%)
helped: 240
HURT: 26
total temps in shared programs: 12936 -> 12952 (0.12%)
temps in affected programs: 1596 -> 1612 (1.00%)
helped: 65
HURT: 95
total cycles in shared programs: 148224 -> 147314 (-0.61%)
cycles in affected programs: 46308 -> 45398 (-1.97%)
helped: 227
HURT: 33

RV370:
total instructions in shared programs: 63814 -> 63593 (-0.35%)
instructions in affected programs: 12875 -> 12654 (-1.72%)
helped: 139
HURT: 22
total temps in shared programs: 9978 -> 9993 (0.15%)
temps in affected programs: 702 -> 717 (2.14%)
helped: 37
HURT: 59
total cycles in shared programs: 101165 -> 100950 (-0.21%)
cycles in affected programs: 14171 -> 13956 (-1.52%)
helped: 133
HURT: 25
