nir: intel: Optimize generation of ffma and flrp

Ian Romanick requested to merge idr/mesa:review/optimize-flrp-and-ffma into master

This series has three parts. The first nine patches are general bug fixes and cleanups.

  • nir: Add unit tests for nir_opt_comparison_pre
  • intel/vec4: Reswizzle VF immediates too
  • nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size
  • nir: Pass fully qualified type to nir_const_value_negative_equal
  • nir: Port some const_value_negative_equal tests to alu_src_negative_equal
  • nir: nir_const_value_negative_equal compares one value at a time
  • nir: Handle swizzle in nir_alu_srcs_negative_equal
  • nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations
  • intel/vec4: Delete vec4_visitor::emit_lrp

The next five work to prevent some regressions (and actually make some improvements) in the vec4 backend caused by the later patches.

  • intel/vec4: Refactor operand fixing for ffma and flrp
  • intel/vec4: Try to emit a single load for multiple 3-src instruction operands
  • intel/vec4: Try to emit a VF source in try_immediate_source
  • intel/vec4: Try to emit immediate sources for MOV
  • nir: intel/vec4: Add flag to disable some algebraic optimizations

Finally, the last four patches rearrange some common patterns to be more friendly to ffma and flrp generation.

  • nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)
  • nir/algebraic: Reassociate fadd into fmul in DPH-like pattern
  • nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form
  • nir/algebraic: Recognize open-coded flrp(a, b, a)
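
As a rough illustration of what these four patches are after, the sketch below checks the underlying algebra numerically. It assumes the usual definition flrp(x, y, t) = x*(1 - t) + y*t. The open-coded forms on the left of each pair are one plausible reading of the patch titles, not the actual search patterns from nir_opt_algebraic.py, and the helper names (`check_identities`, `dph_add_last`, `dph_ffma_chain`) are hypothetical.

```python
import math
import random


def flrp(x, y, t):
    """Linear interpolation, assumed here to be x*(1 - t) + y*t."""
    return x * (1.0 - t) + y * t


def ffma(a, b, c):
    """Multiply-add a*b + c (plain Python floats, so not actually fused)."""
    return a * b + c


def check_identities(a, b):
    """Return True if each open-coded form agrees with its flrp form."""
    pairs = [
        # flrp(-1, 1, a) is 2a - 1, and flrp(1, -1, a) is 1 - 2a.
        (2.0 * a - 1.0, flrp(-1.0, 1.0, a)),
        (1.0 - 2.0 * a, flrp(1.0, -1.0, a)),
        # 1 - (1-a)*(1-b) expands to a + b - a*b, which is flrp(b, 1, a).
        (1.0 - (1.0 - a) * (1.0 - b), flrp(b, 1.0, a)),
        # flrp(a, b, a) expands to a + a*(b - a).
        (a + a * (b - a), flrp(a, b, a)),
    ]
    return all(
        math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9) for lhs, rhs in pairs
    )


def dph_add_last(x, y, w):
    """DPH-like pattern with the add applied last: dot(x.xyz, y.xyz) + w."""
    return (x[0] * y[0] + x[1] * y[1] + x[2] * y[2]) + w


def dph_ffma_chain(x, y, w):
    """Same value with w folded into the innermost multiply (assumed form)."""
    return ffma(x[0], y[0], ffma(x[1], y[1], ffma(x[2], y[2], w)))
```

In the reassociated DPH form every step is a multiply-add, so no separate fadd is left over. That would be consistent with Ice Lake benefiting from the DPH patch even though it has no LRP instruction, but the exact rewrite in the patch may differ from this sketch.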

For Ice Lake, Skylake, and Haswell, the results across the entire series are shown below. Since Ice Lake lacks a LRP instruction, it gets pretty much all of its improvement from the DPH patch.

Ice Lake
total instructions in shared programs: 17173401 -> 16933147 (-1.40%)
instructions in affected programs: 7951135 -> 7710881 (-3.02%)
helped: 35636
HURT: 92
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.28% x̃: 3.45%
HURT stats (abs)   min: 1 max: 41 x̄: 2.53 x̃: 2
HURT stats (rel)   min: 0.32% max: 8.33% x̄: 1.51% x̃: 0.98%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.31% -5.21%
Instructions are helped.

total cycles in shared programs: 360898020 -> 359533522 (-0.38%)
cycles in affected programs: 189588828 -> 188224330 (-0.72%)
helped: 27301
HURT: 6708
helped stats (abs) min: 1 max: 21997 x̄: 62.66 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.06% x̃: 2.37%
HURT stats (abs)   min: 1 max: 3155 x̄: 51.63 x̃: 14
HURT stats (rel)   min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -45.12 -35.12
95% mean confidence interval for cycles %-change: -2.78% -2.67%
Cycles are helped.

total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3

total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10

LOST:   11
GAINED: 3


Skylake
total instructions in shared programs: 15280014 -> 15021428 (-1.69%)
instructions in affected programs: 9624984 -> 9366398 (-2.69%)
helped: 50145
HURT: 56
helped stats (abs) min: 1 max: 260 x̄: 5.16 x̃: 4
helped stats (rel) min: 0.02% max: 36.36% x̄: 4.24% x̃: 2.48%
HURT stats (abs)   min: 1 max: 41 x̄: 2.68 x̃: 1
HURT stats (rel)   min: 0.11% max: 6.19% x̄: 1.11% x̃: 0.81%
95% mean confidence interval for instructions value: -5.21 -5.10
95% mean confidence interval for instructions %-change: -4.28% -4.20%
Instructions are helped.

total cycles in shared programs: 355609606 -> 354057295 (-0.44%)
cycles in affected programs: 212581798 -> 211029487 (-0.73%)
helped: 39771
HURT: 10134
helped stats (abs) min: 1 max: 21997 x̄: 51.82 x̃: 14
helped stats (rel) min: <.01% max: 42.05% x̄: 3.51% x̃: 1.82%
HURT stats (abs)   min: 1 max: 4814 x̄: 50.18 x̃: 14
HURT stats (rel)   min: <.01% max: 125.43% x̄: 3.15% x̃: 1.31%
95% mean confidence interval for cycles value: -34.54 -27.67
95% mean confidence interval for cycles %-change: -2.21% -2.11%
Cycles are helped.

total spills in shared programs: 8843 -> 8818 (-0.28%)
spills in affected programs: 803 -> 778 (-3.11%)
helped: 7
HURT: 2

total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1720 -> 1726 (0.35%)
helped: 6
HURT: 12

LOST:   12
GAINED: 34


Haswell
total instructions in shared programs: 13477221 -> 13378002 (-0.74%)
instructions in affected programs: 6893845 -> 6794626 (-1.44%)
helped: 32239
HURT: 236
helped stats (abs) min: 1 max: 409 x̄: 3.13 x̃: 1
helped stats (rel) min: 0.03% max: 36.36% x̄: 1.80% x̃: 1.25%
HURT stats (abs)   min: 1 max: 750 x̄: 7.65 x̃: 1
HURT stats (rel)   min: 0.11% max: 125.30% x̄: 2.90% x̃: 1.41%
95% mean confidence interval for instructions value: -3.16 -2.95
95% mean confidence interval for instructions %-change: -1.79% -1.75%
Instructions are helped.

total cycles in shared programs: 376303186 -> 375558254 (-0.20%)
cycles in affected programs: 202516126 -> 201771194 (-0.37%)
helped: 20466
HURT: 13398
helped stats (abs) min: 1 max: 20184 x̄: 68.61 x̃: 14
helped stats (rel) min: <.01% max: 53.10% x̄: 2.64% x̃: 1.28%
HURT stats (abs)   min: 1 max: 15441 x̄: 49.20 x̃: 10
HURT stats (rel)   min: <.01% max: 436.56% x̄: 3.29% x̃: 0.91%
95% mean confidence interval for cycles value: -25.45 -18.55
95% mean confidence interval for cycles %-change: -0.36% -0.22%
Cycles are helped.

total spills in shared programs: 23166 -> 23192 (0.11%)
spills in affected programs: 1699 -> 1725 (1.53%)
helped: 16
HURT: 11

total fills in shared programs: 34601 -> 34696 (0.27%)
fills in affected programs: 1549 -> 1644 (6.13%)
helped: 26
HURT: 11

LOST:   24
GAINED: 36

There's a big pile of changes that didn't make the cut in another branch. Most of the shader-db results in those commit messages are not current. "WIP: nir/algebraic: Rearrange (a±1)*b to have multiply-add form" would have made the cut, but it makes a couple of shaders fall off a spill/fill cliff... to the tune of +500% (or a few HUNDRED) spills in a single shader. Ouch.
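
For reference, the rearrangement named in that WIP patch title amounts to distributing the multiply, so that (a±1)*b becomes a*b ± b, which has multiply-add shape. The sketch below only illustrates that algebra. The function names are hypothetical and the real pattern in the branch may be written differently.

```python
def open_coded(a, b, sign):
    """(a + sign) * b, where sign is +1.0 or -1.0: one add, then one multiply."""
    return (a + sign) * b


def mul_add_form(a, b, sign):
    """The same value as a*b + sign*b, which maps onto a single multiply-add."""
    return a * b + sign * b
```

The two forms can round differently in floating point, and the rewritten form reads b twice, which may keep it live longer. Whether that is what caused the extra spills is not stated above.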

Edited by Ian Romanick