1. 06 Jun, 2019 5 commits
    • intel/fs: Improve discard_if code generation · 0ba9497e
      Ian Romanick authored
      Previously we would blindly emit a sequence like:
      
              mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
              ...
              cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
      (+f0.1) cmp.z.f0.1(16)  null<1>D        g7<8,8,1>D      0D
      
      The first move sets the flags based on the initial execution mask.
      Later discard sequences contain a predicated compare that can only
      remove more SIMD channels.  Often the only user of the result from
      the first compare is the second compare.  Instead, generate a sequence
      like
      
              mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
              ...
              cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
      (+f0.1) cmp.ge.f0.1(8)  null<1>F        g5<8,8,1>F      0x41700000F  /* 15F */
      
      If the results stored in g7 and f0.0 are not used, the first comparison
      will be eliminated.  This removes an instruction and potentially reduces
      register pressure.
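
The equivalence of the two sequences above can be checked with a small per-channel model. This is only an illustrative sketch, not code from the compiler: the function names, the list-of-booleans flag representation, and the test values are assumptions, and NaN inputs are deliberately not modelled (for NaN, "less than" and "greater or equal" are both false, so the two forms would differ).

```python
# Toy per-channel model of the old and new discard_if sequences.
# All names here are hypothetical; NaN inputs are not modelled.

def old_sequence(x, exec_mask, limit=15.0):
    # mov f0.1, <initial execution mask>
    f0_1 = list(exec_mask)
    # cmp.l.f0 g7, g5, 15F: g7 is all-ones where x < limit, else 0
    g7 = [0xFFFFFFFF if v < limit else 0 for v in x]
    # (+f0.1) cmp.z.f0.1 null, g7, 0D: only channels still set in f0.1
    # execute, so a cleared bit can never be set again
    return [live and (b == 0) for live, b in zip(f0_1, g7)]


def new_sequence(x, exec_mask, limit=15.0):
    # mov f0.1, <initial execution mask>
    f0_1 = list(exec_mask)
    # (+f0.1) cmp.ge.f0.1 null, g5, 15F: compare the original operands
    # with the negated condition instead of testing g7 against zero
    return [live and (v >= limit) for live, v in zip(f0_1, x)]


if __name__ == "__main__":
    xs = [float(i) for i in range(8, 24)]      # 16 channels: 8.0 .. 23.0
    mask = [i % 3 != 0 for i in range(16)]     # arbitrary initial mask
    print(old_sequence(xs, mask) == new_sequence(xs, mask))
```

In the new form nothing reads g7, which is why the first compare becomes a candidate for elimination when its other results are also unused.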
      
      v2: Major re-write of the commit message (including fixing the assembly
      code).  Suggested by Matt.
      
      All Gen8+ platforms had similar results. (Ice Lake shown)
      total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
      instructions in affected programs: 2908125 -> 2882350 (-0.89%)
      helped: 18891
      HURT: 5
      helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
      helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
      HURT stats (abs)   min: 9 max: 105 x̄: 51.40 x̃: 35
      HURT stats (rel)   min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
      95% mean confidence interval for instructions value: -1.39 -1.34
      95% mean confidence interval for instructions %-change: -1.79% -1.73%
      Instructions are helped.
      
      total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
      cycles in affected programs: 38470116 -> 38172337 (-0.77%)
      helped: 16202
      HURT: 1456
      helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
      helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
      HURT stats (abs)   min: 1 max: 5982 x̄: 87.51 x̃: 28
      HURT stats (rel)   min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
      95% mean confidence interval for cycles value: -18.24 -15.49
      95% mean confidence interval for cycles %-change: -2.26% -2.14%
      Cycles are helped.
      
      total spills in shared programs: 12147 -> 12176 (0.24%)
      spills in affected programs: 175 -> 204 (16.57%)
      helped: 8
      HURT: 5
      
      total fills in shared programs: 25262 -> 25292 (0.12%)
      fills in affected programs: 269 -> 299 (11.15%)
      helped: 8
      HURT: 5
      
      Haswell
      total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
      instructions in affected programs: 2507824 -> 2480155 (-1.10%)
      helped: 18859
      HURT: 10
      helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
      helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
      HURT stats (abs)   min: 5 max: 39 x̄: 25.70 x̃: 31
      HURT stats (rel)   min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
      95% mean confidence interval for instructions value: -1.49 -1.44
      95% mean confidence interval for instructions %-change: -2.42% -2.34%
      Instructions are helped.
      
      total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
      cycles in affected programs: 40169572 -> 39943194 (-0.56%)
      helped: 15550
      HURT: 1938
      helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
      helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
      HURT stats (abs)   min: 1 max: 4862 x̄: 89.17 x̃: 35
      HURT stats (rel)   min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
      95% mean confidence interval for cycles value: -14.42 -11.47
      95% mean confidence interval for cycles %-change: -2.05% -1.91%
      Cycles are helped.
      
      total spills in shared programs: 26769 -> 26814 (0.17%)
      spills in affected programs: 826 -> 871 (5.45%)
      helped: 9
      HURT: 10
      
      total fills in shared programs: 38383 -> 38425 (0.11%)
      fills in affected programs: 834 -> 876 (5.04%)
      helped: 9
      HURT: 10
      
      LOST:   5
      GAINED: 10
      
      Ivy Bridge
      total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
      instructions in affected programs: 2409680 -> 2374569 (-1.46%)
      helped: 16135
      HURT: 0
      helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
      helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
      95% mean confidence interval for instructions value: -2.21 -2.14
      95% mean confidence interval for instructions %-change: -2.76% -2.67%
      Instructions are helped.
      
      total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
      cycles in affected programs: 25439823 -> 25223481 (-0.85%)
      helped: 13817
      HURT: 1499
      helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
      helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
      HURT stats (abs)   min: 1 max: 3684 x̄: 98.99 x̃: 52
      HURT stats (rel)   min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
      95% mean confidence interval for cycles value: -15.68 -12.57
      95% mean confidence interval for cycles %-change: -1.77% -1.63%
      Cycles are helped.
      
      LOST:   8
      GAINED: 10
      
      Sandy Bridge
      total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
      instructions in affected programs: 1806702 -> 1791371 (-0.85%)
      helped: 13023
      HURT: 0
      helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
      helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
      95% mean confidence interval for instructions value: -1.18 -1.17
      95% mean confidence interval for instructions %-change: -1.68% -1.62%
      Instructions are helped.
      
      total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
      cycles in affected programs: 20199374 -> 19979306 (-1.09%)
      helped: 12048
      HURT: 510
      helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
      helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
      HURT stats (abs)   min: 1 max: 448 x̄: 54.39 x̃: 16
      HURT stats (rel)   min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
      95% mean confidence interval for cycles value: -17.97 -17.08
      95% mean confidence interval for cycles %-change: -1.84% -1.75%
      Cycles are helped.
      
      LOST:   1
      GAINED: 0
      
      Iron Lake
      total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
      instructions in affected programs: 949495 -> 937149 (-1.30%)
      helped: 5810
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
      helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
      95% mean confidence interval for instructions value: -2.14 -2.11
      95% mean confidence interval for instructions %-change: -2.59% -2.48%
      Instructions are helped.
      
      total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
      cycles in affected programs: 17274446 -> 17239468 (-0.20%)
      helped: 3881
      HURT: 90
      helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
      helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
      HURT stats (abs)   min: 2 max: 10 x̄: 2.80 x̃: 2
      HURT stats (rel)   min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
      95% mean confidence interval for cycles value: -9.35 -8.27
      95% mean confidence interval for cycles %-change: -0.85% -0.77%
      Cycles are helped.
      
      GM45
      total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
      instructions in affected programs: 489028 -> 482839 (-1.27%)
      helped: 2912
      HURT: 0
      helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
      helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
      95% mean confidence interval for instructions value: -2.14 -2.11
      95% mean confidence interval for instructions %-change: -2.54% -2.39%
      Instructions are helped.
      
      total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
      cycles in affected programs: 12669152 -> 12644364 (-0.20%)
      helped: 2759
      HURT: 37
      helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
      helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
      HURT stats (abs)   min: 2 max: 10 x̄: 3.62 x̃: 4
      HURT stats (rel)   min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
      95% mean confidence interval for cycles value: -9.53 -8.20
      95% mean confidence interval for cycles %-change: -0.79% -0.70%
      Cycles are helped.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu · a2887085
      Ian Romanick authored
      This is the same as the need_dest parameter to
      prepare_alu_destination_and_sources.  This allows us to not change the
      register that is expected to hold a result if an instruction is
      re-emitted.  This is particularly a problem if the re-emitted
      instruction is a partial write.  A later patch will use this feature.
      
      No shader-db changes on any Intel platform.
      
      v2: Don't do the Boolean resolve when there is no destination.  If the
      ALU instruction didn't write a register, there's nothing to resolve.
      This replaces an earlier patch "intel/fs: Allocate dummy destination
      register when need_dest is false".
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/fs: Allow cmod propagation across reads and writes of different flags · e13a5c7d
      Ian Romanick authored
      This also helps a later patch (intel/fs: Improve discard_if code
      generation) on about 200 shaders.
      
      v2: Document that other instruction sequences are also valid in
      subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
      by Caio.
      
      All Intel platforms had similar results. (Ice Lake shown)
      total instructions in shared programs: 17224438 -> 17224434 (<.01%)
      instructions in affected programs: 296 -> 292 (-1.35%)
      helped: 4
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -2.04% -0.81%
      Instructions are helped.
      
      total cycles in shared programs: 361468455 -> 361468458 (<.01%)
      cycles in affected programs: 2862 -> 2865 (0.10%)
      helped: 2
      HURT: 2
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
      HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
      HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
      95% mean confidence interval for cycles value: -4.34 5.84
      95% mean confidence interval for cycles %-change: -0.70% 0.90%
      Inconclusive result (value mean confidence interval includes 0).
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/fs: Fix flag_subreg handling in cmod propagation · 8030cb75
      Ian Romanick authored
      There were two errors.  First, the pass could propagate conditional
      modifiers from an instruction that writes one flag register to an
      instruction that writes a different flag register.  For example,
      
          cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
          cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F
      
      could become
      
          cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
      
      Second, if an instruction that writes f0.1 has its condition propagated,
      the modified instruction will incorrectly write flag f0.0.  For example,
      
          linterp(16) vgrf6:F, g2:F, attr0:F
          cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
          (-f0.1) discard_jump(16) (null):UD
      
      could become
      
          linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
          (-f0.1) discard_jump(16) (null):UD
      
      Neither of these cases can occur currently.  The only time we use f0.1
      is for generating discard intrinsics.  In all those cases, we generate a
      sequence like:
      
          cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
          (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
          (-f0.1) discard_jump(16) (null):UD
      
      Due to the mixed types and incompatible conditions, this sequence would
      never see any cmod propagation.  The next patch will change this.
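
The two errors can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the actual pass: the `Inst` class, both helper functions, and the `check_flag` / `copy_flag` switches (which toggle between the buggy and the fixed behaviour) are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Inst:
    opcode: str
    dst: str
    srcs: Tuple[str, ...]
    cmod: Optional[str] = None   # conditional modifier, e.g. "nz"
    flag_subreg: int = 0         # 0 means f0.0, 1 means f0.1


def drop_repeated_cmp(insts: List[Inst], check_flag: bool = True) -> List[Inst]:
    """Drop a cmp to null that repeats the immediately preceding cmp.

    With check_flag=False this mimics the first error: the flag
    subregister is ignored, so a write to f0.1 can be lost.
    """
    out: List[Inst] = []
    for inst in insts:
        prev = out[-1] if out else None
        redundant = (
            prev is not None
            and inst.opcode == "cmp" and prev.opcode == "cmp"
            and inst.dst == "null"
            and inst.srcs == prev.srcs
            and inst.cmod == prev.cmod
            and (not check_flag or inst.flag_subreg == prev.flag_subreg)
        )
        if not redundant:
            out.append(inst)
    return out


def fold_cmp_into_producer(producer: Inst, cmp: Inst,
                           copy_flag: bool = True) -> Inst:
    """Move cmp's conditional modifier onto the producing instruction.

    With copy_flag=False this mimics the second error: the folded
    instruction silently writes f0.0 instead of the cmp's flag.
    """
    flag = cmp.flag_subreg if copy_flag else 0
    return replace(producer, cmod=cmp.cmod, flag_subreg=flag)


if __name__ == "__main__":
    a = Inst("cmp", "null", ("vgrf6", "vgrf5"), "nz", 0)
    b = Inst("cmp", "null", ("vgrf6", "vgrf5"), "nz", 1)
    print(len(drop_repeated_cmp([a, b], check_flag=False)))  # buggy behaviour
    print(len(drop_repeated_cmp([a, b])))                    # fixed behaviour
```

In this model the fix amounts to two rules: only treat compares as interchangeable when they target the same flag subregister, and carry the flag subregister along whenever a conditional modifier is moved to another instruction.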
      
      No shader-db changes on any Intel platform.
      
      v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
      Noticed by Caio.
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • intel/fs: Add missing tests for cmod_propagate_not · 2dd60139
      Ian Romanick authored
      Tests like this should have been added in 4467040c ("i965/fs:
      Propagate conditional modifiers from not instructions").
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
  2. 05 Jun, 2019 35 commits