1. 23 Jul, 2018 19 commits
    • glsl: make a copy of array indices that are used to deref a function out param · 689eb68d
      Timothy Arceri authored
      V2: make use of visit_tree()
      
      Fixes new piglit test:
      tests/spec/glsl-1.20/execution/qualifiers/vs-out-conversion-int-to-float-vec4-index.shader_test
    • nir: opt_algebraic add some opts to remove spilling in dirt showdown · 55bd40ce
      Timothy Arceri authored
      These appear in a single Dirt Showdown compute shader. With these opts,
      VGPR spilling in the radeonsi NIR backend is reduced.
    • nir: match constant bools with @bool type · 30fd1dad
      Timothy Arceri authored
      For simplicity we only allow this to work for scalar types.
    • 632241b0
    • wip safe late · 0b7f3f47
      Timothy Arceri authored
    • wip stop calling lower_if_to_cond_assign() for nir drivers · b96807c2
      Timothy Arceri authored
      Maybe check PIPE_SHADER_CAP_LOWER_IF_THRESHOLD != 0 in
      lower_if_to_cond_assign() instead, so we can set
      PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH to UINT_MAX for radeonsi
      and have it still work for the TGSI path.
    • radv: make use of nir_lower_load_const_to_scalar() · c8eea804
      Timothy Arceri authored
      This allows NIR to CSE more operations. LLVM also does this, so the
      impact is limited; however, doing this in NIR allows other opts to
      make progress.
      
      I didn't have a large vkpipeline-db to run this against but for
      radeonsi enabling this allowed some loops in Civilization Beyond
      Earth shaders to unroll.
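
A toy sketch of why scalar constants help CSE, assuming a made-up tuple-based IR (this is not NIR and not the radv code). Two multiplies read the same value 1.0 from different vector constants, so a CSE keyed on (op, sources) cannot merge them until the constant sources are scalarized. The names `scalarize` and `count_after_cse` are hypothetical.

```python
# Hypothetical mini IR: an instruction is (op, srcs); a source is either an
# SSA name (str) or ("vec", const_id, component) for a vector constant read.

def scalarize(instrs, consts):
    """Rewrite vector-constant component reads into plain scalar constants."""
    out = []
    for op, srcs in instrs:
        new_srcs = tuple(
            ("scalar", consts[s[1]][s[2]])
            if isinstance(s, tuple) and s[0] == "vec" else s
            for s in srcs
        )
        out.append((op, new_srcs))
    return out

def count_after_cse(instrs):
    """Instructions left once identical (op, srcs) pairs are merged."""
    return len(set(instrs))

consts = {"c0": (1.0, 2.0), "c1": (1.0, 3.0)}
instrs = [
    ("fmul", ("ssa_5", ("vec", "c0", 0))),  # ssa_5 * c0.x
    ("fmul", ("ssa_5", ("vec", "c1", 0))),  # ssa_5 * c1.x, same value 1.0
]

print(count_after_cse(instrs))                     # before scalarizing
print(count_after_cse(scalarize(instrs, consts)))  # after scalarizing
```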
    • nir: allow nir_intrinsic_load_ubo in opt_peephole_select · 2f80a54f
      Timothy Arceri authored
      This makes this opt behave more like the GLSL IR opt
      lower_if_to_cond_assign(). With this we can disable that GLSL IR
      opt on drivers with a NIR backend without causing spill
      regressions.
      
      shader-db results for radeonsi (RX580):
      
      Totals from affected shaders:
      SGPRS: 12200 -> 13072 (7.15 %)
      VGPRS: 13496 -> 11840 (-12.27 %)
      Spilled SGPRs: 285 -> 290 (1.75 %)
      Spilled VGPRs: 115 -> 0 (-100.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 116 -> 0 (-100.00 %) dwords per thread
      Code Size: 781304 -> 770168 (-1.43 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 1558 -> 1586 (1.80 %)
      Wait states: 0 -> 0 (0.00 %)
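
The percentages in the totals above are plain relative changes. A minimal sketch of the arithmetic, assuming a hypothetical helper `pct_change` (this is not the shader-db report script):

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent, to 2 decimals."""
    if before == 0:
        # Assumption: mirror the report, which shows 0.00 % for 0 -> 0 rows.
        return 0.0
    return round((after - before) / before * 100, 2)

print(pct_change(13496, 11840))  # VGPRS row
print(pct_change(115, 0))        # Spilled VGPRs row
```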
      
      Cc: Eric Anholt <eric@anholt.net>
    • radv: call nir_opt_algebraic_before_ffma() · 3ea31f7a
      Timothy Arceri authored
      My vkpipeline-db database only has 3 games currently so maybe
      someone else should give this a run for better stats.
      
      Totals from affected shaders:
      SGPRS: 8976 -> 7880 (-12.21 %)
      VGPRS: 7756 -> 7748 (-0.10 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 696460 -> 696620 (0.02 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 378 -> 380 (0.53 %)
      Wait states: 0 -> 0 (0.00 %)
    • nir: evaluate loop terminator ior use when false · a610ba21
      Timothy Arceri authored
      This allows some loops to unroll where they are guaranteed to
      exit after the first iteration. For example:
      
      	loop {
      		block block_1:
      		/* preds: block_0 block_13 */
      		vec1 32 ssa_85 = load_const (0x00000002 /* 0.000000 */)
      		vec1 32 ssa_86 = ieq ssa_48, ssa_85
      		vec1 32 ssa_87 = load_const (0x00000001 /* 0.000000 */)
      		vec1 32 ssa_88 = ieq ssa_48, ssa_87
      		vec1 32 ssa_89 = ior ssa_86, ssa_88
      		vec1 32 ssa_90 = ieq ssa_48, ssa_0
      		vec1 32 ssa_91 = ior ssa_89, ssa_90
      
      		/* succs: block_2 block_3 */
      		if ssa_86 {
      			block block_2:
      			/* preds: block_1 */
      			 ...
      			break
      			/* succs: block_14 */
      		} else {
      			block block_3:
      			/* preds: block_1 */
      			/* succs: block_4 */
      		}
      		block block_4:
      		/* preds: block_3 */
      		/* succs: block_5 block_6 */
      		if ssa_88 {
      			block block_5:
      			/* preds: block_4 */
      			 ...
      			break
      			/* succs: block_14 */
      		} else {
      			block block_6:
      			/* preds: block_4 */
      			/* succs: block_7 */
      		}
      		block block_7:
      		/* preds: block_6 */
      		/* succs: block_8 block_9 */
      		if ssa_90 {
      			block block_8:
      			/* preds: block_7 */
      			 ...
      			break
      			/* succs: block_14 */
      		} else {
      			block block_9:
      			/* preds: block_7 */
      			/* succs: block_10 */
      		}
      		block block_10:
      		/* preds: block_9 */
      		vec1 32 ssa_107 = inot ssa_91
      		/* succs: block_11 block_12 */
      		if ssa_107 {
      			block block_11:
      			/* preds: block_10 */
      			break
      			/* succs: block_14 */
      		} else {
      			block block_12:
      			/* preds: block_10 */
      			/* succs: block_13 */
      		}
      	}
      
      These loops have been seen in Bethesda games running over
      DXVK. There is a slight increase in VGPR use, but removing
      the loops allows us to further optimise the code in
      future. For example, many of the unrolled if-statements
      could now be merged as they appear in the shaders multiple
      times.
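
A small Python model of the NIR dump above, as a sketch only, to show why the loop always exits on its first iteration. The variable names mirror the SSA values in the dump; `run_loop` and `max_iters` are hypothetical and exist only to make the model runnable.

```python
def run_loop(ssa_48: int, ssa_0: int, max_iters: int = 10) -> int:
    """Model the loop from the dump and return how many iterations ran."""
    iterations = 0
    while iterations < max_iters:
        iterations += 1
        ssa_86 = ssa_48 == 2
        ssa_88 = ssa_48 == 1
        ssa_89 = ssa_86 or ssa_88
        ssa_90 = ssa_48 == ssa_0
        ssa_91 = ssa_89 or ssa_90
        if ssa_86:
            break
        if ssa_88:
            break
        if ssa_90:
            break
        # Reaching this point means ssa_86, ssa_88 and ssa_90 were all false,
        # so the ior chain ssa_91 is false and ssa_107 is always true.
        ssa_107 = not ssa_91
        if ssa_107:
            break
    return iterations

print(all(run_loop(x, y) == 1 for x in range(-3, 4) for y in range(-3, 4)))
```

Because every path ends in a break, the final terminator can be folded to an unconditional break once the ior is known to be false.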
      
      vkpipeline results RADV (from a db of only 3 games):
      
      Totals from affected shaders:
      SGPRS: 10920 -> 10440 (-4.40 %)
      VGPRS: 6120 -> 6264 (2.35 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 369952 -> 356608 (-3.61 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 2040 -> 2040 (0.00 %)
      Wait states: 0 -> 0 (0.00 %)
    • nir: evaluate loop terminator ior use · 1a453045
      Timothy Arceri authored
      Here we replace one side of the ior with NIR_TRUE if the src is a
      loop terminator's condition that we know can only be true.
      
      No shader-db change.
    • nir: add loop unroll support for complex wrapper loops · e7f2c8c9
      Timothy Arceri authored
      In GLSL IR we cheat with switch statements and simply convert them
      into loops with a single iteration. This allowed us to make use of
      the existing jump instruction handling provided by the loop handling
      code. It also allows dead code to be cleaned up once we have
      wrapped the code in a loop.
      
      However, using loops in this way created loops that previously could
      not be unrolled, which limits further optimisations. Here we provide
      a way to unroll loops that end in a break and have multiple other exits.
      
      All shader-db changes are from the dolphin uber shaders. There is a
      small number of HURT shaders, but in general the improvements far
      exceed the HURT.
      
      shader-db results IVB:
      
      total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
      instructions in affected programs: 104080 -> 102361 (-1.65%)
      helped: 36
      HURT: 15
      
      total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
      cycles in affected programs: 126063017 -> 60527608 (-51.99%)
      helped: 51
      HURT: 0
      
      total loops in shared programs: 2515 -> 2308 (-8.23%)
      loops in affected programs: 903 -> 696 (-22.92%)
      helped: 51
      HURT: 0
      
      total spills in shared programs: 4370 -> 4124 (-5.63%)
      spills in affected programs: 1397 -> 1151 (-17.61%)
      helped: 9
      HURT: 12
      
      total fills in shared programs: 4581 -> 4419 (-3.54%)
      fills in affected programs: 2201 -> 2039 (-7.36%)
      helped: 9
      HURT: 15
    • nir: add loop unroll support for wrapper loops · e4a5db47
      Timothy Arceri authored
      This adds support for unrolling the classic
      
          do {
              // ...
          } while (false)
      
      that is used to wrap multi-line macros. GLSL IR also wraps switch
      statements in a loop like this.
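
A rough sketch of the idea, not the NIR pass itself: a loop whose only jump is a trailing break runs its body exactly once, so it can be replaced by the body with that break dropped. The string-list IR and the name `unroll_wrapper_loop` are hypothetical.

```python
from typing import List, Optional

JUMPS = {"break", "continue"}

def unroll_wrapper_loop(body: List[str]) -> Optional[List[str]]:
    """Return the body minus its trailing break, or None if the loop is
    not a plain wrapper (no trailing break, or other jumps inside)."""
    if not body or body[-1] != "break":
        return None
    if any(instr in JUMPS for instr in body[:-1]):
        return None
    return body[:-1]

print(unroll_wrapper_loop(["a = x + 1", "b = a * 2", "break"]))
```

Loops with extra exits are deliberately rejected here; those are the "complex wrapper loops" case handled by a separate commit.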
      
      shader-db results IVB:
      
      total loops in shared programs: 2515 -> 2512 (-0.12%)
      loops in affected programs: 33 -> 30 (-9.09%)
      helped: 3
      HURT: 0
    • nir/opt_loop_unroll: Remove unneeded phis if we make progress · 1546b99e
      Timothy Arceri authored
      Now that SSA values can be derefs and they have special rules, we have
      to be a bit more careful about our LCSSA phis.  In particular, we need
      to clean up in case LCSSA ended up creating a phi node for a deref.
      This avoids validation issues with some CTS tests with the new patch,
      but it's possible we could also see the same problem with the
      existing unrolling passes.
    • nir: add complex_loop bool to loop info · d344da10
      Timothy Arceri authored
      In order to be sure loop_terminator_list is an accurate
      representation of all the jumps in the loop we need to be sure we
      didn't encounter any other complex behaviour such as continues,
      nested breaks, etc during analysis.
      
      This will be used in the following patch.
    • nir: always attempt to find loop terminators · 3c810429
      Timothy Arceri authored
      This will help later patches with unrolling loops that end with a
      break, i.e. loops that always exit on their first iteration.
    • nir: allow more nested loops to be unrolled · 2865653c
      Timothy Arceri authored
      The innermost check was added to stop us from unrolling multiple
      loops in a single pass, and to stop outer loops from unrolling.
      
      When we successfully unroll a loop we need to run the analysis
      pass again before deciding if we want to go ahead and unroll a
      second loop.
      
      However the logic was flawed because it never tried to unroll any
      nested loops other than the first innermost loop it found.
      If this innermost loop is not unrolled we end up skipping all
      other nested loops.
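
A simplified sketch of the intended traversal, assuming a hypothetical `Loop` tree and `can_unroll` predicate (the real pass works on NIR control-flow nodes and also considers outer loops). It keeps searching past an innermost loop that fails to unroll, and stops after the first success so that analysis can be re-run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Loop:
    name: str
    children: List["Loop"] = field(default_factory=list)

def try_unroll_once(loops: List[Loop],
                    can_unroll: Callable[[Loop], bool]) -> Optional[str]:
    """Depth-first search; return the name of the first innermost loop
    that unrolls, or None if none of them can be unrolled."""
    for loop in loops:
        if loop.children:
            done = try_unroll_once(loop.children, can_unroll)
            if done is not None:
                return done  # stop: analysis must run again first
        elif can_unroll(loop):
            return loop.name
    return None

tree = [Loop("outer", [Loop("a"), Loop("b")])]
print(try_unroll_once(tree, lambda l: l.name == "b"))
```

With the flawed logic described above, a failure on loop `a` would have ended the search without ever trying `b`.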
      
      No change to shader-db. Unrolls a loop in a shader from the game
      Prey when running on DXVK.
    • nir: evaluate loop terminator condition uses · 18119e2f
      Timothy Arceri authored
      For simple loop terminators we can evaluate all further uses of the
      condition in the loop because we know we must have either exited
      the loop or we have a known value.
      
      shader-db results IVB (all changes from dolphin uber shaders):
      
      total instructions in shared programs: 10022822 -> 10018187 (-0.05%)
      instructions in affected programs: 115380 -> 110745 (-4.02%)
      helped: 54
      HURT: 0
      
      total cycles in shared programs: 232376154 -> 220065064 (-5.30%)
      cycles in affected programs: 143176202 -> 130865112 (-8.60%)
      helped: 54
      HURT: 0
      
      total spills in shared programs: 4383 -> 4370 (-0.30%)
      spills in affected programs: 1656 -> 1643 (-0.79%)
      helped: 9
      HURT: 18
      
      total fills in shared programs: 4610 -> 4581 (-0.63%)
      fills in affected programs: 374 -> 345 (-7.75%)
      helped: 6
      HURT: 0
    • nir: evaluate if condition uses inside the if branches · f1638178
      Timothy Arceri authored
      Since we know what side of the branch we ended up on we can just
      replace the use with a constant.
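
A minimal sketch of the substitution on a made-up IR (not NIR data structures): inside the then-branch the condition is known true, inside the else-branch it is known false. `If` and `eval_cond_uses` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Instr = Tuple[str, str, tuple]  # (dest, op, srcs)

@dataclass
class If:
    cond: str
    then_block: List[Instr] = field(default_factory=list)
    else_block: List[Instr] = field(default_factory=list)

def eval_cond_uses(node: If) -> If:
    """Replace uses of node.cond with True in the then-branch and with
    False in the else-branch; all other sources are left untouched."""
    def subst(instrs, value):
        return [
            (dest, op, tuple(value if s == node.cond else s for s in srcs))
            for dest, op, srcs in instrs
        ]
    return If(node.cond,
              subst(node.then_block, True),
              subst(node.else_block, False))

node = If("ssa_1",
          then_block=[("ssa_2", "iand", ("ssa_1", "ssa_5"))],
          else_block=[("ssa_3", "ior", ("ssa_1", "ssa_6"))])
print(eval_cond_uses(node))
```

Once the sources are constants, later passes such as constant folding can simplify the instructions further.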
      
      All helped shaders are from Unreal Engine 4 besides one shader from
      Dirt Showdown.
      
      V2: make sure we do evaluation when condition is used in else with
          a single block (we were checking for blocks < the last else
          block rather than <=)
      
      shader-db results SKL:
      
      total instructions in shared programs: 13219725 -> 13219643 (<.01%)
      instructions in affected programs: 28917 -> 28835 (-0.28%)
      helped: 45
      HURT: 0
      
      total cycles in shared programs: 529335971 -> 529334604 (<.01%)
      cycles in affected programs: 216209 -> 214842 (-0.63%)
      helped: 45
      HURT: 4
      
      Cc: Ian Romanick <idr@freedesktop.org>
      
      fix if condition eval for else with a single block
  2. 22 Jul, 2018 3 commits
  3. 21 Jul, 2018 2 commits
    • android: util/disk_cache: fix building errors in gallium drivers · 6cbbd5b4
      maurossi authored
      This patch applies the necessary changes in Android.common.mk
      as per automake rules, to avoid the following build error:
      
      external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
      error: implicit declaration of function 'disk_cache_get_function_timestamp'
      is invalid in C99 [-Werror,-Wimplicit-function-declaration]
         if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
             ^
      1 error generated.
      
      (v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled
      
      Fixes: cc10b34e ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
      Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
      Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    • Android: fix a missing nir_intrinsics.h error · e7ffd3fb
      Chih-Wei Huang authored
      The commit 76dfed8a changed nir_intrinsics.h to be a generated
      header, but the corresponding dependency was not updated for Android.
      It causes the error:
      
      [  0% 19/4336] target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
      ...
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
      In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
      external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
               ^~~~~~~~~~~~~~~~~~
      1 error generated.
      
      Fixes: 76dfed8a ("nir: mako all the intrinsics")
      Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
      Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
      Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
  4. 20 Jul, 2018 16 commits