1. 23 Jul, 2018 6 commits
    • nir/opt_loop_unroll: Remove unneeded phis if we make progress · 1546b99e
      Timothy Arceri authored
      Now that SSA values can be derefs and they have special rules, we have
      to be a bit more careful about our LCSSA phis.  In particular, we need
      to clean up in case LCSSA ended up creating a phi node for a deref.
      This avoids validation issues with some CTS tests with the new patch,
      but it's possible we could also see the same problem with the
      existing unrolling passes.
    • nir: add complex_loop bool to loop info · d344da10
      Timothy Arceri authored
      In order to be sure loop_terminator_list is an accurate
      representation of all the jumps in the loop we need to be sure we
      didn't encounter any other complex behaviour such as continues,
      nested breaks, etc during analysis.
      
      This will be used in the following patch.
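As a rough illustration of the idea in this commit, the C sketch below counts simple terminators and sets a flag as soon as it meets any jump it cannot describe that way. The `toy_` names and the `jump_kind` classification are hypothetical; they are not NIR's real loop-analysis structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the jumps found while walking a loop body. */
enum jump_kind {
    JUMP_SIMPLE_BREAK,  /* "if (cond) break;" directly in the loop body */
    JUMP_NESTED_BREAK,  /* break buried inside nested control flow */
    JUMP_CONTINUE       /* continue anywhere in the loop */
};

/* Toy stand-in for the loop info; only the two fields discussed here. */
struct toy_loop_info {
    size_t terminator_count; /* length of the (imaginary) terminator list */
    bool complex_loop;       /* true if the list may not cover every jump */
};

static struct toy_loop_info
toy_analyze_jumps(const enum jump_kind *jumps, size_t count)
{
    struct toy_loop_info info = { 0, false };

    assert(jumps != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (jumps[i] == JUMP_SIMPLE_BREAK)
            info.terminator_count++;
        else
            info.complex_loop = true; /* the list no longer tells the whole story */
    }
    return info;
}
```

A later pass could then trust the terminator list only when `complex_loop` is false.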
    • nir: always attempt to find loop terminators · 3c810429
      Timothy Arceri authored
      This will help later patches with unrolling loops that end with a
      break, i.e. loops that always exit on their first iteration.
    • nir: allow more nested loops to be unrolled · 2865653c
      Timothy Arceri authored
      The innermost check was added to stop us from unrolling multiple
      loops in a single pass, and to stop outer loops from unrolling.
      
      When we successfully unroll a loop we need to run the analysis
      pass again before deciding if we want to go ahead and unroll a
      second loop.
      
      However the logic was flawed because it never tried to unroll any
      nested loops other than the first innermost loop it found.
      If this innermost loop is not unrolled we end up skipping all
      other nested loops.
      
      No change to shader-db. Unrolls a loop in a shader from the game
      Prey when running on DXVK.
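The behaviour described in this commit message can be sketched in C roughly as follows: every innermost loop gets a chance to unroll, a failed attempt does not stop the search, and the walk returns after the first success so that analysis can be rerun. The `toy_loop` type and its `can_unroll` flag are hypothetical stand-ins, not the data structures or control flow of the actual pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loop tree node; "can_unroll" stands in for the real heuristics. */
struct toy_loop {
    bool can_unroll;
    bool unrolled;
    struct toy_loop *inner;   /* directly nested loops */
    size_t inner_count;
};

/* Unroll at most one innermost loop. Returns true on progress, in which
 * case the caller is expected to rerun analysis and call this again. */
static bool
toy_unroll_one_innermost(struct toy_loop *loop)
{
    assert(loop != NULL);

    if (loop->inner_count > 0) {
        for (size_t i = 0; i < loop->inner_count; i++) {
            /* Keep going past nested loops that could not be unrolled. */
            if (toy_unroll_one_innermost(&loop->inner[i]))
                return true;
        }
        return false; /* a loop that contains other loops is left alone */
    }

    if (!loop->can_unroll || loop->unrolled)
        return false;

    loop->unrolled = true;
    return true;
}
```

In this sketch the flawed logic would correspond to returning as soon as the first innermost loop had been visited, whether or not it was unrolled.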
    • nir: evaluate loop terminator condition uses · 18119e2f
      Timothy Arceri authored
      For simple loop terminators we can evaluate all further uses of the
      condition in the loop because we know we must have either exited
      the loop or we have a known value.
      
      shader-db results IVB (all changes from dolphin uber shaders):
      
      total instructions in shared programs: 10022822 -> 10018187 (-0.05%)
      instructions in affected programs: 115380 -> 110745 (-4.02%)
      helped: 54
      HURT: 0
      
      total cycles in shared programs: 232376154 -> 220065064 (-5.30%)
      cycles in affected programs: 143176202 -> 130865112 (-8.60%)
      helped: 54
      HURT: 0
      
      total spills in shared programs: 4383 -> 4370 (-0.30%)
      spills in affected programs: 1656 -> 1643 (-0.79%)
      helped: 9
      HURT: 18
      
      total fills in shared programs: 4610 -> 4581 (-0.63%)
      fills in affected programs: 374 -> 345 (-7.75%)
      helped: 6
      HURT: 0
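To make the reasoning about simple loop terminators concrete, here is a small source-level C sketch. Both functions are hypothetical examples, not shader code or NIR. After `if (over_limit) break;` any later use of `over_limit` in the same iteration can only see `false`, so the second function replaces that use with the known value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Before: the break condition is used again later in the loop body. */
static int
sum_before_evaluation(const int *values, size_t count, int limit)
{
    int sum = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        bool over_limit = values[i] > limit;
        if (over_limit)
            break;                           /* simple loop terminator */
        sum += over_limit ? -1 : values[i];  /* later use of the same condition */
    }
    return sum;
}

/* After: past the break the condition is known to be false, so the
 * select collapses to its "false" operand. */
static int
sum_after_evaluation(const int *values, size_t count, int limit)
{
    int sum = 0;

    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        bool over_limit = values[i] > limit;
        if (over_limit)
            break;
        sum += values[i];
    }
    return sum;
}
```

Both functions return the same result for every input; the second simply has less work left in the loop body.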
    • nir: evaluate if condition uses inside the if branches · f1638178
      Timothy Arceri authored
      Since we know what side of the branch we ended up on we can just
      replace the use with a constant.
      
      All helped shaders are from Unreal Engine 4 besides one shader from
      Dirt Showdown.
      
      V2: make sure we do evaluation when condition is used in else with
          a single block (we were checking for blocks < the last else
          block rather than <=)
      
      shader-db results SKL:
      
      total instructions in shared programs: 13219725 -> 13219643 (<.01%)
      instructions in affected programs: 28917 -> 28835 (-0.28%)
      helped: 45
      HURT: 0
      
      total cycles in shared programs: 529335971 -> 529334604 (<.01%)
      cycles in affected programs: 216209 -> 214842 (-0.63%)
      helped: 45
      HURT: 4
      
      Cc: Ian Romanick <idr@freedesktop.org>
      
      fix if condition eval for else with a single block
  2. 22 Jul, 2018 3 commits
  3. 21 Jul, 2018 2 commits
    • android: util/disk_cache: fix building errors in gallium drivers · 6cbbd5b4
      maurossi authored
      This patch applies the necessary changes to Android.common.mk,
      following the automake rules, to avoid the following build error:
      
      external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
      error: implicit declaration of function 'disk_cache_get_function_timestamp'
      is invalid in C99 [-Werror,-Wimplicit-function-declaration]
         if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
             ^
      1 error generated.
      
      (v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled
      
      Fixes: cc10b34e ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
      Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
      Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
    • Android: fix a missing nir_intrinsics.h error · e7ffd3fb
      Chih-Wei Huang authored
      Commit 76dfed8a changed nir_intrinsics.h to be a generated
      header, but the corresponding dependency was not updated for Android.
      This causes the following error:
      
      [  0% 19/4336] target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
      ...
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
      In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
      In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
      external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
               ^~~~~~~~~~~~~~~~~~
      1 error generated.
      
      Fixes: 76dfed8a ("nir: mako all the intrinsics")
      Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
      Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
      Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
  4. 20 Jul, 2018 26 commits
  5. 19 Jul, 2018 3 commits
    • nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding · 409a60df
      Rhys Perry authored
      total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
      total gprs used in shared programs    : 647530 -> 647532 (0.00%)
      total shared used in shared programs  : 389120 -> 389120 (0.00%)
      total local used in shared programs   : 21064 -> 21064 (0.00%)
      total bytes used in shared programs   : 58551648 -> 58459352 (-0.16%)
      
                      local     shared        gpr       inst      bytes
          helped           0           0          73        2609        2609
            hurt           0           0          71          34          34
    • nv50/ir: handle SHLADD in IndirectPropagation · 2afef231
      Rhys Perry authored
      An alternative solution to the problem fixed in
      0bd83d04 ("nv50/ir: move LateAlgebraicOpt to the very end").
      
      total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
      total gprs used in shared programs    : 647535 -> 647530 (-0.00%)
      total shared used in shared programs  : 389120 -> 389120 (0.00%)
      total local used in shared programs   : 21064 -> 21064 (0.00%)
      total bytes used in shared programs   : 58555784 -> 58551648 (-0.01%)
      
                      local     shared        gpr       inst      bytes
          helped           0           0           2          34          34
            hurt           0           0           0           0           0
    • gm107/ir: use CS2R for SV_CLOCK · 3b6edd0b
      Rhys Perry authored
      This instruction seems to be faster than S2R and requires no barrier,
      though the range of special registers it can read from is limited.
      Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>