1. 12 Mar, 2019 14 commits
    • intel/fs: Fix opt_peephole_csel to not throw away saturates. · d9c4871b
      Kenneth Graunke authored
      We were not copying the saturate bit from the original instruction
      to the new replacement instruction.  This caused major misrendering
      in DiRT Rally on iris, where comparisons leading to discards failed
      due to the missing saturate, causing lots of extra garbage pixels to
      be drawn in text rendering, trees, and so on.
      
      This did not show up on i965 because st/nir performs a more aggressive
      version of nir_opt_peephole_select, yielding more b32csel operations.
      
      Fixes: 52c7df16 i965/fs: Merge CMP and SEL into CSEL on Gen8+
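
As a rough illustration of this class of bug, the toy C model below fuses a compare + select pair into one instruction and shows what changes when the saturate modifier is not carried over. It is a sketch only: `toy_inst`, `eval_select` and `fuse_cmp_sel` are hypothetical names, not the i965 backend's real types or functions, and the `cmp_src < 0` condition is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction; not the backend's real instruction type. */
struct toy_inst {
   bool saturate;   /* clamp the result to [0.0, 1.0] */
};

static float clamp01(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Evaluate "dst = (cmp_src < 0) ? a : b", honouring the saturate modifier. */
static float eval_select(const struct toy_inst *inst, float cmp_src,
                         float a, float b)
{
   float result = cmp_src < 0.0f ? a : b;
   return inst->saturate ? clamp01(result) : result;
}

/* Build the fused replacement for a compare + select pair.
 * copy_saturate == false models the bug, true models the fix. */
static struct toy_inst fuse_cmp_sel(const struct toy_inst *sel,
                                    bool copy_saturate)
{
   assert(sel != NULL);
   struct toy_inst csel = { .saturate = false };
   if (copy_saturate)
      csel.saturate = sel->saturate;
   return csel;
}
```

With a saturating select and a selected value of 2.5, the original and the correctly fused instruction both yield 1.0, while the replacement that drops the modifier yields 2.5, which is enough to flip a later comparison that feeds a discard.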
    • anv: destroy descriptor sets when pool gets reset · 775aabdd
      Juan A. Suárez authored
      As stated in the Vulkan spec:
         "Resetting a descriptor pool recycles all of the resources from all
          of the descriptor sets allocated from the descriptor pool back to
          the descriptor pool, and the descriptor sets are implicitly freed."
      
      This fixes dEQP-VK.api.descriptor_pool.*
      
      Fixes: 14f6275c "anv/descriptor_set: add reference counting for..."
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Tested-by: Clayton Craft <clayton.a.craft@intel.com>
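
The quoted spec behaviour can be sketched with a toy pool in C. Everything here is an assumption made for illustration: `toy_pool`, `toy_set`, `toy_pool_alloc_set` and `toy_pool_reset` are hypothetical and do not mirror anv's real structures, which also involve reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; these are not anv's real structures. */
struct toy_set {
   struct toy_set *next;   /* link in the owning pool's list of live sets */
};

struct toy_pool {
   struct toy_set *sets;   /* every set allocated from this pool */
   size_t live_sets;
};

/* Allocate a set and record it in the pool so a reset can find it. */
static struct toy_set *toy_pool_alloc_set(struct toy_pool *pool)
{
   struct toy_set *set = calloc(1, sizeof(*set));
   if (set == NULL)
      return NULL;
   set->next = pool->sets;
   pool->sets = set;
   pool->live_sets++;
   return set;
}

/* Reset: implicitly free every set, leaving the pool empty and reusable. */
static void toy_pool_reset(struct toy_pool *pool)
{
   assert(pool != NULL);
   struct toy_set *set = pool->sets;
   while (set != NULL) {
      struct toy_set *next = set->next;   /* read the link before freeing */
      free(set);
      set = next;
   }
   pool->sets = NULL;
   pool->live_sets = 0;
}
```

Tracking the sets in the pool is what lets the reset destroy them without the application freeing each one individually.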
    • nir: find induction/limit vars in iand instructions · 3235a942
      Timothy Arceri authored
      
      
      This will be used to help find the trip count of loops that look
      like the following:
      
         while (a < x && i < 8) {
            ...
            i++;
         }
      
      Where the NIR will end up looking something like this:
      
         vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
         loop {
            ...
            vec1 1 ssa_12 = ilt ssa_225, ssa_11
            vec1 1 ssa_17 = ilt ssa_226, ssa_1
            vec1 1 ssa_18 = iand ssa_12, ssa_17
            vec1 1 ssa_19 = inot ssa_18
      
            if ssa_19 {
               ...
               break
            } else {
               ...
            }
         }
      
      On RADV this unrolls a bunch of loops in F1-2017 shaders.
      
      Totals from affected shaders:
      SGPRS: 4112 -> 4136 (0.58 %)
      VGPRS: 4132 -> 4052 (-1.94 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 515444 -> 587720 (14.02 %) bytes
      LDS: 2 -> 2 (0.00 %) blocks
      Max Waves: 194 -> 196 (1.03 %)
      Wait states: 0 -> 0 (0.00 %)
      
      It also unrolls a couple of loops in shader-db on radeonsi.
      
      Totals from affected shaders:
      SGPRS: 128 -> 128 (0.00 %)
      VGPRS: 64 -> 64 (0.00 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 6880 -> 9504 (38.14 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 16 -> 16 (0.00 %)
      Wait states: 0 -> 0 (0.00 %)
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
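
To see why the `iand` form still gives a usable bound, the C sketch below writes the exit test both as `inot(iand(...))` and in its De Morgan form, then counts iterations. It is an illustration under assumptions, not the loop-analysis code: the function names are hypothetical, `i` is assumed to start at 0 and step by 1, and `limit` is assumed non-negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Exit test shaped like the NIR above: inot(iand(ilt(a, x), ilt(i, limit))). */
static bool exit_via_inot_iand(int a, int x, int i, int limit)
{
   return !((a < x) && (i < limit));
}

/* The same test after De Morgan: either comparison failing ends the loop. */
static bool exit_via_inverted_ops(int a, int x, int i, int limit)
{
   return (a >= x) || (i >= limit);
}

/* Count iterations of: while (a < x && i < limit) { a += a_step; i++; } */
static int count_iterations(int a, int a_step, int x, int limit)
{
   assert(limit >= 0);
   int i = 0;
   while (!exit_via_inot_iand(a, x, i, limit)) {
      a += a_step;
      i++;
   }
   assert(i <= limit);   /* the "i < limit" half alone bounds the trip count */
   return i;
}
```

Whatever `a` and `x` do, the loop cannot run more than `limit` times, so the induction variable half of the `iand` is enough to derive a maximum trip count.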
    • nir: pass nir_op to calculate_iterations() · 67c34784
      Timothy Arceri authored
      
      
      Passing the op in, rather than getting it from the alu instruction,
      allows us some flexibility. In a following patch we instead pass the
      inverse op.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • nir: add get_induction_and_limit_vars() helper to loop analysis · 11e8f8a1
      Timothy Arceri authored
      
      
      This helps make find_trip_count() a little easier to follow but
      will also be used by a following patch.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • nir: add helper to return inversion op of a comparison · f219f611
      Timothy Arceri authored
      
      
      This will be used to help find the trip count of loops that look
      like the following:
      
         while (a < x && i < 8) {
            ...
            i++;
         }
      
      Where the NIR will end up looking something like this:
      
         vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
         loop {
            ...
            vec1 1 ssa_12 = ilt ssa_225, ssa_11
            vec1 1 ssa_17 = ilt ssa_226, ssa_1
            vec1 1 ssa_18 = iand ssa_12, ssa_17
            vec1 1 ssa_19 = inot ssa_18
      
            if ssa_19 {
               ...
               break
            } else {
               ...
            }
         }
      
      So in order to find the trip count we need to find the inverse of
      ilt.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
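
A minimal sketch of such an inversion helper, restricted to integer comparisons. The enum and both functions are hypothetical stand-ins invented for this example; they are not `nir_op` values or the helper this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for integer comparison opcodes (not nir_op values). */
enum toy_cmp { TOY_ILT, TOY_IGE, TOY_IEQ, TOY_INE };

/* Return the comparison that is true exactly when `op` is false. */
static enum toy_cmp toy_inverse_cmp(enum toy_cmp op)
{
   switch (op) {
   case TOY_ILT: return TOY_IGE;
   case TOY_IGE: return TOY_ILT;
   case TOY_IEQ: return TOY_INE;
   case TOY_INE: return TOY_IEQ;
   }
   assert(!"unknown comparison");
   return op;
}

/* Evaluate a comparison so the inversion can be checked on real values. */
static bool toy_eval_cmp(enum toy_cmp op, int a, int b)
{
   switch (op) {
   case TOY_ILT: return a < b;
   case TOY_IGE: return a >= b;
   case TOY_IEQ: return a == b;
   case TOY_INE: return a != b;
   }
   assert(!"unknown comparison");
   return false;
}
```

For the loop above, inverting `ilt` gives the "greater or equal" test that actually triggers the break once the `inot` is folded away.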
    • nir: simplify the loop analysis trip count code a little · 090feaac
      Timothy Arceri authored
      
      
      Here we create a helper is_supported_terminator_condition()
      and use that rather than embedding all the trip count code
      inside a switch.
      
      The new helper will also be used in a following patch.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • nir: unroll some loops with a variable limit · 7571de8e
      Timothy Arceri authored
      
      
      Some loops have a single terminator, but the exact trip
      count is still unknown. For example:
      
         for (int i = 0; i < imin(x, 4); i++)
            ...
      
      Shader-db results radeonsi (all affected are from Tropico 5):
      
      Totals from affected shaders:
      SGPRS: 144 -> 152 (5.56 %)
      VGPRS: 124 -> 108 (-12.90 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 5180 -> 6640 (28.19 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 17 -> 21 (23.53 %)
      Wait states: 0 -> 0 (0.00 %)
      
      Shader-db results i965 (SKL):
      
      total loops in shared programs: 3808 -> 3802 (-0.16%)
      loops in affected programs: 6 -> 0
      helped: 6
      HURT: 0
      
      vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):
      
      Totals from affected shaders:
      SGPRS: 304 -> 304 (0.00 %)
      VGPRS: 296 -> 292 (-1.35 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 15756 -> 25884 (64.28 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 29 -> 29 (0.00 %)
      Wait states: 0 -> 0 (0.00 %)
      
      v2: fix bug where last iteration would get optimised away by
          mistake.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
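
One way to picture unrolling with a variable limit, written as plain C rather than NIR: the trip count is unknown but can never exceed 4, so four guarded copies of the body cover every case. This is a hand-written sketch; `sum_loop` and `sum_unrolled` are hypothetical names and the array-summing body is just a placeholder for the "..." above.

```c
#include <assert.h>
#include <stddef.h>

static int imin(int a, int b)
{
   return a < b ? a : b;
}

/* Original form: the trip count depends on x but can never exceed 4. */
static int sum_loop(const int *v, int x)
{
   int sum = 0;
   for (int i = 0; i < imin(x, 4); i++)
      sum += v[i];
   return sum;
}

/* Unrolled form: four copies of the body, each still guarded by the
 * original exit test, including the last one. */
static int sum_unrolled(const int *v, int x)
{
   assert(v != NULL);
   int limit = imin(x, 4);
   int sum = 0;
   if (0 < limit) {
      sum += v[0];
      if (1 < limit) {
         sum += v[1];
         if (2 < limit) {
            sum += v[2];
            if (3 < limit)
               sum += v[3];
         }
      }
   }
   return sum;
}
```

The guard on the fourth copy matters: dropping it would either run the body when `x` is 3 or skip it when `x` is 4 or more.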
    • nir: calculate trip count for more loops · 68ce0ec2
      Timothy Arceri authored
      
      
      This adds support to loop analysis for loops where the induction
      variable is compared to the result of min(variable, constant).
      
      For example:
      
         for (int i = 0; i < imin(x, 4); i++)
            ...
      
      We add a new bool to the loop terminator struct in order to
      differentiate terminators with this exit condition.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • nir: add partial loop unrolling support · e8a8937a
      Timothy Arceri authored
      
      
      This adds partial loop unrolling support and makes use of a
      guessed trip count based on array access.
      
      The code is written so that we could use partial unrolling
      more generally, but for now it's only used when we have guessed
      the trip count.
      
      We use partial unrolling for this guessed trip count because it's
      possible any out of bounds array access doesn't otherwise affect
      the shader, e.g. the stores/loads to/from the array are unused. So
      we insert a copy of the loop in the innermost continue branch of
      the unrolled loop. Later on it's possible for nir_opt_dead_cf()
      to then remove the loop in some cases.
      
      A RenderDoc capture from the Rise of the Tomb Raider benchmark
      reports the following change in an affected compute shader:
      
      GPU duration: 350 -> 325 microseconds
      
      shader-db results radeonsi VEGA (NIR backend):
      
      SGPRS: 1008 -> 816 (-19.05 %)
      VGPRS: 684 -> 432 (-36.84 %)
      Spilled SGPRs: 539 -> 0 (-100.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 39708 -> 45812 (15.37 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 105 -> 144 (37.14 %)
      Wait states: 0 -> 0 (0.00 %)
      
      shader-db results i965 SKL:
      
      total instructions in shared programs: 13098265 -> 13103359 (0.04%)
      instructions in affected programs: 5126 -> 10220 (99.38%)
      helped: 0
      HURT: 21
      
      total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
      cycles in affected programs: 289252 -> 234925 (-18.78%)
      helped: 12
      HURT: 9
      
      vkpipeline-db results VEGA:
      
      Totals from affected shaders:
      SGPRS: 184 -> 184 (0.00 %)
      VGPRS: 448 -> 448 (0.00 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 26076 -> 24428 (-6.32 %) bytes
      LDS: 6 -> 6 (0.00 %) blocks
      Max Waves: 5 -> 5 (0.00 %)
      Wait states: 0 -> 0 (0.00 %)
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
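
The shape of a partial unroll can be sketched in C, assuming a guessed trip count of 2 to keep the example short. The function names and the counting body are hypothetical; the point is only the structure: guarded copies of the body, with a copy of the original loop left in the innermost branch in case the guess was too small.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: n is not known at compile time. */
static int count_nonzero_loop(const int *data, int n)
{
   int count = 0;
   for (int i = 0; i < n; i++)
      count += (data[i] != 0);
   return count;
}

/* Partially unrolled with a guessed trip count of 2 (an assumption made
 * for this sketch). */
static int count_nonzero_partial(const int *data, int n)
{
   assert(data != NULL);
   int count = 0;
   int i = 0;
   if (i < n) {
      count += (data[i] != 0);
      i++;
      if (i < n) {
         count += (data[i] != 0);
         i++;
         /* Guess exhausted: fall back to a copy of the original loop. */
         for (; i < n; i++)
            count += (data[i] != 0);
      }
   }
   return count;
}
```

Both functions return the same result for any `n`. When the guess is right, the remaining loop copy never runs, and a dead-control-flow pass may be able to prove that and drop it.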
    • nir: add new partially_unrolled bool to nir_loop · fba5d275
      Timothy Arceri authored
      
      
      In order to stop continuously partially unrolling the same loop
      we add the bool partially_unrolled to nir_loop. We add it here
      rather than in nir_loop_info because nir_loop_info is only set
      via loop analysis and is intended to be cleared before each
      analysis. Also, nir_loop_info is never cloned.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
    • nir: add guess trip count support to loop analysis · 03a452b7
      Timothy Arceri authored
      
      
      This detects an induction variable used as an array index to guess
      the trip count of the loop. This enables us to do a partial
      unroll of the loop, which can eventually result in the loop being
      eliminated.
      
      v2: check if the induction var is used to index more than a single
          array and if so get the size of the smallest array.
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
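
The guess described here, including the v2 rule of taking the smallest array, might look like the following in isolation. This is a sketch under assumptions: the induction variable is taken to start at 0 and step by 1, a length of 0 is used to mean "unknown", and `guess_trip_count` is a hypothetical name rather than the function added to loop analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Guess a trip count from the lengths of the arrays indexed by the
 * induction variable: in-bounds iteration cannot go past the smallest
 * array. Returns 0 when there is nothing to base a guess on. */
static unsigned guess_trip_count(const unsigned *array_lengths,
                                 size_t num_arrays)
{
   assert(array_lengths != NULL || num_arrays == 0);
   unsigned guess = 0;
   for (size_t a = 0; a < num_arrays; a++) {
      if (array_lengths[a] == 0)
         continue;   /* unknown length: cannot contribute to the guess */
      if (guess == 0 || array_lengths[a] < guess)
         guess = array_lengths[a];
   }
   return guess;
}
```

Because the result is only a guess, it suits partial unrolling, where a copy of the original loop remains as a fallback, rather than a full unroll.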
  2. 11 Mar, 2019 26 commits