1. 06 Nov, 2019 1 commit
  2. 04 Nov, 2019 3 commits
  3. 24 Oct, 2019 3 commits
    • nir: Add scoped_memory_barrier intrinsic · 73572abc
      Caio Marcelo de Oliveira Filho authored
      Add a NIR intrinsic that represents a memory barrier in the SPIR-V /
      Vulkan Memory Model, with extra attributes that describe the barrier:
      
      - Ordering: whether it is an Acquire or a Release;
      - "Cache control": availability ("ensure this gets written in the memory")
        and visibility ("ensure my cache is up to date when I'm reading");
      - Variable modes: which memory types this barrier applies to;
      - Scope: how far this barrier applies.
      
      Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
      Semantics" are split into two different attributes so we can use
      variable modes for the former.
      
      NIR passes that took barriers into consideration were also changed:
      
      - nir_opt_copy_prop_vars: clean up the values for the modes of an
        ACQUIRE barrier.  The effect of copy propagation is to "pull up a
        load" (by not performing it), which is what ACQUIRE restricts.
      
      - nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
        pending writes for the modes of a RELEASE barrier.  The effect of
        removing dead writes is to "push down a store", which is what
        RELEASE restricts.
      
      - nir_opt_access: treat both ACQUIRE and RELEASE as a full barrier for
        the modes.  This is conservative, but since this is a GL-specific
        pass, it doesn't make a difference for now.
      
      v2: Fix the scoped barrier handling in copy propagation.  (Jason)
          Add scoped barrier handling to nir_opt_access and
          nir_opt_combine_writes.  (Rhys)
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
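The barrier attributes and the per-pass reactions described in this commit message can be sketched in plain C. This is only an illustration: every name below (`sem_flags`, `mode_flags`, `scoped_barrier`, and both helpers) is hypothetical and is not the NIR definition. The sketch assumes a pass simply asks "which modes must I forget values for?" or "which modes must I flush pending writes for?".

```c
#include <assert.h>

/* Hypothetical flag names for this sketch; not the real NIR enums. */
enum sem_flags {
   SEM_ACQUIRE        = 1 << 0,  /* ordering */
   SEM_RELEASE        = 1 << 1,  /* ordering */
   SEM_MAKE_AVAILABLE = 1 << 2,  /* "cache control": write back */
   SEM_MAKE_VISIBLE   = 1 << 3,  /* "cache control": refresh before reading */
};

enum mode_flags {
   MODE_SSBO   = 1 << 0,
   MODE_SHARED = 1 << 1,
   MODE_GLOBAL = 1 << 2,
};

enum scope { SCOPE_INVOCATION, SCOPE_SUBGROUP, SCOPE_WORKGROUP, SCOPE_DEVICE };

struct scoped_barrier {
   unsigned semantics;  /* bitmask of sem_flags */
   unsigned modes;      /* bitmask of mode_flags: which memory is affected */
   enum scope scope;    /* how far the barrier applies */
};

/* Copy propagation "pulls up" loads, which ACQUIRE restricts: return the
 * modes whose known values must be forgotten at this barrier. */
static unsigned
modes_to_forget_values(const struct scoped_barrier *b)
{
   assert(b);
   return (b->semantics & SEM_ACQUIRE) ? b->modes : 0;
}

/* Dead-write removal "pushes down" stores, which RELEASE restricts: return
 * the modes whose pending writes must be treated as used at this barrier. */
static unsigned
modes_to_flush_writes(const struct scoped_barrier *b)
{
   assert(b);
   return (b->semantics & SEM_RELEASE) ? b->modes : 0;
}
```

A conservative pass, like the nir_opt_access handling described above, would act on `b->modes` for either ordering instead of distinguishing the two.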
    • nir: improve nir_variable packing · 922801b7
      Timothy Arceri authored
      Before:
      
      /* size: 136, cachelines: 3, members: 10 */
      
      After:
      
      /* size: 128, cachelines: 2, members: 10 */
      Reviewed-by: Marek Olšák <marek.olsak@amd.com>
      Reviewed-by: Rob Clark <robdclark@chromium.org>
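The size reduction reported above comes from member packing. As a generic illustration of the technique (not the actual nir_variable layout), the sketch below reorders the members of a made-up struct so that small fields share one alignment slot instead of each forcing padding. The struct and member names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with 8-byte-aligned ones: each uint8_t is
 * typically followed by padding up to the next aligned offset. */
struct var_loose {
   uint8_t  flag_a;
   void    *type;
   uint8_t  flag_b;
   uint64_t location;
   uint8_t  flag_c;
};

/* Same members, largest alignment first: the three flags sit next to
 * each other and only the tail of the struct is padded. */
struct var_packed {
   void    *type;
   uint64_t location;
   uint8_t  flag_a;
   uint8_t  flag_b;
   uint8_t  flag_c;
};

/* Exact sizes are ABI-dependent, so only the relation is checked here. */
static_assert(sizeof(struct var_packed) <= sizeof(struct var_loose),
              "reordering should never make the struct larger");

/* Number of bytes saved by the reordering on the current ABI. */
static size_t
bytes_saved(void)
{
   return sizeof(struct var_loose) - sizeof(struct var_packed);
}
```

On a typical LP64 ABI such as x86-64 Linux, the two structs are 40 and 24 bytes respectively. The `/* size: ..., cachelines: ..., members: ... */` comments quoted in the commit message have the form of pahole output, which is a common way to measure this kind of change.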
    • nir: fix nir_variable_data packing · c412ff42
      Timothy Arceri authored
      Before:
      
      /* size: 60, cachelines: 1, members: 29 */
      
      After:
      
      /* size: 56, cachelines: 1, members: 29 */
      Reviewed-by: Marek Olšák <marek.olsak@amd.com>
      Reviewed-by: Rob Clark <robdclark@chromium.org>
  4. 21 Oct, 2019 1 commit
  5. 18 Oct, 2019 3 commits
  6. 17 Oct, 2019 6 commits
  7. 10 Oct, 2019 2 commits
  8. 17 Sep, 2019 1 commit
  9. 06 Sep, 2019 3 commits
  10. 22 Aug, 2019 1 commit
  11. 20 Aug, 2019 4 commits
  12. 19 Aug, 2019 1 commit
  13. 13 Aug, 2019 1 commit
    • nir: add a pass to clamp gl_PointSize to a range · 48f5c343
      Iago Toral authored
      The OpenGL and OpenGL ES specs require that implementations clamp the
      value of gl_PointSize to an implementation-dependent range. This pass
      is useful for any GPU hardware that doesn't do this automatically
      for either one or both sides of the range, such as V3D.
      
      v2:
       - Turn into a generic NIR pass (Eric).
       - Make the pass work before lower I/O so we can use the deref variable
         to inspect if we are writing to gl_PointSize (Eric).
       - Make the pass take the range to clamp as parameter and allow it
         to clamp to both sides of the range or just one side.
       - Make the pass report progress.
      
      v3:
       - Fix copyright header (Eric)
       - use fmin/fmax instead of bcsel to clamp (Eric)
      Reviewed-by: Eric Anholt <eric@anholt.net>
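The clamping described in this commit (a caller-supplied range, either side optional, min/max instead of a select) can be modelled on the CPU as below. This is a sketch of the arithmetic only, not the NIR pass. The function names are hypothetical, and the convention that a bound of 0 or less means "do not clamp this side" is an assumption made for the example.

```c
#include <assert.h>

/* Stand-ins for the fmin/fmax operations the pass emits; written out by
 * hand so the sketch has no libm dependency. */
static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Clamp a point size to [min, max]. Assumption for this sketch: a bound
 * that is not greater than zero disables clamping on that side. */
static float
clamp_point_size(float size, float min, float max)
{
   /* When both sides are enabled the range must be well formed. */
   assert(!(min > 0.0f && max > 0.0f) || min <= max);

   if (min > 0.0f)
      size = max_f(size, min);
   if (max > 0.0f)
      size = min_f(size, max);
   return size;
}
```

For example, `clamp_point_size(100.0f, 1.0f, 64.0f)` yields 64, while `clamp_point_size(100.0f, 1.0f, 0.0f)` leaves the value at 100 because the upper side is disabled.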
  14. 12 Aug, 2019 2 commits
  15. 08 Aug, 2019 1 commit
    • nir: add nir_lower_to_explicit() · fd73ed1b
      Rhys Perry authored
      v2: use glsl_type_size_align_func
      v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp
      v2: use align() instead of util_align_npot()
      v2: pack arrays a bit tighter
      v2: rename mem_* to field_*
      v2: don't attempt to handle when struct offsets are already set
      v2: use column_type() instead of recreating it
      v2: use a branch instead of |= in nir_lower_to_explicit_impl()
      v2: assign locations to variables and update shared_size and num_shared
      v2: allow the pass to be used with nir_var_{shader_temp,function_temp}
      v4: rebase
      v5: add TODO
      v5: small formatting changes
      v5: remove incorrect assert in get_explicit_type()
      v5: rename to nir_lower_vars_to_explicit_types
      v5: correctly update progress when only variables are updated
      v5: rename get_explicit_type() to get_explicit_shared_type()
      v5: add comment explaining how get_explicit_shared_type() is different
      v5: update cast strides
      v6: update progress when lowering nir_var_function_temp variables
      v6: formatting changes
      v6: add more detailed documentation comment for get_explicit_shared_type
      v6: rename get_explicit_shared_type to get_explicit_type_for_size_align
      v7: fix comment in nir_lower_vars_to_explicit_types_impl()
      Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
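The changelog above mentions a size/align callback, align(), and assigning struct offsets. The core of giving a type an explicit layout can be sketched as follows. This is a simplified illustration, not the Mesa code: the `field` struct and both functions are hypothetical, and each field's size and alignment are assumed to be known already (in the real pass they would come from the size/align callback).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one struct member. */
struct field {
   unsigned size;    /* bytes occupied by the member */
   unsigned align;   /* required alignment, a power of two */
   unsigned offset;  /* filled in by assign_field_offsets() */
};

/* Round value up to the next multiple of a power-of-two alignment. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Assign an explicit offset to every field in declaration order.
 * Returns the struct size without trailing padding and stores the
 * struct's own alignment (the largest member alignment) in *out_align. */
static unsigned
assign_field_offsets(struct field *fields, size_t count, unsigned *out_align)
{
   unsigned offset = 0;
   unsigned max_align = 1;

   for (size_t i = 0; i < count; i++) {
      offset = align_pot(offset, fields[i].align);
      fields[i].offset = offset;
      offset += fields[i].size;
      if (fields[i].align > max_align)
         max_align = fields[i].align;
   }

   *out_align = max_align;
   return offset;
}
```

Returning the size without trailing padding is a design choice of this sketch. It lets a caller decide how tightly to pack consecutive elements, but whether that matches the "pack arrays a bit tighter" item in the changelog is not stated in the source.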
  16. 31 Jul, 2019 4 commits
  17. 29 Jul, 2019 1 commit
    • nir/find_array_copies: Handle wildcards and overlapping copies · 156306e5
      Connor Abbott authored
      This commit rewrites opt_find_array_copies to be able to handle
      an array copy sequence with intervening operations. In
      particular, this handles the case where we OpLoad an array of structs
      and then OpStore it, which generates code like:
      
      foo[0].a = bar[0].a
      foo[0].b = bar[0].b
      foo[1].a = bar[1].a
      foo[1].b = bar[1].b
      ...
      
      that wasn't recognized by the previous pass.
      
      In order to correctly handle copying arrays of arrays, and in particular
      to correctly handle copies involving wildcards, we need to use a tree
      structure similar to lower_vars_to_ssa so that we can walk all the
      partial array copies invalidated by a particular write, including
      ones where one of the common indices is a wildcard. I actually think
      that when factoring in the needed hashing/comparing code, a hash table
      based approach wouldn't be a lot smaller anyways.
      
      All of the changes come from tessellation control shaders in Strange
      Brigade, where we're able to remove the DXVK-inserted copy at the
      beginning of the shader. These are the result for radv:
      
      Totals from affected shaders:
      SGPRS: 4576 -> 4576 (0.00 %)
      VGPRS: 13784 -> 5560 (-59.66 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
      Code Size: 329940 -> 263268 (-20.21 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 330 -> 898 (172.12 %)
      Wait states: 0 -> 0 (0.00 %)
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      156306e5
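The foo[i].a = bar[i].a pattern from this commit message can be recognised, in its simplest form, by checking that a list of per-element copies covers every (index, field) slot of the destination with the matching slot of the source. The sketch below does only that. It uses a flat table rather than the tree structure the commit describes, ignores wildcards and invalidating writes, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ELEMS  16
#define MAX_FIELDS 8

/* One scalar copy: dst[dst_index].dst_field = src[src_index].src_field */
struct elem_copy {
   unsigned dst_index, dst_field;
   unsigned src_index, src_field;
};

/* Returns true when the copies together copy every element and field of
 * an array of `length` structs with `num_fields` members, each from the
 * same position in the source array. */
static bool
is_whole_array_copy(const struct elem_copy *copies, size_t count,
                    unsigned length, unsigned num_fields)
{
   assert(length <= MAX_ELEMS && num_fields <= MAX_FIELDS);

   bool seen[MAX_ELEMS][MAX_FIELDS] = {{false}};
   size_t covered = 0;

   for (size_t i = 0; i < count; i++) {
      const struct elem_copy *c = &copies[i];

      if (c->dst_index >= length || c->dst_field >= num_fields)
         return false;
      /* A copy from a different slot breaks the pattern. */
      if (c->dst_index != c->src_index || c->dst_field != c->src_field)
         return false;

      if (!seen[c->dst_index][c->dst_field]) {
         seen[c->dst_index][c->dst_field] = true;
         covered++;
      }
   }

   return covered == (size_t)length * num_fields;
}
```

When the check succeeds, the individual copies could be replaced by a single array copy, which is the kind of rewrite that lets later passes remove the temporary array altogether.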
  18. 24 Jul, 2019 2 commits