1. 28 Jan, 2020 1 commit
  2. 24 Jan, 2020 6 commits
  3. 10 Jan, 2020 1 commit
  4. 21 Dec, 2019 2 commits
  5. 04 Dec, 2019 3 commits
  6. 02 Dec, 2019 1 commit
  7. 29 Nov, 2019 2 commits
  8. 25 Nov, 2019 6 commits
  9. 15 Nov, 2019 1 commit
  10. 14 Nov, 2019 1 commit
    • Timur Kristóf
      aco: Treat all booleans as per-lane. · 8995c0b3
      Timur Kristóf authored
      
      
      Previously, instruction selection had two kinds of booleans:
      1. divergent, which was per-lane and stored in s2 (the size of VCC)
      2. uniform, which was stored in s1
      Additionally, uniform booleans were made per-lane when they resulted
      from operations which were supported only by the VALU.
      
      To decide which type was used, we relied on the destination size,
      which was not reliable because of the per-lane uniform bools, but it
      mostly worked on wave64.
      However, in wave32 mode (where VCC is also s1), this approach
      makes it impossible to keep track of which boolean is uniform and
      which is divergent.
      
      This commit makes all booleans per-lane.
      The resulting excess code size will be taken care of by the optimizer.
      
      v2 (by Daniel Schürmann):
      - Better names for some functions
      - Use s_andn2_b64 with exec for nir_op_inot
      - Simplify code due to using s_and_b64 in bool_to_scalar_condition
      
      v3 (by Timur Kristóf):
      - Fix several subgroups regressions
      Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
      Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
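The commit above turns every boolean into a lane mask. The following Python sketch models that idea with plain integers; it is not ACO code. `inot` and `bool_to_scalar_condition` are only named after `nir_op_inot` and the function mentioned in the v2 notes, and `lane_mask`, `per_lane_bool_sgprs` and `uniform_to_per_lane` are hypothetical helpers. The `s_andn2`/`s_and` behaviour is reduced to bit operations, assuming the usual "S0 & ~S1" and "SCC = result != 0" semantics.

```python
# Toy model of per-lane booleans as lane masks (a sketch, not ACO code).
# Each bit is one lane; inactive lanes are cleared by the exec mask.


def lane_mask(wave_size: int) -> int:
    """All-ones mask with one bit per lane of the wave."""
    return (1 << wave_size) - 1


def per_lane_bool_sgprs(wave_size: int) -> int:
    """32-bit SGPRs needed to hold a per-lane bool: 2 in wave64 (like s2/VCC),
    1 in wave32, the same size as a uniform bool in s1."""
    return wave_size // 32


def uniform_to_per_lane(value: bool, exec_mask: int) -> int:
    """Broadcast a uniform bool to every active lane."""
    return exec_mask if value else 0


def inot(x: int, exec_mask: int) -> int:
    """Models s_andn2 with exec: exec & ~x, so inactive lanes stay false."""
    return exec_mask & ~x


def bool_to_scalar_condition(x: int, exec_mask: int) -> bool:
    """Models s_and with exec: SCC is set when any active lane is true."""
    return (x & exec_mask) != 0


# Usage on a toy 4-lane wave with lanes 0, 1 and 3 active.
exec_mask = 0b1011
x = 0b0001                                  # true only in lane 0
not_x = inot(x, exec_mask)                  # 0b1010: lanes 1 and 3
any_true = bool_to_scalar_condition(0b0100, exec_mask)  # False: lane 2 is inactive
full_wave32 = lane_mask(32)                 # 0xFFFFFFFF
```

Because `per_lane_bool_sgprs(32)` equals the single SGPR of a uniform bool, size alone cannot tell the two kinds apart in wave32, which is the problem the commit describes.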
  11. 07 Nov, 2019 2 commits
  12. 06 Nov, 2019 1 commit
  13. 30 Oct, 2019 1 commit
  14. 28 Oct, 2019 1 commit
  15. 23 Oct, 2019 2 commits
    • Rhys Perry
      aco: take LDS into account when calculating num_waves · fc04a2fc
      Rhys Perry authored
      
      
      pipeline-db (Vega):
      SGPRS: 344 -> 344 (0.00 %)
      VGPRS: 424 -> 524 (23.58 %)
      Spilled SGPRs: 84 -> 80 (-4.76 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 52812 -> 52484 (-0.62 %) bytes
      LDS: 135 -> 135 (0.00 %) blocks
      Max Waves: 56 -> 53 (-5.36 %)
      
      v2: consider WGP, rework to be clearer and apply the
          "maximum 16 workgroups per CU" limit properly
      v2: use "SIMD" instead of "EU"
      v2: fix spiller by introducing "Program::max_waves"
      v2: rename "lds_size" to "lds_limit"
      v3: make max_waves actually independent of register usage
      v3: fix issue where max_waves was way too high
      v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
      v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
      v4: fix typo from "workgroups_per_cu" rename
      Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
    • Rhys Perry
      aco: increase accuracy of SGPR limits · 08d51001
      Rhys Perry authored
      
      
      SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed
      number of SGPRs and has 106 addressable SGPRs.
      
      pipeline-db (Vega):
      SGPRS: 5912 -> 6232 (5.41 %)
      VGPRS: 1772 -> 1780 (0.45 %)
      Spilled SGPRs: 0 -> 0 (0.00 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 88228 -> 87904 (-0.37 %) bytes
      LDS: 0 -> 0 (0.00 %) blocks
      Max Waves: 559 -> 571 (2.15 %)
      
      pipeline-db (Navi):
      SGPRS: 341256 -> 363384 (6.48 %)
      VGPRS: 171536 -> 170960 (-0.34 %)
      Spilled SGPRs: 832 -> 581 (-30.17 %)
      Spilled VGPRs: 0 -> 0 (0.00 %)
      Private memory VGPRs: 0 -> 0 (0.00 %)
      Scratch size: 0 -> 0 (0.00 %) dwords per thread
      Code Size: 14207332 -> 14190872 (-0.12 %) bytes
      LDS: 33 -> 33 (0.00 %) blocks
      Max Waves: 18072 -> 18251 (0.99 %)
      
      v2: unconditionally count vcc as an extra sgpr on GFX10+
      v3: pass SGPRs rounded to 8
      Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
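The two Rhys Perry commits above both feed the wave-occupancy estimate. A rough Python model of the two limits follows, assuming (not taken from Mesa) that occupancy is the minimum of independent per-SIMD limits. Only the SGPR granule of 16 on GFX8/GFX9, the "maximum 16 workgroups per CU" limit and the use of `DIV_ROUND_UP` come from the commit messages; every function name and every number in the usage lines is hypothetical, and VGPR limits and GFX10's fixed SGPR allocation are not modeled.

```python
def div_round_up(a: int, b: int) -> int:
    """Integer ceil(a / b) for positive b, like the DIV_ROUND_UP in the commit."""
    return (a + b - 1) // b


def round_up(value: int, granule: int) -> int:
    """Round value up to a multiple of granule."""
    return div_round_up(value, granule) * granule


def waves_limit_from_sgprs(sgprs_used: int, sgprs_per_simd: int,
                           granule: int = 16) -> int:
    """Waves per SIMD allowed by SGPR usage when SGPRs are allocated in
    fixed-size groups (16 on GFX8/GFX9 per the commit message)."""
    return sgprs_per_simd // round_up(sgprs_used, granule)


def waves_limit_from_lds(lds_per_workgroup: int, waves_per_workgroup: int,
                         lds_limit: int, simd_per_cu: int,
                         max_workgroups_per_cu: int = 16) -> int:
    """Waves per SIMD allowed by LDS usage; independent of register usage."""
    if lds_per_workgroup > 0:
        workgroups = min(lds_limit // lds_per_workgroup, max_workgroups_per_cu)
    else:
        workgroups = max_workgroups_per_cu
    # Spread the CU's waves over its SIMDs; round up so a single small
    # workgroup still counts as one wave rather than zero.
    return div_round_up(workgroups * waves_per_workgroup, simd_per_cu)


# Illustrative numbers only (not taken from the commits or from hardware docs).
sgpr_waves = waves_limit_from_sgprs(sgprs_used=50, sgprs_per_simd=800)  # 800 // 64 = 12
lds_waves = waves_limit_from_lds(lds_per_workgroup=16384, waves_per_workgroup=4,
                                 lds_limit=65536, simd_per_cu=4)       # 4 workgroups -> 4
max_waves = min(10, sgpr_waves, lds_waves)  # 10: assumed per-SIMD hardware cap
```

In this sketch the LDS limit is kept in its own function, echoing the v3 note that `max_waves` should not depend on register usage; the register limit is only combined with it at the end.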
  16. 22 Oct, 2019 1 commit
  17. 21 Oct, 2019 3 commits
  18. 11 Oct, 2019 1 commit
  19. 10 Oct, 2019 3 commits
  20. 09 Oct, 2019 1 commit