1. 02 Mar, 2022 10 commits
    • docs: fix a broken link · 23d3fb6d
      Erik Faye-Lund authored and Marge Bot committed

      Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
      Part-of: <!15213>
    • intel/fs: fix total_scratch computation · 96c88809
      Lionel Landwerlin authored and Marge Bot committed

      We only have a single prog_data::total_scratch for all shader variants
      (SIMD 8, 16, 32). Therefore each variant should take the maximum of its
      own scratch size and the existing total_scratch rather than overwrite it.
      
      We probably haven't run into this issue before because we compile in
      order of increasing SIMD size, and a higher SIMD size is more likely to
      spill. But for bindless shaders with return shaders, if the last return
      part doesn't spill, we completely ignore the scratch computed for the
      previous parts.
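
      A minimal C sketch of the two policies, assuming a stripped-down
      prog_data that holds only total_scratch. The struct, the local MAX2
      macro and the simulate() helper are hypothetical illustrations, not
      the actual Intel compiler code: scratch sizes are fed in compile
      order, and overwriting keeps whatever the last variant reported while
      taking the max keeps the largest requirement seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared prog_data: only the field that
 * matters here is modeled, not the real Intel structure. */
struct prog_data {
   unsigned total_scratch;
};

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Buggy pattern: every variant overwrites the shared field, so the last
 * variant compiled wins even if an earlier one needed more scratch. */
static void record_scratch_overwrite(struct prog_data *pd, unsigned scratch)
{
   pd->total_scratch = scratch;
}

/* Fixed pattern: keep the maximum over all variants compiled so far. */
static void record_scratch_max(struct prog_data *pd, unsigned scratch)
{
   pd->total_scratch = MAX2(pd->total_scratch, scratch);
}

/* Hypothetical driver: apply one policy to per-variant scratch sizes given
 * in compile order and return the resulting total_scratch. */
unsigned simulate(const unsigned *scratch, size_t n, int use_max)
{
   assert(scratch != NULL || n == 0);
   struct prog_data pd = { 0 };
   for (size_t i = 0; i < n; i++) {
      if (use_max)
         record_scratch_max(&pd, scratch[i]);
      else
         record_scratch_overwrite(&pd, scratch[i]);
   }
   return pd.total_scratch;
}
```

      With illustrative sizes {1024, 2048, 0} (a last part that doesn't
      spill), the overwrite policy ends at 0 and the max policy at 2048.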
      
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: mesa-stable
      Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
      Part-of: <!15193>
    • v3d: enable texture filtering anisotropic · 5b430758
      Juan A. Suárez authored and Marge Bot committed

      It seems we had already implemented this feature (see commit 521e1d02
      "broadcom/vc5: Add support for anisotropic filtering"), but we never
      enabled the corresponding capability.
      
      Also update the maximum supported level of anisotropy.
      
      Fixes: #4201

      Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
      Part-of: <!15180>
    • intel/compiler: Use pass helper in brw_nir_adjust_offset_for_arrayed_indices · dc77542e
      Caio Oliveira authored and Marge Bot committed

      Also change the code to preserve certain metadata: control flow is not
      changed, so both block indices and dominance information are preserved.
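
      NIR's nir_shader_instructions_pass() helper takes a per-instruction
      callback and a mask of metadata the pass promises to preserve. The
      sketch below models that general shape on a hypothetical toy IR; the
      enum values, structs, opcode numbering and function names are all
      illustrative assumptions, not Mesa's API. A pass that only rewrites
      operands never touches control flow, so it can declare block indices
      and dominance as preserved while other analyses are invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata bits, loosely modeled on the idea behind NIR's
 * metadata flags; names and values here are illustrative only. */
enum metadata {
   META_NONE        = 0,
   META_BLOCK_INDEX = 1 << 0,
   META_DOMINANCE   = 1 << 1,
   META_LIVE_SSA    = 1 << 2,
   META_ALL         = META_BLOCK_INDEX | META_DOMINANCE | META_LIVE_SSA,
};

struct instr {
   int opcode;
   int offset;
};

struct shader {
   struct instr *instrs;
   size_t num_instrs;
   unsigned valid_metadata; /* which analyses are currently up to date */
};

/* Per-instruction callback: returns true if it changed the instruction. */
typedef bool (*instr_pass_cb)(struct instr *instr, void *data);

/* Generic helper: run cb on every instruction; if anything changed, keep
 * only the metadata the caller says is still valid. */
bool shader_instructions_pass(struct shader *s, instr_pass_cb cb,
                              unsigned preserved, void *data)
{
   assert(s != NULL && cb != NULL);
   bool progress = false;
   for (size_t i = 0; i < s->num_instrs; i++)
      progress |= cb(&s->instrs[i], data);

   if (progress)
      s->valid_metadata &= preserved;
   return progress;
}

/* Hypothetical callback: opcode 1 marks an arrayed access whose offset is
 * adjusted. It adds or removes no blocks, so control flow is unchanged. */
static bool adjust_offset(struct instr *instr, void *data)
{
   int delta = *(const int *)data;
   if (instr->opcode != 1)
      return false;
   instr->offset += delta;
   return true;
}

bool adjust_offsets(struct shader *s, int delta)
{
   return shader_instructions_pass(s, adjust_offset,
                                   META_BLOCK_INDEX | META_DOMINANCE, &delta);
}
```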
      
      Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
      Part-of: <!15206>
    • broadcom/compiler: simplify node/temp translation during register allocation · f761f8fd
      Iago Toral authored and Marge Bot committed

      Now that we don't sort our nodes, we can arrange them so that we can
      easily translate between nodes and temps without a mapping table,
      just by applying an offset.
      
      To do this, we use a single array of nodes where we first put the
      nodes for accumulators and then the nodes for temps. With this setup,
      we can ensure that for any given temp T, its node is always T + ACC_COUNT.
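
      A minimal sketch of the translation this layout allows. The value of
      ACC_COUNT below is an assumption for illustration (the real constant
      lives in the v3d register allocator), and the helper names are
      hypothetical: translation in either direction is a single add or
      subtract, with no lookup table.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: number of accumulator nodes at the start of the array;
 * the real value in the v3d allocator may differ. */
#define ACC_COUNT 6

/* Node array layout:
 *   [0, ACC_COUNT)                  accumulator nodes
 *   [ACC_COUNT, ACC_COUNT + ntemps) temp nodes, temp T at T + ACC_COUNT */
static inline unsigned temp_to_node(unsigned temp)
{
   return temp + ACC_COUNT;
}

static inline unsigned node_to_temp(unsigned node)
{
   assert(node >= ACC_COUNT); /* accumulator nodes have no temp */
   return node - ACC_COUNT;
}

static inline bool node_is_accumulator(unsigned node)
{
   return node < ACC_COUNT;
}
```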
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
    • broadcom/compiler: don't sort nodes for register allocation · 871b0a7f
      Iago Toral authored and Marge Bot committed

      Nodes are assigned to registers in order, so sorting was initially
      used to ensure that nodes with shorter live ranges would be assigned
      first and therefore be more likely to get accumulators.
      
      However, since d81a6e5f we no longer rely on order to make decisions
      about accumulators; instead, we make policy decisions based on actual
      liveness, so sorting is no longer strictly relevant to this decision.
      
      Furthermore, we are not re-sorting nodes after each spill either,
      since that would probably require that we rebuild the interference
      graph after each spill (the graph identifies nodes by their index).
      
      Shader-db results show a significant improvement in instruction
      counts (-2.59% overall), due to more optimal accumulator assignments.
      The reason for this is that we use a round-robin policy for choosing
      the next accumulator to assign. The idea behind this is to prevent
      nearby temps from being assigned to the same accumulator so that QPU
      scheduling is more flexible, but if we sort our nodes, we are
      basically no longer assigning temps in program order and the
      round-robin policy becomes less effective:
      
      total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
      instructions in affected programs: 11791267 -> 11454036 (-2.86%)
      helped: 62890
      HURT: 19987
      
      total threads in shared programs: 415874 -> 415870 (<.01%)
      threads in affected programs: 20 -> 16 (-20.00%)
      helped: 2
      HURT: 4
      
      total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
      uniforms in affected programs: 43430 -> 43402 (-0.06%)
      helped: 134
      HURT: 173
      
      total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
      max-temps in affected programs: 123334 -> 117280 (-4.91%)
      helped: 4112
      HURT: 1195
      
      total spills in shared programs: 3870 -> 3860 (-0.26%)
      spills in affected programs: 1013 -> 1003 (-0.99%)
      helped: 14
      HURT: 12
      
      total fills in shared programs: 5560 -> 5573 (0.23%)
      fills in affected programs: 1765 -> 1778 (0.74%)
      helped: 14
      HURT: 17
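
      The round-robin accumulator policy described above can be sketched as
      follows. This is a hypothetical model, not the v3d allocator: the
      number of accumulators, the acc_rr struct and pick_accumulator() are
      assumptions. The search starts just after the previous pick, so temps
      visited consecutively land in different accumulators when possible;
      that only spreads program-order neighbors apart if temps are visited
      in program order, which sorting breaks.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ACCS 6 /* assumption: number of accumulators to rotate through */

/* Hypothetical allocator state: where the next search should start. */
struct acc_rr {
   unsigned next;
};

/* Pick a free accumulator, starting the search at the slot after the
 * previous pick. Returns -1 if every accumulator is busy. */
int pick_accumulator(struct acc_rr *rr, const bool busy[NUM_ACCS])
{
   assert(rr->next < NUM_ACCS);
   for (unsigned i = 0; i < NUM_ACCS; i++) {
      unsigned acc = (rr->next + i) % NUM_ACCS;
      if (!busy[acc]) {
         rr->next = (acc + 1) % NUM_ACCS;
         return (int)acc;
      }
   }
   return -1;
}
```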
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
    • broadcom/compiler: sink uniform loads · 4483cd24
      Iago Toral authored and Marge Bot committed

      total instructions in shared programs: 13014428 -> 13000420 (-0.11%)
      instructions in affected programs: 743624 -> 729616 (-1.88%)
      helped: 1392
      HURT: 611
      
      total threads in shared programs: 415858 -> 415874 (<.01%)
      threads in affected programs: 16 -> 32 (100.00%)
      helped: 8
      HURT: 0
      
      total uniforms in shared programs: 3720410 -> 3711652 (-0.24%)
      uniforms in affected programs: 113442 -> 104684 (-7.72%)
      helped: 635
      HURT: 29
      
      total max-temps in shared programs: 2154268 -> 2144876 (-0.44%)
      max-temps in affected programs: 61279 -> 51887 (-15.33%)
      helped: 1124
      HURT: 187
      
      total spills in shared programs: 4002 -> 3870 (-3.30%)
      spills in affected programs: 265 -> 133 (-49.81%)
      helped: 6
      HURT: 0
      
      total fills in shared programs: 5788 -> 5560 (-3.94%)
      fills in affected programs: 603 -> 375 (-37.81%)
      helped: 6
      HURT: 0
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
    • broadcom/compiler: move constants before their first user · e228642c
      Iago Toral authored and Marge Bot committed

      For us, constants are basically uniforms too, so we want to keep their
      live ranges short to make it easier to allocate them to accumulators.
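
      A minimal sketch of the idea on a hypothetical linear IR (not the
      actual NIR pass used by the broadcom compiler): the constant's
      definition is moved to sit immediately before its first user, which
      shortens its live range. The struct layout, the -1 "no source"
      convention and the function names are illustrative assumptions; the
      move is only safe because a constant load has no sources of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear IR: each instruction defines one value and reads up
 * to two (-1 means "no source"). */
struct instr {
   int def;
   int src[2];
};

static int reads(const struct instr *in, int value)
{
   return in->src[0] == value || in->src[1] == value;
}

/* Move the definition at def_idx to just before its first user and return
 * its new index. Assumes the definition has no sources (true for a
 * constant), so moving it later cannot break any dependency. */
size_t sink_def_to_first_use(struct instr *code, size_t n, size_t def_idx)
{
   assert(def_idx < n);
   int value = code[def_idx].def;

   size_t first_use = def_idx + 1;
   while (first_use < n && !reads(&code[first_use], value))
      first_use++;
   if (first_use == n)
      return def_idx; /* no user: leave it where it is */

   /* Shift the instructions in between up by one slot, then place the
    * definition right before its first user. */
   struct instr def = code[def_idx];
   for (size_t i = def_idx; i + 1 < first_use; i++)
      code[i] = code[i + 1];
   code[first_use - 1] = def;
   return first_use - 1;
}
```

      In this model the constant's live range shrinks from the distance to
      its original definition to a single instruction.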
      
      total instructions in shared programs: 13043585 -> 13015385 (-0.22%)
      instructions in affected programs: 8326040 -> 8297840 (-0.34%)
      helped: 24939
      HURT: 19894
      
      total threads in shared programs: 415860 -> 415858 (<.01%)
      threads in affected programs: 4 -> 2 (-50.00%)
      helped: 0
      HURT: 1
      
      total uniforms in shared programs: 3721953 -> 3720451 (-0.04%)
      uniforms in affected programs: 96134 -> 94632 (-1.56%)
      helped: 744
      HURT: 435
      
      total max-temps in shared programs: 2173431 -> 2154260 (-0.88%)
      max-temps in affected programs: 264598 -> 245427 (-7.25%)
      helped: 10858
      HURT: 841
      
      total spills in shared programs: 4005 -> 4010 (0.12%)
      spills in affected programs: 700 -> 705 (0.71%)
      helped: 5
      HURT: 10
      
      total fills in shared programs: 5801 -> 5817 (0.28%)
      fills in affected programs: 1346 -> 1362 (1.19%)
      helped: 6
      HURT: 11
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
    • broadcom/compiler: disallow TMU spills if max tmu spills is 0 · a1998a9f
      Iago Toral authored and Marge Bot committed

      If we are compiling with a strategy that does not allow TMU spills,
      we should not allow spilling anything that is not a uniform.
      Otherwise, the RA cost/benefit algorithm may choose to spill a
      temp that is not a uniform, which will cause us to immediately
      fail the strategy and fall back to the next one, even if we
      could have instead chosen to spill more uniforms to compile the
      program successfully with that strategy.
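
      A minimal sketch of the candidate filter, assuming a simple
      benefit/cost ratio for ranking. The spill_candidate struct and
      choose_spill() are hypothetical and not the v3d or util/register_allocate
      code: when max_tmu_spills is 0, non-uniform temps are skipped entirely,
      so a cheaper uniform spill is chosen instead of a TMU spill that would
      make the strategy fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spill candidate; a real allocator tracks this per temp. */
struct spill_candidate {
   bool is_uniform; /* can be refilled by reloading the uniform */
   float cost;      /* estimated cost of spilling this temp (> 0) */
   float benefit;   /* e.g. how much interference spilling relieves */
};

/* Return the index of the best eligible candidate, or -1 if none. When
 * the strategy allows no TMU spills, only uniform temps are eligible. */
int choose_spill(const struct spill_candidate *c, size_t n,
                 unsigned max_tmu_spills)
{
   int best = -1;
   float best_ratio = 0.0f;
   for (size_t i = 0; i < n; i++) {
      if (!c[i].is_uniform && max_tmu_spills == 0)
         continue;
      assert(c[i].cost > 0.0f);
      float ratio = c[i].benefit / c[i].cost;
      if (best < 0 || ratio > best_ratio) {
         best = (int)i;
         best_ratio = ratio;
      }
   }
   return best;
}
```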
      
      Some relevant shader-db stats:
      
      total instructions in shared programs: 13040711 -> 13043585 (0.02%)
      instructions in affected programs: 234238 -> 237112 (1.23%)
      helped: 73
      HURT: 172
      
      total threads in shared programs: 415664 -> 415860 (0.05%)
      threads in affected programs: 196 -> 392 (100.00%)
      helped: 98
      HURT: 0
      
      total uniforms in shared programs: 3717266 -> 3721953 (0.13%)
      uniforms in affected programs: 12831 -> 17518 (36.53%)
      helped: 6
      HURT: 100
      
      total max-temps in shared programs: 2174177 -> 2173431 (-0.03%)
      max-temps in affected programs: 4597 -> 3851 (-16.23%)
      helped: 79
      HURT: 21
      
      total spills in shared programs: 4010 -> 4005 (-0.12%)
      spills in affected programs: 55 -> 50 (-9.09%)
      helped: 5
      HURT: 0
      
      total fills in shared programs: 5820 -> 5801 (-0.33%)
      fills in affected programs: 186 -> 167 (-10.22%)
      helped: 5
      HURT: 0
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
    • broadcom/compiler: increase cost of TMU spills to 10 · cbb4d0dd
      Iago Toral authored and Marge Bot committed

      Our cost was 5, which matches the number of instructions we have to
      add for a TMU spill (a fill is 4 instructions).
      
      Uniform spills, on the other hand, add an extra instruction for each
      fill and remove one instruction for the spill itself. These have
      a cost of 1.
      
      Therefore, if we have a single spill+fill, we end up with +9
      instructions if it is a TMU spill and +0 instructions with a uniform
      spill, so making the former only 5 times more costly than the latter is
      probably not a good idea, and this is without even considering the
      added latency of the TMU accesses.
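
      The instruction-count arithmetic above, written out as a small C
      sketch. The per-spill and per-fill counts come from the text; the
      macro and function names are hypothetical, and TMU latency is not
      modeled.

```c
#include <assert.h>

/* Instruction-count deltas as stated in the commit message:
 *   TMU spill:     +5 for the spill, +4 for each fill.
 *   Uniform spill: -1 for the spill (the temp goes away),
 *                  +1 for each fill (reload the uniform). */
#define TMU_SPILL_INSTRS     5
#define TMU_FILL_INSTRS      4
#define UNIFORM_SPILL_INSTRS (-1)
#define UNIFORM_FILL_INSTRS  1

int tmu_spill_delta(int fills)
{
   assert(fills >= 0);
   return TMU_SPILL_INSTRS + fills * TMU_FILL_INSTRS;
}

int uniform_spill_delta(int fills)
{
   assert(fills >= 0);
   return UNIFORM_SPILL_INSTRS + fills * UNIFORM_FILL_INSTRS;
}
```

      For one spill and one fill this gives +9 versus +0, matching the text;
      with two fills it gives +13 versus +1.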
      
      Relevant shader-db changes show that this causes a marginal
      instruction count increase in a few shaders, but better thread counts
      and less TMU spilling overall:
      
      total instructions in shared programs: 13037315 -> 13040711 (0.03%)
      instructions in affected programs: 370106 -> 373502 (0.92%)
      helped: 187
      HURT: 321
      
      total threads in shared programs: 415090 -> 415664 (0.14%)
      threads in affected programs: 574 -> 1148 (100.00%)
      helped: 287
      HURT: 0
      
      total uniforms in shared programs: 3706674 -> 3717266 (0.29%)
      uniforms in affected programs: 63075 -> 73667 (16.79%)
      helped: 40
      HURT: 395
      
      total max-temps in shared programs: 2176080 -> 2174177 (-0.09%)
      max-temps in affected programs: 15838 -> 13935 (-12.02%)
      helped: 316
      HURT: 34
      
      total spills in shared programs: 4247 -> 4010 (-5.58%)
      spills in affected programs: 2599 -> 2362 (-9.12%)
      helped: 107
      HURT: 14
      
      total fills in shared programs: 6121 -> 5820 (-4.92%)
      fills in affected programs: 3622 -> 3321 (-8.31%)
      helped: 108
      HURT: 13
      
      Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
      Part-of: <!15168>
  2. 01 Mar, 2022 30 commits