  1. Jun 26, 2023
  2. Jun 25, 2023
  3. Jun 24, 2023
  4. Jun 23, 2023
    • intel/gfx12.5: Enable L3 partial write merging for compressible surfaces among other cases. · 427fee35
      Francisco Jerez authored and Marge Bot committed
      
      This enables L3 partial write merging for a number of cases that seem
      to be getting accidentally disabled by the kernel, which was causing a
      serious performance bottleneck on DG2 and MTL platforms.  The
      "Compressible Partial Write Merge Enable", "Coherent Partial Write
      Merge Enable" and "Cross-Tile Partial Write Merge Enable" bits in
      L3SQCREG5 were expected to be enabled by default (and confusingly,
      they even read off as enabled if you ran 'intel_reg read 0xb158' on an
      idle system), but they are getting clobbered during 3D context
      initialization by an i915 workaround.
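The clobbering described above can be pictured with a small simulation. This is only a sketch: the bit positions, the `OTHER_FIELD` value and the helper names are placeholders invented for illustration, not the real L3SQCREG5 layout or the actual i915 or Mesa code. It assumes one plausible failure mode, where a full-register write drops enable bits that a read-modify-write would have kept.

```python
# All bit positions below are PLACEHOLDERS for illustration only; they are
# NOT the real L3SQCREG5 layout.
COMPRESSIBLE_PWM_ENABLE = 1 << 0
COHERENT_PWM_ENABLE = 1 << 1
CROSS_TILE_PWM_ENABLE = 1 << 2
PWM_ENABLE_MASK = (COMPRESSIBLE_PWM_ENABLE
                   | COHERENT_PWM_ENABLE
                   | CROSS_TILE_PWM_ENABLE)
OTHER_FIELD = 0x7F << 8  # hypothetical unrelated field in the same register


def full_write(_old, value):
    """Overwrite the whole register, discarding whatever was there."""
    return value & 0xFFFFFFFF


def enable_bits(old, bits):
    """Read-modify-write: set `bits` and preserve everything else."""
    return (old | bits) & 0xFFFFFFFF


# Idle-system value: the three merge-enable bits read back as set.
reg = PWM_ENABLE_MASK | OTHER_FIELD
# A context-init write that only cares about OTHER_FIELD drops the enables.
reg = full_write(reg, OTHER_FIELD)
clobbered = reg
# The driver then re-enables the three bits explicitly.
reg = enable_bits(reg, PWM_ENABLE_MASK)
```

After the full write, `clobbered & PWM_ENABLE_MASK` is zero even though the register read as enabled beforehand, which matches the confusing symptom the message describes.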
      
      Enabling L3 partial write merging of compressible surfaces in
      particular seems to increase rendering fillrate by over 3x in some
      cases (e.g. the
      "VulkanFillRate/FillRateGPU/resolution:1[0-3]/format:*/blend:0"
      fillrate-bound microbenchmarks).  Significant improvements can also be
      reproduced in most real-world workloads we've tested so far,
      e.g. Counter-Strike: GO improves by ~11%, Shadow of the Tomb Raider
      improves by ~5.5%, and AztecRuins-VK improves by ~6.5% on DG2-512.
      Thanks a lot to Caleb Callaway for these figures.  No regressions have
      been observed so far.
      
      Even though this patch might seem surprisingly simple for such a
      large payoff, it's the result of Felix DeGrood and me trying, on and
      off during the last year, to root-cause the rendering performance gap
      of DG2 on Linux vs Windows.  Some of the OA statistics captured by
      Felix early this month were greatly helpful for me to connect the
      last few dots, so Felix deserves a big chunk of the credit for this
      work.
      
      Cc: mesa-stable
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <mesa/mesa!23783>
    • ci/fastboot: use gzipped Image to avoid compressing on the runner · d7ec6f17
      David Heidelberg authored and Marge Bot committed
      
      Faster download, one less step. Win-win.
      
      Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
      Part-of: <mesa/mesa!23816>
    • frontends/va: fix some coverity scan reported issues · 7d3c29dc
      Thong Thai authored and Marge Bot committed
      
      Added some checks for NULL pointer dereferencing and loop bounds.
      v2: Use ARRAY_SIZE instead of magic numbers (@jenatali)
      
      Signed-off-by: Thong Thai <thong.thai@amd.com>
      Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
      Part-of: <mesa/mesa!23598>
    • meson: Explicitly add "check : false" to a couple instances of run_command · dc93f205
      Caio Oliveira authored and Marge Bot committed
      
      In both cases there's code right after the execution to check the result and
      give a proper message.
      
      This gets rid of the following meson warning:
      
      ```
      WARNING: You should add the boolean check kwarg to the run_command call.
               It currently defaults to false,
               but it will default to true in future releases of meson.
               See also: https://github.com/mesonbuild/meson/issues/9300
      
      
      ```
      
      Reviewed-by: Eric Engestrom <eric@igalia.com>
      Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
      Part-of: <mesa/mesa!23821>
    • amd/drm-shim: use fixed-width types · d3e5e04a
      Rhys Perry authored and Marge Bot committed
      
      Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
      Reviewed-by: Eric Engestrom <eric@igalia.com>
      Closes: #9221
      Part-of: <!23725>
    • agx: Implement vector live range splitting · 766535c8
      Alyssa Rosenzweig authored and Marge Bot committed
      
      The SSA killer feature is that, under an "optimal" allocator, the number of
      registers used (register demand) is *equal* to the number of registers required
      (register pressure, the maximum number of variables simultaneously live at any
      point in the program). I put "optimal" in scare quotes, because we don't need to
      use the exact minimum number of registers as long as we don't sacrifice thread
      count or introduce spilling, and using a few extra registers when possible can
      help coalesce moves. Details-shmetails.
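To make the definition of register pressure concrete, here is a minimal sketch that computes it as the peak number of registers simultaneously live. The `(start, end, size)` live-range encoding and the function name are assumptions made for this example; the compiler itself works on SSA liveness, not on a flat interval list.

```python
def register_pressure(live_ranges):
    """Peak number of registers live at once.

    live_ranges: list of (start, end, size); a variable of `size` registers
    is live at every program point p with start <= p < end (assumed encoding).
    """
    events = []
    for start, end, size in live_ranges:
        events.append((start, size))   # variable becomes live
        events.append((end, -size))    # variable dies
    # At equal points, deaths (negative deltas) sort before births, so a
    # register freed at point p can be reused by a variable born at p.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


# Four variables of sizes 2, 4, 2 and 1; the first two overlap on [1, 3).
ranges = [(0, 4, 2), (1, 3, 4), (3, 6, 2), (5, 7, 1)]
pressure = register_pressure(ranges)
```

Here the peak is reached while the 2-wide and 4-wide variables are both live, so an "optimal" allocator in the sense above would need 6 registers for this program.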
      
      The problem is that, prior to this commit, our register allocator was not
      well-behaved in certain circumstances, and would require an arbitrarily large
      number of registers. In particular, since different variables have different
      sizes and require contiguous allocation, in large programs the register file may
      become fragmented, causing the RA to use arbitrarily many registers despite
      having lots of registers free.
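The fragmentation problem can be reproduced in a few lines. This is a simplified sketch that models the register file as a list of free/occupied flags and ignores alignment requirements; `find_contiguous` is a name made up for the example.

```python
def find_contiguous(free, size):
    """Return the lowest base of `size` consecutive free registers, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            return i - size + 1
    return None


# 8 registers, every other one occupied: 4 are free, but no two are adjacent.
free = [False, True, False, True, False, True, False, True]
scalar_base = find_contiguous(free, 1)   # a 1-wide variable still fits
vector_base = find_contiguous(free, 2)   # a 2-wide vector does not
```

Half of the file is free, yet the 2-wide request fails, so an allocator without a bound would have to reach for registers past the end of the ones already in use.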
      
      The solution is vector live range splitting. First, we calculate the register
      pressure (the minimum number of registers that it is theoretically possible to
      allocate successfully), and round up to the maximum number of registers we will
      actually use (to give some wiggle room to coalesce moves). Then, we will treat
      this maximum as a *bound*, requiring that we don't use more registers than
      chosen. In the event that register file fragmentation prevents us from finding a
      contiguous sequence of registers to allocate a variable, rather than giving up
      or using registers we don't have, we shuffle the register file around
      (defragmenting it) to make room for the new variable. That lets us use a
      few moves to avoid sacrificing thread count or introducing spilling, which is
      usually a great choice.
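The strategy described above can be sketched as follows. This is an illustrative toy, not the allocator from the commit: the rounding step of 8, the list-based register file, and the helper names are all assumptions, alignment is ignored, and the "shuffle" is the simplest possible one (slide everything toward register 0).

```python
def round_up(value, step):
    """Round value up to a multiple of step."""
    return -(-value // step) * step


def find_run(regfile, size):
    """Lowest base of `size` consecutive empty slots, or None."""
    run = 0
    for i, owner in enumerate(regfile):
        run = run + 1 if owner is None else 0
        if run == size:
            return i - size + 1
    return None


def compact(regfile):
    """Slide live variables toward register 0; return (name, old, new) moves."""
    moves, packed, seen = [], [], set()
    for old, owner in enumerate(regfile):
        if owner is None:
            continue
        if owner not in seen:          # first slot of this variable
            seen.add(owner)
            new = len(packed)
            if new != old:
                moves.append((owner, old, new))
        packed.append(owner)
    regfile[:] = packed + [None] * (len(regfile) - len(packed))
    return moves


def allocate(regfile, name, size):
    """Place `name` within the fixed bound, defragmenting only when needed."""
    moves = []
    base = find_run(regfile, size)
    if base is None:
        moves = compact(regfile)
        base = find_run(regfile, size)
    if base is None:
        raise ValueError("bound is below register pressure")
    regfile[base:base + size] = [name] * size
    return base, moves


pressure = 7                      # a (2) + b (2) + v (3) live at once
bound = round_up(pressure, 8)     # assumed rounding granularity
regfile = [None] * bound
regfile[1:3] = ["a", "a"]
regfile[4:6] = ["b", "b"]
base, moves = allocate(regfile, "v", 3)
```

Four registers are free before the call but the longest free run is two, so the 3-wide `v` only fits after `a` and `b` are moved down. In a real compiler those moves would be emitted as a parallel copy, since source and destination ranges can overlap.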
      
      Android GLES3.1 shader-db results are as expected: some noise / small
      regressions for instruction count, but a bunch of shaders with improved thread
      count. The massive increase in register demand may seem weird, but this is the
      RA doing exactly what it's supposed to: using more registers if and only if they
      would not hurt thread count. Notice that no programs whatsoever are hurt for
      thread count, which is the salient part.
      
         total instructions in shared programs: 1781473 -> 1781574 (<.01%)
         instructions in affected programs: 276268 -> 276369 (0.04%)
         helped: 1074
         HURT: 463
         Inconclusive result (value mean confidence interval includes 0).
      
         total bytes in shared programs: 12196640 -> 12201670 (0.04%)
         bytes in affected programs: 1987322 -> 1992352 (0.25%)
         helped: 1060
         HURT: 513
         Bytes are HURT.
      
         total halfregs in shared programs: 488755 -> 529651 (8.37%)
         halfregs in affected programs: 295651 -> 336547 (13.83%)
         helped: 358
         HURT: 9737
         Halfregs are HURT.
      
         total threads in shared programs: 18875008 -> 18885440 (0.06%)
         threads in affected programs: 64576 -> 75008 (16.15%)
         helped: 82
         HURT: 0
         Threads are helped.
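As a quick sanity check on how to read the report, the percentages can be recomputed from the raw totals quoted above. The helper below is just arithmetic written for this note, not part of the shader-db tooling.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the report above.
halfregs_total = percent_change(488755, 529651)
halfregs_affected = percent_change(295651, 336547)
threads_affected = percent_change(64576, 75008)

# The absolute thread gain is the same whether measured over all shared
# programs or only the affected ones, as expected when nothing is hurt.
threads_gain_total = 18885440 - 18875008
threads_gain_affected = 75008 - 64576
```

Rounded to two decimals these reproduce the 8.37%, 13.83% and 16.15% figures, and both thread deltas come out to the same 10432 threads.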
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>
    • agx/lower_parallel_copy: Lower 64-bit copies · 72e6b683
      Alyssa Rosenzweig authored and Marge Bot committed
      
      To 32-bit. This way we don't get into bad situations where we need to, e.g.,
      swap unaligned 64-bit values or something funny like that.
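A minimal sketch of the idea, assuming copies are `(dest, src, bits)` tuples with registers indexed in 16-bit units (so a 32-bit value spans 2 units and a 64-bit value spans 4). Both the tuple encoding and the function name are hypothetical; the actual pass operates on the compiler's own copy structures.

```python
def lower_64bit_copies(copies):
    """Split every 64-bit copy into two 32-bit copies (low half, high half)."""
    lowered = []
    for dest, src, bits in copies:
        if bits == 64:
            lowered.append((dest, src, 32))            # low 32 bits
            lowered.append((dest + 2, src + 2, 32))    # high 32 bits
        else:
            lowered.append((dest, src, bits))
    return lowered


# Swapping two 64-bit values at units 2 and 6, neither 64-bit aligned.
swap = [(2, 6, 64), (6, 2, 64)]
lowered = lower_64bit_copies(swap)
```

After lowering, the rest of the parallel copy logic only ever sees 32-bit swaps, each of which is aligned to its own size.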
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>
    • agx: Validate predecessor information · bfdaab65
      Alyssa Rosenzweig authored and Marge Bot committed
      
      Including the new loop header? flag.
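For readers unfamiliar with this kind of check, here is a rough sketch of what validating predecessor information might involve. Everything in it is an assumption: the dictionary encoding, the rule that a loop header is a block with a predecessor at the same or a later index (a back edge, with blocks numbered in source order), and the function name. The commit's actual validation rules may differ.

```python
def validate_predecessors(blocks):
    """Return a list of error strings; empty means the CFG is consistent.

    blocks: dict mapping block index -> {"succs": [...], "preds": [...],
    "loop_header": bool} (hypothetical encoding).
    """
    errors = []
    for idx, block in blocks.items():
        # Every edge must be recorded on both of its endpoints.
        for succ in block["succs"]:
            if idx not in blocks[succ]["preds"]:
                errors.append(f"block {idx} missing from preds of {succ}")
        for pred in block["preds"]:
            if idx not in blocks[pred]["succs"]:
                errors.append(f"block {idx} missing from succs of {pred}")
        # Assumed rule: a loop header is exactly a block with a back edge.
        has_back_edge = any(pred >= idx for pred in block["preds"])
        if block["loop_header"] != has_back_edge:
            errors.append(f"block {idx} has a wrong loop_header flag")
    return errors


# 0 -> 1 -> 2 -> 3, with a back edge 2 -> 1 making block 1 a loop header.
cfg = {
    0: {"succs": [1], "preds": [], "loop_header": False},
    1: {"succs": [2], "preds": [0, 2], "loop_header": True},
    2: {"succs": [1, 3], "preds": [1], "loop_header": False},
    3: {"succs": [], "preds": [2], "loop_header": False},
}
ok_errors = validate_predecessors(cfg)

cfg[1]["loop_header"] = False   # corrupt the flag on purpose
bad_errors = validate_predecessors(cfg)
```

The first call reports nothing, while the second flags block 1, which is the kind of inconsistency a validation pass is meant to catch before RA relies on the flag.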
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>
    • agx: Add loop header? flag · 923b9667
      Alyssa Rosenzweig authored and Marge Bot committed
      
      This is useful for deciding whether we need to fix up phis in RA.
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>
    • agx: Recollect stored vectors at their use · a2dbe6b6
      Alyssa Rosenzweig authored and Marge Bot committed
      
      This is Timur's cheesy solution to split-hell.shader_test. Seems to work ok
      here.
      
      Before: 94 inst, 588 bytes, 165 halfregs, 1 threads, 0 loops, 0:0 spills:fills
      After: 63 inst, 454 bytes, 129 halfregs, 1 threads, 0 loops, 0:0 spills:fills
      
      On Android GLES3.1 shader-db, a few shaders are helped a lot:
      
         total instructions in shared programs: 1781706 -> 1781473 (-0.01%)
         instructions in affected programs: 4284 -> 4051 (-5.44%)
         helped: 16
         HURT: 2
         Instructions are helped.
      
         total bytes in shared programs: 12197854 -> 12196640 (<.01%)
         bytes in affected programs: 29526 -> 28312 (-4.11%)
         helped: 20
         HURT: 2
         Bytes are helped.
      
         total halfregs in shared programs: 489007 -> 488755 (-0.05%)
         halfregs in affected programs: 945 -> 693 (-26.67%)
         helped: 7
         HURT: 0
         Halfregs are helped.
      
         total threads in shared programs: 18873216 -> 18875008 (<.01%)
         threads in affected programs: 5376 -> 7168 (33.33%)
         helped: 7
         HURT: 0
         Threads are helped.
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>
    • agx: Extract coordinate register size calculation · 91d98975
      Alyssa Rosenzweig authored and Marge Bot committed
      
      It will be used for image writes too, not just reads.
      
      Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
      Part-of: <!23832>