1. 19 Oct, 2020 1 commit
    • isl, anv, iris: Add a centralized helper to select MOCS based on usage · 02fe825a
      Kenneth Graunke authored
      On Gen12+, we can enable additional caches in certain usage situations.
      This routes that decision making to a central place in ISL, based on
      surface usage flags, and updates both drivers to use it.  (i965 doesn't
      need to change because it doesn't support Gen12.)
      We continue handling the "external" decision via an anv_mocs() wrapper
      for now, since we store that flag in anv_bo, which isl doesn't know
      about.  (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
      actually sure that would be cleaner.)
      This patch should not have any functional or performance effects, as
      we continue selecting the exact same MOCS values for now.
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Part-of: <!7104>
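The idea of routing the MOCS decision through one usage-based helper, while leaving the "external" check in a driver-side wrapper, can be sketched in C as below. Every name and MOCS value here is a hypothetical placeholder (these are not ISL's or anv's real functions, and not real hardware MOCS table entries); the sketch only illustrates the split of responsibilities the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical usage flags, loosely modeled on ISL's surface usage bits. */
#define EX_USAGE_RENDER_TARGET (1u << 0)
#define EX_USAGE_TEXTURE       (1u << 1)
#define EX_USAGE_VERTEX_BUFFER (1u << 2)

/* Placeholder MOCS values, not real hardware table entries. */
#define EX_MOCS_DEFAULT  2u
#define EX_MOCS_EXTERNAL 1u

/* Central helper (the ISL role): decides from usage flags alone. */
static uint32_t
ex_isl_mocs(int gen, uint32_t usage)
{
   assert(usage != 0); /* every surface should have at least one usage */

   if (gen >= 12 && (usage & EX_USAGE_TEXTURE)) {
      /* Hook point for enabling extra caches on Gen12+.  Like the
       * commit, the sketch still returns the same value as before. */
      return EX_MOCS_DEFAULT;
   }
   return EX_MOCS_DEFAULT;
}

/* Driver-side wrapper (the anv_mocs() role): only the driver knows
 * whether a BO is external, because that flag lives in its BO struct. */
static uint32_t
ex_driver_mocs(bool bo_is_external, int gen, uint32_t usage)
{
   if (bo_is_external)
      return EX_MOCS_EXTERNAL;
   return ex_isl_mocs(gen, usage);
}
```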
  2. 23 Jun, 2020 1 commit
  3. 03 Jun, 2020 6 commits
    • OPTIONAL: iris: Perform BLORP buffer barriers outside of iris_blorp_exec() hook. · 8252bb0e
      Francisco Jerez authored
      The iris_blorp_exec() hook needs to be executed under a single
      indivisible sync region, which means that in cases where we need to
      emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
      subsequent commands separately from the previous commands, which will
      prevent us from optimizing out subsequent PIPE_CONTROLs if we
      encounter the same buffers again.  In particular I've encountered this
      situation in some SynMark test-cases which perform lots of BLORP
      operations with the same buffer bound as both source and destination
      (in order to generate mipmaps): In such a scenario if the source
      requires flushing we'd also end up flushing for the destination
      redundantly, even though a single PIPE_CONTROL would have been
      sufficient.
      This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
      FPS regression in SynMark OglMultithread.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Open-code iris_cache_flush_for_read() and iris_cache_flush_for_depth(). · b9281884
      Francisco Jerez authored
      These have become one-liners now so they can be easily inlined.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Remove render cache hash table-based synchronization. · 74c774dc
      Francisco Jerez authored
      The render cache hash table is now *mostly* redundant with the more
      general seqno matrix-based cache tracking mechanism.  Most hash table
      operations are now gone except for the format mismatch checks done in
      iris_cache_flush_for_render().  Redundant code removed as a separate
      patch for bisectability.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Remove depth cache set tracking and synchronization. · aa78d05a
      Francisco Jerez authored
      The depth cache set is now redundant with the more general seqno
      matrix-based cache tracking mechanism.  Removed as a separate patch
      for bisectability.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Annotate all BO uses with domain and sequence number information. · eb5d1c27
      Francisco Jerez authored
      Probably the most annoying patch to review from the whole series --
      Mark every buffer object use as accessed through some caching domain
      with the sequence number of the current synchronization section of the
      batch.  The additional argument of iris_use_pinned_bo() makes sure I'd
      have gotten a compile error if I had missed any buffer added to the
      batch validation list.
      There are only a few exceptions where a buffer is left untracked while
      adding it to the validation list, justified below:
       - Batch buffers: These are strictly read-only for the moment.
       - BLORP buffer objects: Their seqnos are bumped manually at the end
         of iris_blorp_exec() instead, in order to avoid plumbing domain
         information through BLORP address combining.
       - Scratch buffers: The contents of these are strictly thread-local.
       - Shader images and SSBOs: Accesses of these buffers are explicitly
         synchronized at the API level.
      v2: Opt out of tracking more aggressively (Ken): In addition to the
          above, surface states, binding tables, instructions and most
          dynamic states are now left untracked, which means a *lot* more BO
          uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
          carefully, since the cache tracker won't be able to provide any
          coherency guarantees for them.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
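The seqno-based tracking these commits rely on can be illustrated with a much-simplified model. Each BO remembers the sync section (seqno) of its last write through each domain, and the batch keeps a reader-by-writer matrix of the sections each domain is already coherent with. This is a hypothetical sketch, not iris' actual data structures: it models only read-after-write hazards, and it treats a flush as making all earlier writes visible to one reader domain. EX_DOMAIN_NONE plays the role the commit gives IRIS_DOMAIN_NONE, opting a use out of tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caching domains. */
enum ex_domain {
   EX_DOMAIN_RENDER,
   EX_DOMAIN_DEPTH,
   EX_DOMAIN_SAMPLER,
   EX_DOMAIN_COUNT,
   EX_DOMAIN_NONE = EX_DOMAIN_COUNT, /* untracked use */
};

struct ex_bo {
   /* Sync section of the last write through each domain (0 = never). */
   uint64_t last_write_seqno[EX_DOMAIN_COUNT];
};

struct ex_batch {
   uint64_t seqno; /* current sync section; starts at 1 */
   /* coherent[reader][writer]: writes made through `writer` in sections
    * <= this value are visible to accesses through `reader`. */
   uint64_t coherent[EX_DOMAIN_COUNT][EX_DOMAIN_COUNT];
};

/* Annotate a BO use with its domain and the current seqno. */
static void
ex_use_bo(struct ex_batch *batch, struct ex_bo *bo,
          enum ex_domain domain, bool writable)
{
   if (domain == EX_DOMAIN_NONE)
      return; /* the tracker gives no coherency guarantees here */
   assert(domain < EX_DOMAIN_COUNT);
   if (writable)
      bo->last_write_seqno[domain] = batch->seqno;
}

/* A flush is needed if the BO was written through another domain more
 * recently than `reader` was last made coherent with that domain. */
static bool
ex_needs_flush(const struct ex_batch *batch, const struct ex_bo *bo,
               enum ex_domain reader)
{
   for (int w = 0; w < EX_DOMAIN_COUNT; w++) {
      if (w != (int)reader &&
          bo->last_write_seqno[w] > batch->coherent[reader][w])
         return true;
   }
   return false;
}

/* Emitting a flush makes all prior writes visible to `reader` and starts
 * a new sync section. */
static void
ex_flush_for(struct ex_batch *batch, enum ex_domain reader)
{
   for (int w = 0; w < EX_DOMAIN_COUNT; w++)
      batch->coherent[reader][w] = batch->seqno;
   batch->seqno++;
}
```

Because the state is per domain pair rather than a set of BOs, the same machinery subsumes both the render cache hash table and the depth cache set removed in the commits above.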
    • iris: Extend iris_context dirty state flags to 128 bits. · 46183a99
      Francisco Jerez authored
      We're nearly out of dirty bits, and some patches pending review on
      GitLab no longer apply due to that.  Make room for them by splitting
      off shader stage-specific bits into a separate stage_dirty mask.
      An alternative would be to split compute-related bits into a separate
      mask, but that would prevent the '<< stage' indexing done in various
      parts of the driver from working.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!5279>
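A sketch of the split, assuming hypothetical bit names. Global bits stay in `dirty`, while stage-specific bits move to `stage_dirty`, laid out in groups of one bit per stage so that shifting the VS variant of a bit left by the stage index selects that stage. Keeping compute inside the same stage-ordered layout is what lets the '<< stage' indexing keep working.

```c
#include <assert.h>
#include <stdint.h>

enum ex_stage {
   EX_STAGE_VS, EX_STAGE_TCS, EX_STAGE_TES, EX_STAGE_GS, EX_STAGE_FS,
   EX_STAGE_CS, EX_STAGE_COUNT
};

/* Non-stage dirty bits stay in `dirty` (hypothetical names). */
#define EX_DIRTY_VIEWPORT (1ull << 0)
#define EX_DIRTY_BLEND    (1ull << 1)

/* Stage-specific bits live in `stage_dirty`, EX_STAGE_COUNT consecutive
 * bits per kind of state, ordered by stage. */
#define EX_STAGE_DIRTY_CONSTANTS_VS (1ull << 0)
#define EX_STAGE_DIRTY_SAMPLERS_VS  (1ull << (1 * EX_STAGE_COUNT))
#define EX_STAGE_DIRTY_BINDINGS_VS  (1ull << (2 * EX_STAGE_COUNT))

struct ex_context {
   uint64_t dirty;       /* global state */
   uint64_t stage_dirty; /* per-stage state: 128 bits in total */
};

static void
ex_flag_samplers_dirty(struct ex_context *ice, enum ex_stage stage)
{
   assert(stage < EX_STAGE_COUNT);
   ice->stage_dirty |= EX_STAGE_DIRTY_SAMPLERS_VS << stage;
}
```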
  4. 20 May, 2020 2 commits
  5. 29 Apr, 2020 1 commit
  6. 02 Mar, 2020 1 commit
  7. 22 Feb, 2020 1 commit
  8. 31 Jan, 2020 1 commit
    • intel/blorp: Always emit URB config on Gen7+ · 09e4c330
      Jason Ekstrand authored
      Previously, i965/iris tried to reuse the currently programmed URB config
      if it was good enough for BLORP, rather than reprogramming it each time.
      However, this will make some things harder on Gen12+ and we've not seen
      any performance impact from emitting URB more frequently in ANV.
      This makes the blorp <-> driver interface a bit simpler on Gen7+ because
      now all the driver has to do is to provide the L3$ config rather than
      trying to hand off URB re-config to blorp.
      Cc: "20.0" mesa-stable@lists.freedesktop.org
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3454>
  9. 05 Dec, 2019 1 commit
  10. 26 Nov, 2019 1 commit
    • iris: Disable VF cache partial address workaround on Gen11+ · 3fdf2bb3
      Kenneth Graunke authored
      The vertex cache uses the full 48-bit address on Gen11+.  See the
      documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
      workaround and lists it as pre-Icelake.
      Interestingly, the docs don't mention index buffers as needing a
      workaround at all.  So either we've been overzealous, or the docs
      never got updated to record that.  Which raises the question of whether
      the issue there was fixed, if there was one...
      Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; it appears
      to improve performance by about 1-2% on Icelake 8x8 (not frequency
  11. 28 Oct, 2019 1 commit
  12. 09 Oct, 2019 1 commit
    • iris: Implement the Broadwell NP Z PMA Stall Fix · 0b7ecfdd
      Kenneth Graunke authored
      This should help avoid stalls in the pixel mask array in certain
      non-promoted depth cases.  It especially helps for Z16, as each bit
      in the PMA corresponds to two pixels when using Z16, as opposed to
      the usual one pixel.
      Improves performance in GFXBench5 TRex by 22% (n=1).
  13. 19 Sep, 2019 1 commit
    • iris: Skip double-disabling TCS/TES/GS after BLORP operations · 706c9f2d
      Kenneth Graunke authored
      BLORP always turns off TCS/TES/GS.  If regular drawing also has them
      disabled (the overwhelmingly common case), then leaving them disabled
      is just fine by us and we can skip dirtying them, as that would just
      re-disable them a second time on the next draw.
      If they are actually enabled, however, we do need to flag them.
      Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.
  14. 09 Sep, 2019 1 commit
  15. 12 Aug, 2019 1 commit
  16. 04 Jul, 2019 1 commit
  17. 22 Jun, 2019 2 commits
  18. 20 Jun, 2019 2 commits
    • iris: Don't check VF address high bits when there is no buffer. · db8f57a5
      Kenneth Graunke authored
      If there is no buffer, then it doesn't matter.  Leave the old stale
      high bits in place (for next time) and don't bother invalidating.
      Cuts 5.6% of the flushes in the Civilization VI demo on Kabylake GT2.
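This commit and the Gen11+ VF cache commit listed above combine into roughly the following decision, sketched with hypothetical names. It assumes that before Gen11 the VF cache compares only the low 32 bits of a vertex buffer address, so a change in bits 47:32 requires an invalidation; the per-slot array size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_VBS 32 /* hypothetical number of vertex buffer slots */

struct ex_vf_state {
   uint16_t last_high_bits[EX_MAX_VBS]; /* bits 47:32 of the last address */
};

/* Returns true if a VF cache invalidation is needed before binding
 * `addr` (0 meaning "no buffer") to vertex buffer slot `index`. */
static bool
ex_vf_needs_invalidate(struct ex_vf_state *vf, int gen,
                       unsigned index, uint64_t addr)
{
   assert(index < EX_MAX_VBS);
   if (gen >= 11)
      return false; /* full 48-bit address is used: no workaround */
   if (addr == 0)
      return false; /* no buffer: keep the stale bits for next time */

   uint16_t high = (uint16_t)(addr >> 32);
   if (high == vf->last_high_bits[index])
      return false;
   vf->last_high_bits[index] = high;
   return true;
}
```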
    • iris: Implement INTEL_DEBUG=pc for pipe control logging. · d4a4384b
      Kenneth Graunke authored
      This prints a log of every PIPE_CONTROL flush we emit, noting which bits
      were set, and also the reason for the flush.  That way we can see which
      are caused by hardware workarounds, render-to-texture, buffer updates,
      and so on.  It should make it easier to determine whether we're doing
      too many flushes and why.
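A minimal sketch of this kind of logging, assuming hypothetical flush-bit names and a made-up output format (not what INTEL_DEBUG=pc actually prints): each set bit is appended by name, followed by the reason supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical PIPE_CONTROL bits, in the same order as the names below. */
#define EX_PC_RENDER_TARGET_FLUSH (1u << 0)
#define EX_PC_DEPTH_CACHE_FLUSH   (1u << 1)
#define EX_PC_VF_CACHE_INVALIDATE (1u << 2)
#define EX_PC_CS_STALL            (1u << 3)

static const char *const ex_pc_bit_names[] = { "RT", "Depth", "VF", "CS" };

/* Formats one log line into `out`; returns the length snprintf reports,
 * stopping early if the buffer fills up. */
static int
ex_format_pc_log(char *out, size_t size, uint32_t bits, const char *reason)
{
   assert(out != NULL && size > 0);
   int n = snprintf(out, size, "PIPE_CONTROL:");
   size_t nbits = sizeof(ex_pc_bit_names) / sizeof(ex_pc_bit_names[0]);
   for (size_t i = 0; i < nbits; i++) {
      if ((bits & (1u << i)) && n >= 0 && (size_t)n < size)
         n += snprintf(out + n, size - n, " %s", ex_pc_bit_names[i]);
   }
   if (n >= 0 && (size_t)n < size)
      n += snprintf(out + n, size - n, " (%s)", reason);
   return n;
}
```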
  19. 27 May, 2019 1 commit
  20. 23 May, 2019 1 commit
    • iris: Record state sizes for INTEL_DEBUG=bat decoding. · 7d2b54e3
      Kenneth Graunke authored
      Felix noticed a crash when using INTEL_DEBUG=bat decoding.  It turned
      out that we were sometimes placing variable length data near the end
      of a buffer, and with the decoder guessing random lengths rather than
      having an actual count, it was walking off the end and crashing.  So
      this does more than improve the decoder output.
      Unfortunately, this is a bit more complicated than i965's handling,
      because we don't have a single state buffer.  Various places upload
      data via u_upload_mgr, and so there isn't a central place to record
      the size.  We don't need to catch every single place, however, since
      it's only important to record variable length packets (like viewports
      and binding tables).
      State data also lives arbitrarily long, rather than being discarded on
      every batch like i965, so we don't know when to clear out old entries
      either.  (We also don't have a callback when an upload buffer is
      released.)  So this tracking may leak memory over time.  That's probably
      okay though, as this is only a debugging feature and it's a slow leak.
      We may also get lucky and overwrite existing entries as we reuse BOs,
      though I find this unlikely to happen.
      The fact that the decoder works in terms of offsets from a state base
      address is also not ideal, as dynamic state base address and surface
      state base address differ for iris.  However, because dynamic state
      addresses start from the top of a 4GB region, and binding tables start
      from addresses [0, 64K), it's highly unlikely that we'll get overlap.
      We can always improve this, but for now it's better than what we had.
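Here is a sketch of the size-recording idea, using a hypothetical growable array keyed by state address (a real implementation would more likely use a hash table). An entry is overwritten when its address is reused and is otherwise never freed, mirroring the slow-leak trade-off described above. A lookup miss returns 0, so the decoder can fall back to guessing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct ex_size_entry {
   uint64_t addr;
   uint32_t size;
};

struct ex_state_sizes {
   struct ex_size_entry *entries;
   size_t count, capacity;
};

/* Driver side: record the size of variable-length state at `addr`.
 * Returns 0 on success, -1 on allocation failure. */
static int
ex_record_state_size(struct ex_state_sizes *t, uint64_t addr, uint32_t size)
{
   assert(t != NULL);
   for (size_t i = 0; i < t->count; i++) {
      if (t->entries[i].addr == addr) {
         t->entries[i].size = size; /* address reused: overwrite */
         return 0;
      }
   }
   if (t->count == t->capacity) {
      size_t cap = t->capacity ? t->capacity * 2 : 16;
      struct ex_size_entry *e = realloc(t->entries, cap * sizeof(*e));
      if (e == NULL)
         return -1;
      t->entries = e;
      t->capacity = cap;
   }
   t->entries[t->count++] = (struct ex_size_entry){ addr, size };
   return 0;
}

/* Decoder side: the recorded size, or 0 if unknown. */
static uint32_t
ex_lookup_state_size(const struct ex_state_sizes *t, uint64_t addr)
{
   for (size_t i = 0; i < t->count; i++) {
      if (t->entries[i].addr == addr)
         return t->entries[i].size;
   }
   return 0;
}
```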
  21. 12 Mar, 2019 1 commit
  22. 08 Mar, 2019 2 commits
  23. 21 Feb, 2019 9 commits