1. 13 Nov, 2020 1 commit
    • iris: Disable color fast-clears in iris_copy_region · 7779b1d7
      Nanley Chery authored
      During a blorp_copy between two color surfaces, the source and
      destination formats are re-interpreted to UINT (if possible) to avoid
      losing bits.
      If either surface has CCS_E, then extra steps are taken to support
      fast-cleared blocks with this format re-interpretation. Each clear value
      is packed in the original format, then unpacked in the new UINT format.
      This is then placed into the surface state object for some platforms.
      There are a couple of problems here:
      1. This is only being done for CCS_E, but MCS also supports fast-clears.
      2. These steps aren't enough for fast-clears on gen11+. On gen11+, the
         clear color isn't part of the surface state object that BLORP
         creates. Instead it's stored in a separate BO, that the surface state
         object references. Since that BO doesn't get updated during
         blorp_copy, the incorrect/unconverted clear color is used for the copy.
      I didn't measure any performance gain from this code, so this patch
      simply disables the feature.
      Makes iris pass the nv_copy_image-simple piglit test on gen11+.
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Part-of: <mesa/mesa!5388>
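      The pack-then-unpack step described in this commit message can be pictured with a small sketch. It assumes an R8G8B8A8_UNORM surface re-interpreted as R8G8B8A8_UINT; the helper names pack_rgba8_unorm and unpack_rgba8_uint are hypothetical and are not the functions BLORP or ISL actually use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a float RGBA clear color as R8G8B8A8_UNORM. */
static uint32_t
pack_rgba8_unorm(const float rgba[4])
{
   uint32_t packed = 0;
   for (int i = 0; i < 4; i++) {
      float c = rgba[i];
      if (c < 0.0f) c = 0.0f;   /* clamp to the UNORM range [0, 1] */
      if (c > 1.0f) c = 1.0f;
      uint32_t bits = (uint32_t)(c * 255.0f + 0.5f);   /* round to nearest */
      packed |= bits << (8 * i);
   }
   return packed;
}

/* Hypothetical helper: read the same 32 bits back as R8G8B8A8_UINT. */
static void
unpack_rgba8_uint(uint32_t packed, uint32_t out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = (packed >> (8 * i)) & 0xff;
}
```

      The converted UINT value is what a copy through the re-interpreted format would need to see. Per the message above, on gen11+ the clear color lives in a separate BO that is not updated during blorp_copy, so a conversion like this never reaches the hardware.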
  2. 05 Nov, 2020 1 commit
  3. 04 Nov, 2020 1 commit
  4. 19 Oct, 2020 1 commit
    • isl, anv, iris: Add a centralized helper to select MOCS based on usage · 02fe825a
      Kenneth Graunke authored
      On Gen12+, we can enable additional caches in certain usage situations.
      This routes that decision making to a central place in ISL, based on
      surface usage flags, and updates both drivers to use it.  (i965 doesn't
      need to change because it doesn't support Gen12.)
      We continue handling the "external" decision via an anv_mocs() wrapper
      for now, since we store that flag in anv_bo, which isl doesn't know
      about.  (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
      actually sure that would be cleaner.)
      This patch should not have any functional or performance effects, as
      we continue selecting the exact same MOCS values for now.
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Part-of: <mesa/mesa!7104>
  5. 08 Jul, 2020 1 commit
  6. 19 Jun, 2020 2 commits
  7. 03 Jun, 2020 3 commits
    • OPTIONAL: iris: Perform BLORP buffer barriers outside of iris_blorp_exec() hook. · 8252bb0e
      Francisco Jerez authored
      The iris_blorp_exec() hook needs to be executed under a single
      indivisible sync region, which means that in cases where we need to
      emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
      subsequent commands separately from the previous commands, which will
      prevent us from optimizing out subsequent PIPE_CONTROLs if we
      encounter the same buffers again.  In particular I've encountered this
      situation in some SynMark test-cases which perform lots of BLORP
      operations with the same buffer bound as both source and destination
      (in order to generate mipmaps): In such a scenario if the source
      requires flushing we'd also end up flushing for the destination
      redundantly, even though a single PIPE_CONTROL would have been
      sufficient.
      This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
      FPS regression in SynMark OglMultithread.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <mesa/mesa!3875>
    • iris: Remove batch argument of iris_resource_prepare_access() and friends. · 8e8198f3
      Francisco Jerez authored
      The resolves performed by this function are only expected to work from
      the render batch, so make sure we use it independently of the batch
      the caller wants to use.  This function provides no synchronization
      guarantees anyway, the caller is expected to insert any cache flushing
      and synchronization required for the resolved surface to be visible to
      the target batch.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <mesa/mesa!3875>
    • iris: Bracket batch operations which access memory within sync regions. · e81c07de
      Francisco Jerez authored
      This delimits all batch operations which access memory between
      iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
      This makes sure that any buffer objects accessed within the region are
      considered in use through the same caching domain until the end of the
      region.
      Adding any buffer to the batch validation list outside of a sync
      region will lead to an assertion failure in a future commit, unless
      the caller explicitly opted out of the cache tracking mechanism.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <mesa/mesa!3875>
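      The bracketing rule in this commit message can be illustrated with a toy model. Everything here (struct toy_batch, its fields, and the toy_* functions) is hypothetical and only mirrors the behavior the message describes; it is not the iris implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a batch; not struct iris_batch. */
struct toy_batch {
   int sync_region_depth;     /* > 0 while inside a sync region */
   bool cache_tracking_off;   /* caller explicitly opted out of tracking */
   int num_bos;               /* size of the toy validation list */
};

static void
toy_sync_region_start(struct toy_batch *batch)
{
   batch->sync_region_depth++;
}

static void
toy_sync_region_end(struct toy_batch *batch)
{
   assert(batch->sync_region_depth > 0);
   batch->sync_region_depth--;
}

/* True when adding a buffer would satisfy the rule described above. */
static bool
toy_may_add_bo(const struct toy_batch *batch)
{
   return batch->sync_region_depth > 0 || batch->cache_tracking_off;
}

static void
toy_add_bo(struct toy_batch *batch)
{
   /* Adding a buffer outside a sync region is treated as a bug. */
   assert(toy_may_add_bo(batch));
   batch->num_bos++;
}
```

      A caller would wrap each memory-accessing operation as toy_sync_region_start(), one or more toy_add_bo() calls, then toy_sync_region_end().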
  8. 30 Apr, 2020 1 commit
  9. 29 Apr, 2020 1 commit
  10. 12 Mar, 2020 2 commits
  11. 05 Mar, 2020 3 commits
  12. 24 Feb, 2020 1 commit
  13. 22 Feb, 2020 1 commit
  14. 04 Jan, 2020 2 commits
  15. 14 Nov, 2019 1 commit
  16. 12 Nov, 2019 1 commit
  17. 29 Oct, 2019 2 commits
  18. 28 Oct, 2019 3 commits
  19. 08 Oct, 2019 1 commit
  20. 01 Sep, 2019 1 commit
    • iris: Lessen texture cache hack flush for blits/copies on Icelake. · 87fa8d9e
      Kenneth Graunke authored
      Lionel found actual documentation for this at long last.  Apparently
      it actually is a sampler cache limitation that was mostly fixed on
      Icelake.  Unfortunately, it seems there are still issues with ASTC
      and non-ASTC sampler views.  Still, we can lessen the flush condition
      from "format mismatch" to "ASTC mismatch", which eliminates most of
      the flushing here.
      We also update the documentation to refer to the workaround name.
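      The narrowed flush condition can be sketched as below. The struct toy_format type, its fields, and the function name are hypothetical, and treating Icelake as "gen >= 11" is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a format description; not the isl API. */
struct toy_format {
   int id;         /* arbitrary format identifier */
   bool is_astc;   /* true for ASTC-compressed formats */
};

/* Decide whether the texture cache hack flush is needed when a surface
 * is accessed through view_format but actually holds actual_format.
 */
static bool
toy_needs_tex_cache_flush(int gen, struct toy_format view_format,
                          struct toy_format actual_format)
{
   if (gen >= 11) {
      /* Assumed Icelake+ rule: only an ASTC vs. non-ASTC mismatch matters. */
      return view_format.is_astc != actual_format.is_astc;
   }

   /* Older hardware: any format mismatch triggers the flush. */
   return view_format.id != actual_format.id;
}
```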
  21. 13 Aug, 2019 1 commit
  22. 01 Jul, 2019 1 commit
    • iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls. · 9b1b9714
      Kenneth Graunke authored
      If our resource_copy_region size is a small number of DWords, then
      instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
      a CS stall).  We also try to select the optimal batch.
      Improves performance in Shadow of Mordor on Low settings at 1920x1080
      on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
      4 bytes of data to a buffer which was most recently used as a writable
      compute shader SSBO.  Previously we were switching from compute to the
      render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
      I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
      is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
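      The 4/8/12/16-byte threshold can be sketched as a size check followed by a DWord-at-a-time copy. The function names are hypothetical, and the memcpy loop is only a CPU-side stand-in for emitting one DWord copy command per iteration; it does not reproduce the driver's batch or stall handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A copy is "tiny" when it is 1 to 4 whole DWords (4, 8, 12 or 16 bytes). */
static bool
toy_is_tiny_copy(unsigned size)
{
   return size >= 4 && size <= 16 && size % 4 == 0;
}

/* Returns true if the tiny path handled the copy; false means the caller
 * should fall back to the general (BLORP-style) path.
 */
static bool
toy_copy_region(uint8_t *dst, const uint8_t *src, unsigned size,
                unsigned *dwords_copied)
{
   if (!toy_is_tiny_copy(size))
      return false;

   /* Each iteration stands in for one DWord-sized memory-to-memory copy. */
   for (unsigned i = 0; i < size; i += 4)
      memcpy(dst + i, src + i, 4);

   *dwords_copied = size / 4;
   return true;
}
```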
  23. 20 Jun, 2019 3 commits
    • iris: Drop RT flushes from depth stencil clearing flushes. · ecc50039
      Kenneth Graunke authored
      These clears write depth and stencil, not color, so there's no need
      to flush the render target.
    • iris: Avoid double flushing in iris_transfer_flush_region when copying. · 6890340c
      Kenneth Graunke authored
      My intention was to have iris_copy_region not do flushing, and leave
      that up to the callers.  iris_resource_copy_region needs to do this,
      but iris_transfer_flush_region was already doing it.  The net result
      was that we were doing it twice for transfers.
      So, move the flushing from iris_copy_region to iris_resource_copy_region
      so that it only happens in the callers as I intended.
    • iris: Implement INTEL_DEBUG=pc for pipe control logging. · d4a4384b
      Kenneth Graunke authored
      This prints a log of every PIPE_CONTROL flush we emit, noting which bits
      were set, and also the reason for the flush.  That way we can see which
      are caused by hardware workarounds, render-to-texture, buffer updates,
      and so on.  It should make it easier to determine whether we're doing
      too many flushes and why.
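      The kind of log line this describes (the bits that were set, plus a reason) can be sketched as follows. The flag enum, the names in the table, and the output format are all hypothetical; the real driver's flag set and log layout are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; not the driver's real PIPE_CONTROL flags. */
enum toy_pc_flags {
   TOY_PC_CS_STALL      = 1u << 0,
   TOY_PC_RT_FLUSH      = 1u << 1,
   TOY_PC_DEPTH_FLUSH   = 1u << 2,
   TOY_PC_TC_INVALIDATE = 1u << 3,
};

static const struct {
   uint32_t bit;
   const char *name;
} toy_pc_names[] = {
   { TOY_PC_CS_STALL,      "cs_stall" },
   { TOY_PC_RT_FLUSH,      "rt_flush" },
   { TOY_PC_DEPTH_FLUSH,   "depth_flush" },
   { TOY_PC_TC_INVALIDATE, "tc_invalidate" },
};

/* Write "<set bits> (<reason>)" into buf, which must have size > 0. */
static void
toy_describe_pipe_control(uint32_t flags, const char *reason,
                          char *buf, size_t size)
{
   size_t len = 0;
   buf[0] = '\0';

   for (size_t i = 0; i < sizeof(toy_pc_names) / sizeof(toy_pc_names[0]); i++) {
      if ((flags & toy_pc_names[i].bit) && len < size) {
         int n = snprintf(buf + len, size - len, "%s ", toy_pc_names[i].name);
         if (n > 0)
            len += (size_t)n;
      }
   }

   /* Skip the reason if earlier output was already truncated. */
   if (len < size)
      snprintf(buf + len, size - len, "(%s)", reason);
}
```

      Passing the reason as a string at every call site is what lets the log separate workaround flushes from ones caused by render-to-texture or buffer updates.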
  24. 17 Jun, 2019 3 commits
    • iris: Make resource_copy_region handle packed depth-stencil resources. · 659d4f61
      Kenneth Graunke authored
      Also copy along the separate stencil buffer if needed.
      Fixes Piglit's arb_copy_image-formats.
    • iris: Order CS stall and TC invalidate for format reinterpretation hacks · a36f1542
      Kenneth Graunke authored
      This should ensure the TC invalidate happens after the stall.
      Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy)
      from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the
      original image (using R10G10B10A2_UNORM).
    • iris: Be more aggressive at post-format-reinterpret TC invalidate hack · 94b9f50e
      Kenneth Graunke authored
      When copying/blitting with format reinterpretation, we invalidate the
      texture cache before/after.  Before is so the source of the copy works,
      and after is to get rid of our new data in the "wrong" format to protect
      future attempts to sample.
      When I ported these hacks to iris, I tried to be cautious by only
      bothering with the hacks if the batch referenced the BO.  This makes
      some sense for the before case.  If it isn't referenced, the texture
      cache can't really have any data for the BO (since it's also invalidated
      between batches).  But we still need to do the after case regardless,
      as we've just polluted the cache with hazardous entries.
  25. 07 May, 2019 1 commit
  26. 23 Apr, 2019 1 commit
    • iris: Track valid data range and infer unsynchronized mappings. · 77449d7c
      Kenneth Graunke authored
      Applications frequently call glBufferSubData() to consecutive regions
      of a VBO to append new vertex data.  If no data exists there yet, we
      can promote these to unsynchronized writes, even if the buffer is busy,
      since the GPU can't be doing anything useful with undefined content.
      This can avoid a bunch of unnecessary blitting on the GPU.
      u_threaded_context would do this for us, and in fact prohibits us from
      doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
      hooked that up yet, and it may be useful to disable u_threaded_context
      when debugging...at which point we'd still want this optimization.  At
      the very least, it would let us measure the benefit of threading
      independently from this optimization.  And it's not a lot of code.
      Removes most stall avoidance blits in "Total War: WARHAMMER."
      On my Skylake GT4e at 1920x1080, this appears to improve performance
      in games by the following (but I did not do many runs for proper
      statistics gathering):
         | DiRT Rally        | +2% (avg) | + 2% (max) |
         | Bioshock Infinite | +3% (avg) | + 9% (max) |
         | Shadow of Mordor  | +7% (avg) | +20% (max) |
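      The valid-range idea from this commit message can be sketched with a single [start, end) interval per buffer. The toy_* names are hypothetical stand-ins, and this sketch tracks one conservative range; it is not the driver's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Range of bytes known to hold defined data; empty when start >= end. */
struct toy_range {
   unsigned start;
   unsigned end;
};

static void
toy_range_init_empty(struct toy_range *r)
{
   r->start = UINT_MAX;
   r->end = 0;
}

/* Grow the valid range to cover [start, end). */
static void
toy_range_add(struct toy_range *r, unsigned start, unsigned end)
{
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* A write may skip synchronization if it touches no valid data, because
 * the GPU cannot be doing anything useful with undefined content.
 */
static bool
toy_write_can_be_unsynchronized(const struct toy_range *valid,
                                unsigned offset, unsigned size)
{
   unsigned end = offset + size;
   bool intersects = offset < valid->end && valid->start < end;
   return !intersects;
}
```

      An append pattern like consecutive glBufferSubData() calls never intersects the valid range, so every write in the sequence qualifies; rewriting an already-filled region does not.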