1. 19 Oct, 2020 1 commit
    • isl, anv, iris: Add a centralized helper to select MOCS based on usage · 02fe825a
      Kenneth Graunke authored
      
      
      On Gen12+, we can enable additional caches in certain usage situations.
      This routes that decision making to a central place in ISL, based on
      surface usage flags, and updates both drivers to use it.  (i965 doesn't
      need to change because it doesn't support Gen12.)
      
      We continue handling the "external" decision via an anv_mocs() wrapper
      for now, since we store that flag in anv_bo, which isl doesn't know
      about.  (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
      actually sure that would be cleaner.)
      
      This patch should not have any functional or performance effects, as
      we continue selecting the exact same MOCS values for now.
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Part-of: <!7104>
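The layering described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the ISL or anv code: every `toy_` name, the usage flags, and the MOCS values are hypothetical. The central helper accepts usage flags but does not consult them yet, matching the statement that the same MOCS values are still selected for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; names and values are illustrative only. */
typedef uint32_t toy_usage_flags;
#define TOY_USAGE_RENDER_TARGET (1u << 0)
#define TOY_USAGE_TEXTURE       (1u << 1)

struct toy_device {
   uint32_t mocs_internal;  /* for buffers owned by the driver */
   uint32_t mocs_external;  /* for buffers shared outside the driver */
};

/* Central selection point. The usage flags are accepted but not yet
 * consulted, so every usage gets the same value for now. */
static uint32_t
toy_mocs(const struct toy_device *dev, toy_usage_flags usage)
{
   (void)usage;
   return dev->mocs_internal;
}

/* Driver-side wrapper that layers the "external" decision on top of the
 * central helper, since only the driver knows whether a BO is external. */
static uint32_t
toy_driver_mocs(const struct toy_device *dev, bool bo_is_external,
                toy_usage_flags usage)
{
   return bo_is_external ? dev->mocs_external : toy_mocs(dev, usage);
}
```

Keeping the external check in a wrapper means the central helper needs no knowledge of driver-specific buffer objects.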
  2. 08 Jul, 2020 1 commit
  3. 19 Jun, 2020 2 commits
    • iris: Use ISL_AUX_USAGE_GEN12_CCS_E on gen12 · 9dea3e1b
      Nanley Chery authored
      
      
      Makes iris pass a subtest of the fcc-write-after-clear piglit test
      (fast-clear tracking across layers 1 -> 0 -> 1) on gen12.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!5363>
    • iris: Disable sRGB fast-clears for non-0/1 values · f8961ea0
      Nanley Chery authored
      
      
      For texturing and draw calls, HW expects the clear color to be in two
      different color spaces after sRGB fast-clears - sRGB in the former and
      linear in the latter. Up until now, iris has stored the clear color in
      the sRGB color space. Limit the allowable clear colors for sRGB
      fast-clears to 0/1 so that both color space requirements are satisfied.
      
      Makes iris pass the sRGB -> sRGB subtest of the fcc-write-after-clear
      piglit test on gen9+.
      
      v2:
      * Drop iris_context::blend_enables. (Ken)
      * Drop some more resolve-related blend-state-tracking code.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!4972>
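The 0/1 restriction works because the sRGB transfer function maps 0 to 0 and 1 to 1, so such a clear color has the same value in both color spaces. A minimal sketch of the check, assuming a hypothetical helper name and conservatively testing all four channels (this is not the iris code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when every channel of the clear color
 * is exactly 0.0 or 1.0. Those values encode identically in sRGB and
 * linear space, so one stored color satisfies both consumers. */
static bool
toy_clear_color_is_zero_or_one(const float rgba[4])
{
   for (int i = 0; i < 4; i++) {
      if (rgba[i] != 0.0f && rgba[i] != 1.0f)
         return false;
   }
   return true;
}
```

A caller would fall back to a regular (slow) clear for an sRGB surface whenever this returns false.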
  4. 03 Jun, 2020 3 commits
    • OPTIONAL: iris: Perform BLORP buffer barriers outside of iris_blorp_exec() hook. · 8252bb0e
      Francisco Jerez authored
      
      
      The iris_blorp_exec() hook needs to be executed under a single
      indivisible sync region, which means that in cases where we need to
      emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
      subsequent commands separately from the previous commands, which will
      prevent us from optimizing out subsequent PIPE_CONTROLs if we
      encounter the same buffers again.  In particular I've encountered this
      situation in some SynMark test-cases which perform lots of BLORP
      operations with the same buffer bound as both source and destination
      (in order to generate mipmaps): In such a scenario if the source
      requires flushing we'd also end up flushing for the destination
      redundantly, even though a single PIPE_CONTROL would have been
      sufficient.
      
      This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
      FPS regression in SynMark OglMultithread.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Remove batch argument of iris_resource_prepare_access() and friends. · 8e8198f3
      Francisco Jerez authored
      
      
      The resolves performed by this function are only expected to work from
      the render batch, so make sure we use it independently of the batch
      the caller wants to use.  This function provides no synchronization
      guarantees anyway; the caller is expected to insert any cache flushing
      and synchronization required for the resolved surface to be visible to
      the target batch.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
    • iris: Bracket batch operations which access memory within sync regions. · e81c07de
      Francisco Jerez authored
      
      
      This delimits all batch operations which access memory between
      iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
      This makes sure that any buffer objects accessed within the region are
      considered in use through the same caching domain until the end of the
      region.
      
      Adding any buffer to the batch validation list outside of a sync
      region will lead to an assertion failure in a future commit, unless
      the caller explicitly opted out of the cache tracking mechanism.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Part-of: <!3875>
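The bracketing pattern can be illustrated with a toy model. The struct, the `toy_` functions, and the use of a nesting depth counter are all assumptions made for the sketch; only the start/end bracketing idea and the opt-out come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy batch: tracks how deeply we are nested inside sync regions. */
struct toy_batch {
   int sync_region_depth;
   bool opted_out_of_cache_tracking;
   int num_bos;
};

static void
toy_sync_region_start(struct toy_batch *batch)
{
   batch->sync_region_depth++;
}

static void
toy_sync_region_end(struct toy_batch *batch)
{
   assert(batch->sync_region_depth > 0);
   batch->sync_region_depth--;
}

/* A buffer may only join the validation list inside a sync region,
 * unless the caller explicitly opted out of cache tracking. */
static bool
toy_batch_can_add_bo(const struct toy_batch *batch)
{
   return batch->sync_region_depth > 0 ||
          batch->opted_out_of_cache_tracking;
}

static void
toy_batch_add_bo(struct toy_batch *batch)
{
   assert(toy_batch_can_add_bo(batch));
   batch->num_bos++;
}
```

In this model, every operation that touches memory sits between a start and an end call, so the assertion in `toy_batch_add_bo()` catches any access that escaped the bracketing.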
  5. 30 Apr, 2020 1 commit
  6. 29 Apr, 2020 1 commit
  7. 12 Mar, 2020 2 commits
  8. 05 Mar, 2020 3 commits
  9. 24 Feb, 2020 1 commit
  10. 22 Feb, 2020 1 commit
  11. 04 Jan, 2020 2 commits
  12. 14 Nov, 2019 1 commit
  13. 12 Nov, 2019 1 commit
  14. 29 Oct, 2019 2 commits
  15. 28 Oct, 2019 3 commits
  16. 08 Oct, 2019 1 commit
  17. 01 Sep, 2019 1 commit
    • iris: Lessen texture cache hack flush for blits/copies on Icelake. · 87fa8d9e
      Kenneth Graunke authored
      Lionel found actual documentation for this at long last.  Apparently
      it actually is a sampler cache limitation that was mostly fixed on
      Icelake.  Unfortunately, it seems there are still issues with ASTC
      and non-ASTC sampler views.  Still, we can lessen the flush condition
      from "format mismatch" to "ASTC mismatch", which eliminates most of
      the flushing here.
      
      We also update the documentation to refer to the workaround name.
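One way to read the relaxed condition is sketched below. The format enum and function names are hypothetical, and treating "ASTC mismatch" as "exactly one of the two formats is ASTC" is an interpretation of the message rather than a quote of the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format list, just enough to exercise the rule. */
enum toy_format {
   TOY_FMT_RGBA8_UNORM,
   TOY_FMT_RGBA8_UINT,
   TOY_FMT_ASTC_4X4,
   TOY_FMT_ASTC_8X8,
};

static bool
toy_format_is_astc(enum toy_format f)
{
   return f == TOY_FMT_ASTC_4X4 || f == TOY_FMT_ASTC_8X8;
}

/* Older rule: flush on any format mismatch between the view and the
 * underlying resource. Icelake rule: flush only when one side is ASTC
 * and the other is not. */
static bool
toy_needs_tex_cache_flush(bool is_icelake, enum toy_format view,
                          enum toy_format resource)
{
   if (is_icelake)
      return toy_format_is_astc(view) != toy_format_is_astc(resource);
   return view != resource;
}
```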
  18. 13 Aug, 2019 1 commit
  19. 01 Jul, 2019 1 commit
    • iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls. · 9b1b9714
      Kenneth Graunke authored
      If our resource_copy_region size is a small number of DWords, then
      instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
      a CS stall).  We also try to select the optimal batch.
      
      Improves performance in Shadow of Mordor on Low settings at 1920x1080
      on Skylake GT4e by 0.689096% +/- 0.473968% (n=4).  It tries to copy
      4 bytes of data to a buffer which was most recently used as a writable
      compute shader SSBO.  Previously we were switching from compute to the
      render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
      
      I arbitrarily decided to support 4/8/12/16 bytes.  Jason thinks this
      is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
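The size threshold can be sketched as a simple predicate plus a per-DWord loop. The names are hypothetical, and the copy is a plain CPU `memcpy` standing in for one MI_COPY_MEM_MEM command per DWord, which is an assumption about how the commands would be issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Only 4, 8, 12, or 16 bytes take the tiny-copy path. */
static bool
toy_copy_is_tiny(unsigned size_bytes)
{
   return size_bytes == 4 || size_bytes == 8 ||
          size_bytes == 12 || size_bytes == 16;
}

/* Copies one DWord at a time and returns how many DWord copies were
 * issued, or 0 if the caller should fall back to the full copy path. */
static unsigned
toy_copy_tiny(uint8_t *dst, const uint8_t *src, unsigned size_bytes)
{
   if (!toy_copy_is_tiny(size_bytes))
      return 0;

   unsigned dwords = size_bytes / 4;
   for (unsigned i = 0; i < dwords; i++)
      memcpy(dst + 4 * i, src + 4 * i, 4);
   return dwords;
}
```

With at most four DWords per copy, the fixed cost of switching pipelines and starting a full buffer copy is avoided for exactly the 4-byte case the message describes.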
  20. 20 Jun, 2019 3 commits
    • iris: Drop RT flushes from depth stencil clearing flushes. · ecc50039
      Kenneth Graunke authored
      These clears write depth and stencil, not color, so there's no need
      to flush the render target.
    • iris: Avoid double flushing in iris_transfer_flush_region when copying. · 6890340c
      Kenneth Graunke authored
      My intention was to have iris_copy_region not do flushing, and leave
      that up to the callers.  iris_resource_copy_region needs to do this,
      but iris_transfer_flush_region was already doing it.  The net result
      was that we were doing it twice for transfers.
      
      So, move the flushing from iris_copy_region to iris_resource_copy_region
      so that it only happens in the callers as I intended.
    • iris: Implement INTEL_DEBUG=pc for pipe control logging. · d4a4384b
      Kenneth Graunke authored
      This prints a log of every PIPE_CONTROL flush we emit, noting which bits
      were set, and also the reason for the flush.  That way we can see which
      are caused by hardware workarounds, render-to-texture, buffer updates,
      and so on.  It should make it easier to determine whether we're doing
      too many flushes and why.
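A log line of this kind pairs the set bits with a human-readable reason. The sketch below uses hypothetical flag names and an invented output format; it formats into a caller-supplied buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of flush bits. */
enum {
   TOY_PC_CS_STALL       = 1 << 0,
   TOY_PC_RT_FLUSH       = 1 << 1,
   TOY_PC_TEX_INVALIDATE = 1 << 2,
};

/* Writes one log line naming every set bit and the reason for the flush.
 * Returns the number of characters snprintf() would have written. */
static int
toy_format_pipe_control(char *buf, size_t len, const char *reason,
                        unsigned flags)
{
   return snprintf(buf, len, "PC [%s%s%s] <- %s",
                   (flags & TOY_PC_CS_STALL) ? "stall " : "",
                   (flags & TOY_PC_RT_FLUSH) ? "rt-flush " : "",
                   (flags & TOY_PC_TEX_INVALIDATE) ? "tex-inval " : "",
                   reason);
}
```

Requiring a reason string at every call site is what makes the log useful for telling workaround flushes apart from render-to-texture or buffer-update flushes.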
  21. 17 Jun, 2019 3 commits
    • iris: Make resource_copy_region handle packed depth-stencil resources. · 659d4f61
      Kenneth Graunke authored
      Also copy along the separate stencil buffer if needed.
      
      Fixes Piglit's arb_copy_image-formats.
    • iris: Order CS stall and TC invalidate for format reinterpretation hacks · a36f1542
      Kenneth Graunke authored
      This should ensure the TC invalidate happens after the stall.
      
      Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy)
      from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the
      original image (using R10G10B10A2_UNORM).
    • iris: Be more aggressive at post-format-reintepret TC invalidate hack · 94b9f50e
      Kenneth Graunke authored
      When copying/blitting with format reinterpretation, we invalidate the
      texture cache before/after.  Before is so the source of the copy works,
      and after is to get rid of our new data in the "wrong" format to protect
      future attempts to sample.
      
      When I ported these hacks to iris, I tried to be cautious by only
      bothering with the hacks if the batch referenced the BO.  This makes
      some sense for the before case.  If it isn't referenced, the texture
      cache can't really have any data for the BO (since it's also invalidated
      between batches).  But we still need to do the after case regardless,
      as we've just polluted the cache with hazardous entries.
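The before/after asymmetry described here reduces to a small decision table. The struct and function names below are hypothetical; the logic follows the message: the "before" invalidate depends on the batch referencing the BO, while the "after" invalidate is unconditional for a reinterpreting copy.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_invalidate_plan {
   bool before;  /* so the source of the copy reads correct data */
   bool after;   /* so stale "wrong format" entries cannot be sampled */
};

static struct toy_invalidate_plan
toy_plan_tc_invalidates(bool reinterprets_format, bool batch_references_bo)
{
   struct toy_invalidate_plan plan;

   /* If the batch never referenced the BO, the texture cache cannot hold
    * data for it, so the "before" invalidate can be skipped. */
   plan.before = reinterprets_format && batch_references_bo;

   /* The copy itself pollutes the cache, so "after" is always needed. */
   plan.after = reinterprets_format;
   return plan;
}
```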
  22. 07 May, 2019 1 commit
  23. 23 Apr, 2019 1 commit
    • iris: Track valid data range and infer unsynchronized mappings. · 77449d7c
      Kenneth Graunke authored
      Applications frequently call glBufferSubData() to consecutive regions
      of a VBO to append new vertex data.  If no data exists there yet, we
      can promote these to unsynchronized writes, even if the buffer is busy,
      since the GPU can't be doing anything useful with undefined content.
      This can avoid a bunch of unnecessary blitting on the GPU.
      
      u_threaded_context would do this for us, and in fact prohibits us from
      doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
      hooked that up yet, and it may be useful to disable u_threaded_context
      when debugging...at which point we'd still want this optimization.  At
      the very least, it would let us measure the benefit of threading
      independently from this optimization.  And it's not a lot of code.
      
      Removes most stall avoidance blits in "Total War: WARHAMMER."
      
      On my Skylake GT4e at 1920x1080, this appears to improve performance
      in games by the following (but I did not do many runs for proper
      statistics gathering):
      
         ----------------------------------------------
         | DiRT Rally        | +2% (avg) | + 2% (max) |
         | Bioshock Infinite | +3% (avg) | + 9% (max) |
         | Shadow of Mordor  | +7% (avg) | +20% (max) |
         ----------------------------------------------
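The valid-range idea can be sketched with a single half-open interval per buffer. All `toy_` names are hypothetical, and tracking one covering interval (rather than a list of ranges) is a simplifying assumption that errs on the side of synchronizing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Half-open interval [start, end) of bytes known to hold defined data.
 * Empty when start >= end. */
struct toy_range {
   unsigned start, end;
};

static void
toy_range_init(struct toy_range *r)
{
   r->start = UINT_MAX;
   r->end = 0;
}

/* Grow the valid range to cover a newly written span. */
static void
toy_range_add(struct toy_range *r, unsigned start, unsigned end)
{
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

static bool
toy_range_intersects(const struct toy_range *r, unsigned start, unsigned end)
{
   unsigned lo = start > r->start ? start : r->start;
   unsigned hi = end < r->end ? end : r->end;
   return lo < hi;
}

/* A write that touches no valid data can skip synchronization even if the
 * buffer is busy, because the GPU cannot depend on undefined content. */
static bool
toy_write_can_skip_sync(const struct toy_range *valid, unsigned offset,
                        unsigned size)
{
   return !toy_range_intersects(valid, offset, offset + size);
}
```

Appending consecutive regions, as in the glBufferSubData() pattern above, never intersects the existing valid range, so each append can be promoted to an unsynchronized write.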
  24. 16 Apr, 2019 1 commit
    • iris: Add texture cache flushing hacks for blit and resource_copy_region · c4478889
      Kenneth Graunke authored
      This is a port of Jason's 8379bff6
      from i965 to iris.  We can't find anything relevant in the documentation
      and no one we've talked to has been able to help us pin down a solution.
      
      Unfortunately, we have to put the hack in both iris_blit() and
      iris_copy_region().  st/mesa's CopyImage() implementation sometimes
      chooses to use pipe->blit() instead of pipe->resource_copy_region().
      For blits, we only do the hack if the blit source format doesn't match
      the underlying resource (i.e. it's reinterpreting the bits).  Hopefully
      this should not be too common.
  25. 15 Apr, 2019 1 commit
  26. 30 Mar, 2019 1 commit