1. 14 May, 2019 2 commits
  2. 07 May, 2019 1 commit
  3. 24 Apr, 2019 2 commits
    • iris: Split iris_flush_and_dirty_for_history into two helpers. · 21688a30
      Kenneth Graunke authored
      We create two new helpers, iris_flush_bits_for_history, and
      iris_dirty_for_history, then use them in the existing function.
      The first accumulates flush bits based on res->bind_history, but doesn't
      actually perform a flush.  This allows us to accumulate flush bits by
      looping over multiple resources, but ultimately emit a single flush for
      all of them.
      The latter flags dirty bits without flushing, which again allows us to
      handle multiple resources, but also is more convenient when writing from
      the CPU where we don't need a flush (as in commit 4d122360).
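The accumulate-then-flush pattern described above can be sketched roughly as follows. This is a minimal illustration and not the driver's code: the flag names, `struct resource`, `emit_flush()`, and `flush_for_resources()` are hypothetical stand-ins, and only the idea of deriving flush bits from `bind_history` comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bind-history and flush flags; the driver's real ones differ. */
enum {
   BIND_CONSTANT_BUFFER = 1u << 0,
   BIND_SAMPLER_VIEW    = 1u << 1,
   BIND_VERTEX_BUFFER   = 1u << 2,
};
enum {
   FLUSH_CONST_CACHE   = 1u << 0,
   FLUSH_TEXTURE_CACHE = 1u << 1,
   FLUSH_VF_CACHE      = 1u << 2,
};

struct resource {
   uint32_t bind_history; /* every binding point this resource has been used at */
};

static int flushes_emitted = 0; /* counts flushes so the effect is observable */

/* Translate a resource's bind history into flush bits without flushing. */
static uint32_t flush_bits_for_history(const struct resource *res)
{
   assert(res != NULL);
   uint32_t bits = 0;
   if (res->bind_history & BIND_CONSTANT_BUFFER)
      bits |= FLUSH_CONST_CACHE;
   if (res->bind_history & BIND_SAMPLER_VIEW)
      bits |= FLUSH_TEXTURE_CACHE;
   if (res->bind_history & BIND_VERTEX_BUFFER)
      bits |= FLUSH_VF_CACHE;
   return bits;
}

/* Stand-in for emitting one hardware flush with the given bits. */
static void emit_flush(uint32_t bits)
{
   if (bits != 0)
      flushes_emitted++;
}

/* Loop over several resources, then emit a single combined flush. */
static uint32_t flush_for_resources(const struct resource *res, size_t count)
{
   uint32_t bits = 0;
   for (size_t i = 0; i < count; i++)
      bits |= flush_bits_for_history(&res[i]);
   emit_flush(bits);
   return bits;
}
```

Calling `flush_for_resources()` on an array of resources ORs their bits together and triggers at most one flush, regardless of how many resources were involved.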
    • iris: Prefer staging blits when destination supports CCS_E. · 864873de
      Kenneth Graunke authored
      Otherwise our textures don't get color compression.  Thanks to
      Eero Tamminen for noticing this was missing!
      Improves performance of GLB27_FillTestC24Z16 on my Apollolake
      laptop with single channel RAM by 2.3x.
      Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
  4. 23 Apr, 2019 5 commits
    • iris: Track valid data range and infer unsynchronized mappings. · 77449d7c
      Kenneth Graunke authored
      Applications frequently call glBufferSubData() to consecutive regions
      of a VBO to append new vertex data.  If no data exists there yet, we
      can promote these to unsynchronized writes, even if the buffer is busy,
      since the GPU can't be doing anything useful with undefined content.
      This can avoid a bunch of unnecessary blitting on the GPU.
      u_threaded_context would do this for us, and in fact prohibits us from
      doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
      hooked that up yet, and it may be useful to disable u_threaded_context
      when debugging...at which point we'd still want this optimization.  At
      the very least, it would let us measure the benefit of threading
      independently from this optimization.  And it's not a lot of code.
      Removes most stall avoidance blits in "Total War: WARHAMMER."
      On my Skylake GT4e at 1920x1080, this appears to improve performance
      in games by the following (but I did not do many runs for proper
      statistics gathering):
         | DiRT Rally        | +2% (avg) | + 2% (max) |
         | Bioshock Infinite | +3% (avg) | + 9% (max) |
         | Shadow of Mordor  | +7% (avg) | +20% (max) |
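The valid-range idea can be illustrated with a small sketch. It assumes a single half-open byte range per buffer and hypothetical names (`struct valid_range`, `range_add()`, `can_map_unsynchronized()`); the driver's actual bookkeeping may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical tracker: the half-open byte range [start, end) that holds
 * defined data. The range is empty while start >= end. */
struct valid_range {
   unsigned start;
   unsigned end;
};

static void range_init(struct valid_range *r)
{
   r->start = UINT_MAX;
   r->end = 0;
}

/* Grow the valid range to cover [start, end) after a write. */
static void range_add(struct valid_range *r, unsigned start, unsigned end)
{
   assert(start <= end);
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

static bool range_intersects(const struct valid_range *r,
                             unsigned start, unsigned end)
{
   return r->start < r->end && start < r->end && end > r->start;
}

/* A write that touches no valid data cannot race with meaningful GPU work,
 * so it can skip synchronization even if the buffer is busy. */
static bool can_map_unsynchronized(const struct valid_range *valid,
                                   unsigned offset, unsigned size,
                                   bool is_write)
{
   return is_write && !range_intersects(valid, offset, offset + size);
}
```

With this sketch, appending to bytes 64..127 after bytes 0..63 were written qualifies as unsynchronized, while overwriting bytes 32..95 does not.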
    • iris: Make a resource_is_busy() helper · 768b17a7
      Kenneth Graunke authored
      This checks both "is it busy?" and "do we have work queued up for it?"
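A helper of this shape might look like the sketch below. The types and fields (`struct buffer`, `struct batch`, `gpu_busy`, `batch_references()`) are hypothetical; only the two-part check itself is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8

/* Hypothetical buffer: gpu_busy stands in for asking the kernel whether
 * already-submitted work still uses the buffer. */
struct buffer {
   bool gpu_busy;
};

/* Hypothetical batch: commands recorded but not yet submitted. */
struct batch {
   const struct buffer *referenced[MAX_REFS];
   size_t count;
};

static bool batch_references(const struct batch *batch,
                             const struct buffer *bo)
{
   for (size_t i = 0; i < batch->count; i++) {
      if (batch->referenced[i] == bo)
         return true;
   }
   return false;
}

/* Busy if submitted work still uses the buffer OR any unsubmitted batch
 * has queued work that references it. */
static bool resource_is_busy(const struct batch *batches, size_t num_batches,
                             const struct buffer *bo)
{
   assert(bo != NULL);
   bool busy = bo->gpu_busy;
   for (size_t i = 0; i < num_batches; i++)
      busy = busy || batch_references(&batches[i], bo);
   return busy;
}
```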
    • iris: Replace buffer backing storage and rebind to update addresses. · 5ad0c88d
      Kenneth Graunke authored
      This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
      as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag.  When either
      of these happens, we swap out the backing storage of the buffer for a
      new idle BO, allowing us to write to it immediately without stalling
      or queueing a blit.
      On my Skylake GT4e at 1920x1080, this improves performance in games:
         | DiRT Rally        | +25% (avg) | +17% (max) |
         | Bioshock Infinite | +22% (avg) | +11% (max) |
         | Shadow of Mordor  | +27% (avg) | +83% (max) |
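The storage swap can be sketched as below, assuming reference-counted buffer objects. Everything here (`struct bo`, `bo_create()`, `bo_unreference()`, `needs_rebind`) is hypothetical; in a real driver the rebind step would re-emit every binding that holds the old address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer object. */
struct bo {
   int refcount;
   bool busy;     /* the GPU may still be reading or writing it */
   unsigned size;
};

struct buffer_resource {
   struct bo *bo;     /* current backing storage */
   bool needs_rebind; /* bound addresses are stale after a swap */
};

static struct bo *bo_create(unsigned size)
{
   struct bo *bo = calloc(1, sizeof(*bo));
   if (bo != NULL) {
      bo->refcount = 1;
      bo->size = size;
   }
   return bo;
}

static void bo_unreference(struct bo *bo)
{
   assert(bo != NULL && bo->refcount > 0);
   if (--bo->refcount == 0)
      free(bo);
}

/* If the storage is busy, swap in a fresh idle BO instead of stalling.
 * Returns true when a swap happened. The old BO stays alive for as long
 * as in-flight work holds a reference to it. */
static bool invalidate_buffer(struct buffer_resource *res)
{
   if (!res->bo->busy)
      return false; /* idle: the caller can write in place */

   struct bo *fresh = bo_create(res->bo->size);
   if (fresh == NULL)
      return false; /* allocation failed: fall back to the slow path */

   bo_unreference(res->bo);
   res->bo = fresh;
   res->needs_rebind = true;
   return true;
}
```

The old contents are discarded by design, which is why this only applies when the caller has said the whole resource may be invalidated.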
    • iris: Mark constants dirty on transfer unmap even if no flushes occur · 4d122360
      Kenneth Graunke authored
      I have various conditions in place to try and avoid unnecessary
      PIPE_CONTROL flushes, especially to batches which may have never
      used the buffer being mapped.  But if we do a CPU map to a bound
      constant buffer, we still need to mark push constants dirty, even
      if there's nothing happening in batches that would warrant a flush.
      Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
      (lots of rainbow colored triangles).  Fixes lots of blinking elements
      in "Shadow of Mordor".  Fixes missing crowd rendering in "DiRT Rally".
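The fix amounts to decoupling the dirty flag from the flush decision. A rough sketch, with hypothetical names throughout (`struct context`, `DIRTY_CONSTANTS`, `transfer_unmap()`, and the `batch_uses_resource` parameter standing in for the driver's flush-avoidance conditions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { BIND_CONSTANT_BUFFER = 1u << 0 }; /* hypothetical bind-history flag */
enum { DIRTY_CONSTANTS = 1u << 0 };      /* hypothetical dirty-state flag */

struct resource {
   uint32_t bind_history;
};

struct context {
   uint32_t dirty; /* state to re-emit at the next draw */
   int flushes;    /* number of flushes emitted so far */
};

static void transfer_unmap(struct context *ctx, const struct resource *res,
                           bool batch_uses_resource)
{
   assert(ctx != NULL && res != NULL);

   /* Flush only when a batch actually touched the buffer. */
   if (batch_uses_resource)
      ctx->flushes++;

   /* Independently of any flush: a CPU write to a buffer that has been
    * bound as a constant buffer means push constants must be re-uploaded. */
   if (res->bind_history & BIND_CONSTANT_BUFFER)
      ctx->dirty |= DIRTY_CONSTANTS;
}
```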
  5. 15 Apr, 2019 2 commits
  6. 10 Apr, 2019 1 commit
  7. 05 Apr, 2019 1 commit
  8. 04 Apr, 2019 1 commit
  9. 02 Apr, 2019 2 commits
  10. 28 Mar, 2019 1 commit
    • iris: Actually advertise some modifiers · de783a68
      Kenneth Graunke authored
      I neglected to fill out this driver function, causing us to advertise
      0 modifiers.  Now we advertise the various tilings and let the driver
      pick them.  I've verified that X tiling works with Weston (by hacking
      the list to skip Y tiling).
      Y+CCS doesn't work yet because it's multiplane and the Gallium dri
      state tracker isn't really prepared for that.  Leave it off for now.
  11. 20 Mar, 2019 2 commits
  12. 19 Mar, 2019 1 commit
  13. 18 Mar, 2019 1 commit
  14. 14 Mar, 2019 1 commit
  15. 13 Mar, 2019 4 commits
  16. 08 Mar, 2019 2 commits
    • iris: Use copy_region and staging resources to avoid transfer stalls · 9d1334d2
      Kenneth Graunke authored
      This is similar to intel_miptree_map_blit and intel_buffer_object.c's
      temporary blits in i965.
      Improves performance of DiRT Rally by 20-25% by eliminating stalls.
      Breaks piglit's spec/arb_shader_image_load_store/host-mem-barrier,
      by using the GPU to do uploads, exposing a st/mesa issue where it
      doesn't give us memory_barrier() calls.  This is a pre-existing issue
      and will be fixed by a later patch (currently out for review).
    • iris: Spruce up "are we using this engine?" checks for flushing · 335726fd
      Kenneth Graunke authored
      We were using batch->contains_draw as a proxy for "are we even using
      this engine?"  That isn't quite right, because it only counts regular
      draws.  BLORP operations may have also rendered to a resource, which
      needs to trigger flushing.  To check for this, we also see if the
      render and sometimes depth caches are non-empty.
      We can also drop the "but there might already be stale data in the
      cache even if we haven't emitted any commands yet" concern in the
      comments.  The kernel flushes caches between batches.
      This may not be great but it's at least better than what was there.
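The broadened check can be sketched as below. Only `contains_draw` is named in the commit message; the cache-entry counters and the `check_depth` parameter are hypothetical stand-ins for "the render and sometimes depth caches are non-empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct batch {
   bool contains_draw;            /* a regular draw was recorded */
   unsigned render_cache_entries; /* hypothetical: tracked color targets */
   unsigned depth_cache_entries;  /* hypothetical: tracked depth targets */
};

/* A batch counts as using the engine if it recorded a draw, or if BLORP
 * operations left entries in the render cache (and, when the caller asks
 * for it, in the depth cache). */
static bool batch_uses_engine(const struct batch *batch, bool check_depth)
{
   assert(batch != NULL);
   if (batch->contains_draw)
      return true;
   if (batch->render_cache_entries > 0)
      return true;
   return check_depth && batch->depth_cache_entries > 0;
}
```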
  17. 21 Feb, 2019 11 commits