1. 01 Mar, 2018 2 commits
    • intel/fs: Set up sampler message headers in the visitor on gen7+ · ff472607
      Jason Ekstrand authored
      This gives the scheduler visibility into the headers which should
      improve scheduling.  More importantly, however, it lets the scheduler
      know that the header gets written.  As-is, the scheduler thinks that a
      texture instruction only reads its payload and is unaware that it may
      write to the first register so it may reorder it with respect to a read
      from that register.  This is causing issues in a couple of Dota 2 vertex
      shaders.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923
      Cc: mesa-stable@lists.freedesktop.org
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
    • i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components · 2dd94f46
      José Casanova Crespo authored
      This helper, which loads 16-bit components from the result of a 32-bit
      read, now allows skipping components through the new first_component
      parameter. It skips components until it reaches first_component and
      then reads the number of components passed to the function.
      
      All previous uses of the helper are updated to pass 0 as first_component.
      This allows reading 16-bit components when the first one is not 32-bit
      aligned, enabling more uses of untyped_reads with 16-bit types.
      
      v2: (Jason Ekstrand)
          Change parameters order to first_component, num_components
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  2. 28 Feb, 2018 1 commit
  3. 21 Feb, 2018 1 commit
  4. 26 Jan, 2018 1 commit
  5. 25 Jan, 2018 1 commit
    • i965/fs: Reset the register file to VGRF in lower_integer_multiplication · db682b8f
      Jason Ekstrand authored
      18fde36c changed the way temporary registers were allocated in
      lower_integer_multiplication so that we allocate regs_written(inst)
      space and keep the stride of the original
      destination register.  This was to ensure that any MUL which originally
      followed the CHV/BXT integer multiply regioning restrictions would
      continue to follow those restrictions even after lowering.  This works
      fine except that I forgot to reset the register file to VGRF so, even
      though they were assigned a number from alloc.allocate(), they had the
      wrong register file.  This caused some GLES 3.0 CTS tests to start
      failing on Sandy Bridge due to attempted reads from the MRF:
      
          ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
          ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
          ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
          ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
          ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
          ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
      
      Instead of copying inst->dst and overwriting nr, this commit remedies
      the problem by making a new register and setting its region to match
      inst->dst.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
      Fixes: 18fde36c
      Cc: "17.3" <mesa-stable@lists.freedesktop.org>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
  6. 11 Jan, 2018 2 commits
    • i965: Use UD types for gl_SampleID setup · c3d802d6
      Jason Ekstrand authored
      We already had to switch all of the W types to UW to prevent issues
      with vector immediates on gen10.  We may as well use unsigned types
      everywhere.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • i965/fs: Use UW types when using V immediates · 3d2b157e
      Jason Ekstrand authored
      Gen 10 has a strange hardware bug involving V immediates with W types.
      It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
      getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
      four nibbles are repeated instead of the top four being taken.  (A mov
      of 0x00003210V yields the same result.)  This bug does not appear in any
      hardware documentation as far as we can tell and the simulator does not
      implement the bug either.
      
      Commit 6132992c was mostly a no-op except that it changed the type of
      the subgroup invocation from UW to W
      and caused us to tickle this bug with basically every compute shader
      that uses any sort of invocation ID (which is most of them).  This is
      also potentially an issue for geometry shader input pulls and SampleID
      setup.  The easy solution is just to change the few places where we use
      a vector integer immediate with a W type to use a UW type.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Cc: mesa-stable@lists.freedesktop.org
      Fixes: 6132992c
  7. 31 Dec, 2017 1 commit
  8. 19 Dec, 2017 1 commit
  9. 12 Dec, 2017 1 commit
  10. 08 Dec, 2017 2 commits
    • i965/fs: Handle !supports_pull_constants and push UBOs properly · f1ce0b90
      Jason Ekstrand authored
      In Vulkan, we don't support classic pull constants and everything the
      client asks us to push, we push.  However, for pushed UBOs, we still
      want to fall back to conventional pulls if we run out of space.
    • i965/fs: Rewrite assign_constant_locations · 3b34ed79
      Jason Ekstrand authored
      This rewires the logic for assigning uniform locations to work in terms
      of "complex alignments".  The basic idea is that, as we walk the list of
      instructions, we keep track of the alignment and continuity requirements
      of each slot and assert that the alignments all match up.  We then use
      those alignments in the compaction stage to ensure that everything gets
      placed at a properly aligned register.  The old mechanism handled
      alignments by special-casing each of the bit sizes and placing 64-bit
      values first followed by 32-bit values.
      
      The old scheme had the advantage of never leaving a hole since all the
      64-bit values could be tightly packed and so could the 32-bit values.
      However, the new scheme has no type size special cases so it handles not
      only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
      types as the need arises.
      Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
      Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
  11. 07 Dec, 2017 1 commit
    • intel/fs: Implement GRF bank conflict mitigation pass. · af2c3201
      Francisco Jerez authored
      Unnecessary GRF bank conflicts increase the issue time of ternary
      instructions (the overwhelmingly most common of which is MAD) by
      roughly 50%, leading to reduced ALU throughput.  This pass attempts to
      minimize the number of bank conflicts by rearranging the layout of the
      GRF space post-register allocation.  It's in general not possible to
      eliminate all of them without introducing extra copies, which are
      typically more expensive than the bank conflict itself.
      
      In a shader-db run on SKL this helps roughly 46k shaders:
      
         total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
         conflicts in affected programs: 816222 -> 407702 (-50.05%)
         helped: 46234
         HURT: 72
      
      The running time of shader-db itself on SKL seems to be increased by
      roughly 2.52%±1.13% with n=20 due to the additional work done by the
      compiler back-end.
      
      On earlier generations the pass is somewhat less effective in relative
      terms because the hardware incurs a bank conflict anytime the last two
      sources of the instruction are duplicate (e.g. while trying to square
      a value using MAD), which is impossible to avoid without introducing
      copies.  E.g. for a shader-db run on SNB:
      
         total conflicts in shared programs: 944636 -> 623185 (-34.03%)
         conflicts in affected programs: 853258 -> 531807 (-37.67%)
         helped: 31052
         HURT: 19
      
      And on BDW:
      
         total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
         conflicts in affected programs: 1179787 -> 748933 (-36.52%)
         helped: 47592
         HURT: 70
      
      On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
      ±0.33% with n=16.
      
      NOTE: This patch intentionally disregards some i965 coding conventions
            for the sake of reviewability.  This is addressed by the next
            squash patch which introduces an amount of (for the most part
            boring) boilerplate that might distract reviewers from the
            non-trivial algorithmic details of the pass.
      
      The following patch is squashed in:
      
      SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
      Acked-by: Matt Turner <mattst88@gmail.com>
  12. 06 Dec, 2017 5 commits
  13. 07 Nov, 2017 16 commits
  14. 02 Nov, 2017 2 commits
  15. 01 Nov, 2017 2 commits
  16. 30 Oct, 2017 1 commit
    • i965: fix blorp stage_prog_data->param leak · 446c5726
      Tapani Pälli authored
      Patch uses mem_ctx for allocation to ensure param array gets freed
      later.
      
      ==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193
      ==6164==    at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
      ==6164==    by 0x12E31C6C: ralloc_size (ralloc.c:121)
      ==6164==    by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095)
      ==6164==    by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715)
      ==6164==    by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229)
      ==6164==    by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570)
      ==6164==    by 0x130C4B07: blorp_compile_fs (blorp.c:194)
      ==6164==    by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79)
      ==6164==    by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332)
      ==6164==    by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261)
      ==6164==    by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326)
      ==6164==    by 0x12EFF72B: brw_clear (brw_clear.c:297)
      
      Fixes: 8d90e288 ("intel/compiler: Allocate pull_param in assign_constant_locations")
      Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: mesa-stable@lists.freedesktop.org