1. 08 Mar, 2018 2 commits
    • i965/vec4: Allow CSE on subset VF constant loads · 1583f49e
      Ian Romanick authored
      v2: Rewrite the code that generates the VF mask.  Suggested by Ken.
      
      No changes on other platforms.
      
      Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
      total instructions in shared programs: 13059891 -> 13059884 (<.01%)
      instructions in affected programs: 431 -> 424 (-1.62%)
      helped: 7
      HURT: 0
      helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
      helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
      95% mean confidence interval for instructions value: -1.00 -1.00
      95% mean confidence interval for instructions %-change: -3.39% -0.71%
      Instructions are helped.
      
      total cycles in shared programs: 409260032 -> 409260018 (<.01%)
      cycles in affected programs: 4228 -> 4214 (-0.33%)
      helped: 7
      HURT: 0
      helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
      helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
      95% mean confidence interval for cycles value: -2.00 -2.00
      95% mean confidence interval for cycles %-change: -1.15% 0.07%
      
      Inconclusive result (%-change mean confidence interval includes 0).
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    • i965/vec4: Relax writemask condition in CSE · 360899d4
      Ian Romanick authored
      If the previously seen instruction generates more fields than the new
      instruction, still allow CSE to happen.  This doesn't do much, but it
      also enables a couple more shaders in the next patch.  It helped quite a
      bit in another change series that I have (at least for now) abandoned.
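
      The relaxed condition can be read as a subset test on the 4-bit
      writemasks. The sketch below is a hypothetical helper, not the actual
      instructions_match() code, and the argument names are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper, not Mesa's instructions_match(): returns true when
// every channel the new instruction writes was also written by the
// previously seen instruction, so the earlier result can be reused.
constexpr bool writemask_covers(uint8_t prev_mask, uint8_t new_mask)
{
    return (prev_mask & new_mask) == new_mask;
}
```

      With this test an earlier .xyzw result covers a later .xy request,
      while the reverse case is still rejected.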
      
      v2: Add some extra commentary about the parameters to instructions_match.
      Suggested by Ken.
      
      No changes on Skylake, Broadwell, Iron Lake or GM45.
      
      Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
      total instructions in shared programs: 11780295 -> 11780294 (<.01%)
      instructions in affected programs: 302 -> 301 (-0.33%)
      helped: 1
      HURT: 0
      
      total cycles in shared programs: 257308315 -> 257308313 (<.01%)
      cycles in affected programs: 2074 -> 2072 (-0.10%)
      helped: 1
      HURT: 0
      
      Sandy Bridge
      total instructions in shared programs: 10506687 -> 10506686 (<.01%)
      instructions in affected programs: 335 -> 334 (-0.30%)
      helped: 1
      HURT: 0
      Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  2. 13 Mar, 2017 1 commit
    • i965: Move the back-end compiler to src/intel/compiler · 700bebb9
      Jason Ekstrand authored
      Mostly a dummy git mv with a couple of noticeable parts:
       - With the earlier header cleanups, nothing in src/intel depends on
      files from src/mesa/drivers/dri/i965/
       - Both Autoconf and Android builds are addressed. Thanks to Mauro and
      Tapani for the fixups in the latter
       - brw_util.[ch] is not really compiler specific, so it's moved to i965.
      
      v2:
       - move brw_eu_defines.h instead of brw_defines.h
       - remove no-longer applicable includes
       - add missing vulkan/ prefix in the Android build (thanks Tapani)
      
      v3:
       - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
       - rebase on top of the oa patches
      
      [Emil Velikov: commit message, various small fixes throughout]
      Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  3. 03 Jan, 2017 2 commits
    • i965/vec4: teach CSE about exec_size, group and doubles · 8f39b366
      Iago Toral authored
      v2: adapt to changes in offset()
      Reviewed-by: Matt Turner <mattst88@gmail.com>
    • i965/vec4: handle 32 and 64 bit channels in liveness analysis · 4ea3bf8e
      Juan Suárez Romero authored
      Our current data flow analysis does not take into account that channels
      on 64-bit operands are 64-bit. This is a problem when the same register
      is accessed using both 64-bit and 32-bit channels. This is very common
      in operations where we need to access 64-bit data in 32-bit chunks,
      such as the double packing and unpacking operations.
      
      This patch changes the analysis by checking the bits that each source
      or destination datatype needs. Actually, rather than bits, we use
      blocks of 32 bits, which is the minimum channel size.
      
      Because a vgrf can contain a dvec4 (256 bits), we reserve 8
      32-bit blocks to map the channels.
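
      A simplified sketch of that block mapping follows. The names
      BLOCKS_PER_VGRF, block_index, first_block_of_channel and
      blocks_per_channel are hypothetical, and the real var_from_reg()
      helpers also handle swizzles and register offsets, which are left out
      here.

```cpp
#include <cassert>

// Assumed layout: 8 consecutive 32-bit blocks per virtual GRF, enough for
// a dvec4 (4 channels x 64 bits = 256 bits).
constexpr unsigned BLOCKS_PER_VGRF = 8;

// Index of one 32-bit block in a flat liveness array (hypothetical helper).
unsigned block_index(unsigned vgrf, unsigned block)
{
    assert(block < BLOCKS_PER_VGRF); // never run past the reserved blocks
    return vgrf * BLOCKS_PER_VGRF + block;
}

// Number of 32-bit blocks covered by one channel of a type that is
// type_bytes wide: 1 for 32-bit types, 2 for 64-bit types.
constexpr unsigned blocks_per_channel(unsigned type_bytes)
{
    return type_bytes / 4;
}

// First 32-bit block covered by the given channel (x = 0, y = 1, ...).
constexpr unsigned first_block_of_channel(unsigned channel, unsigned type_bytes)
{
    return channel * blocks_per_channel(type_bytes);
}
```

      Under these assumptions the .y channel of a double operand covers
      blocks 2 and 3, the same blocks that the .z and .w channels of a
      32-bit view of the register cover, so both kinds of access are
      tracked consistently.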
      
      v2 (Curro):
        - Simplify code by making the var_from_reg helpers take an extra
          argument with the register component we want.
        - Fix a couple of cases where we had to update the code to the new
          way of representing live variables.
      
      v3:
        - Fix indent in multiline expressions (Matt)
        - Fix comment's closing tag (Matt)
        - Use DIV_ROUND_UP(inst->size_written, 16) instead of 2 * regs_written(inst)
          to avoid rounding issues. The same for regs_read(i). (Curro).
        - Add asserts in var_from_reg() to avoid exceeding the allocated
          registers (Curro).
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  4. 27 Oct, 2016 1 commit
    • i965/vec4: use byte_offset() instead of offset() · ba63db1f
      Iago Toral authored
      In a later patch we want to change the semantics of offset() to be in terms
      of SIMD width and scalar channels so it is consistent with the definition
      of the same helper in the scalar backend. However, some uses of offset()
      in the vec4 backend do not operate naturally in terms of these
      semantics. In these cases it is more natural to use the byte_offset() helper
      instead.
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  5. 14 Sep, 2016 2 commits
    • i965/vec4: Replace vec4_instruction::regs_written with ::size_written field in bytes. · 69fdf13c
      Francisco Jerez authored
      The previous regs_written field can be recovered by rewriting each
      rvalue reference of regs_written like 'x = i.regs_written' to 'x =
      DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
      like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
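
      The two rewrites can be sketched as a pair of conversion helpers. This
      is an illustration only: REG_UNIT = 32 bytes is an assumption about
      reg_unit, and the function names are hypothetical.

```cpp
// Same rounding-up idiom as the DIV_ROUND_UP macro, written as a function.
constexpr unsigned div_round_up(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Assumption: reg_unit is the size of one register in bytes.
constexpr unsigned REG_UNIT = 32;

// rvalue rewrite: x = i.regs_written  ->  DIV_ROUND_UP(i.size_written, reg_unit)
constexpr unsigned regs_from_size(unsigned size_written)
{
    return div_round_up(size_written, REG_UNIT);
}

// lvalue rewrite: i.regs_written = x  ->  i.size_written = x * reg_unit
constexpr unsigned size_from_regs(unsigned regs_written)
{
    return regs_written * REG_UNIT;
}
```

      Rounding up matters because a partial write, for example 16 bytes,
      still touches one whole register.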
      
      For the same reason as in the previous patches, this doesn't attempt
      to be particularly clever about simplifying the result in the interest
      of keeping the rather lengthy patch as obvious as possible.  I'll come
      back later to clean up any ugliness introduced here.
      Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
    • i965/vec4: Add wrapper functions for vec4_instruction::regs_read and ::regs_written. · d28cfa35
      Francisco Jerez authored
      This is in preparation for dropping vec4_instruction::regs_read and
      ::regs_written in favor of more accurate alternatives expressed in
      byte units.  The main reason these wrappers are useful is that a
      number of optimization passes implement dataflow analysis with
      register granularity, so these helpers will come in handy once we've
      switched register offsets and sizes to the byte representation.  The
      wrapper functions will also make sure that GRF misalignment (currently
      neglected by most of the back-end) is taken into account correctly in
      the calculation of regs_read and regs_written.
      Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
  6. 19 Aug, 2016 1 commit
    • i965/vec4: Ignore swizzle of VGRF for use by var_range_end(). · e7c376ad
      Matt Turner authored
      var_range_end(v, n) loops over the n components of variable number v and
      finds the maximum value, giving the last use of any component of v.
      Therefore it expects v to correspond to the variable associated with the
      .x channel of the VGRF.
      
      var_from_reg() however returns the variable for the first channel of the
      VGRF, post-swizzle.
      
      So, if the last register had a swizzle with y, z, or w in the swizzle
      component, we would read out of bounds. For any other register, we would
      read liveness information from the next register.
      
      The fix is to convert the src_reg to a dst_reg in order to call the
      dst_reg version of var_from_reg() that doesn't consider the swizzle.
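
      The failure mode can be reproduced with a simplified stand-in for
      var_range_end(). The flat vector of last-use positions and the
      bounds-checked at() access are assumptions made for illustration; the
      real function reads the liveness arrays directly.

```cpp
#include <algorithm>
#include <climits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: end[i] is the last instruction that uses
// variable i, with 4 variables (x, y, z, w) per register.
int var_range_end(const std::vector<int> &end, unsigned v, unsigned n)
{
    int last = INT_MIN;
    for (unsigned i = 0; i < n; i++)
        last = std::max(last, end.at(v + i)); // at() throws when out of range
    return last;
}
```

      With two registers, v = 0 and v = 4 are the only valid starting
      points. Starting at v = 1, which is what a swizzle beginning with .y
      would produce for the first register, pulls in the .x entry of the
      next register; starting at v = 5 runs past the end of the array.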
      
      Cc: mesa-stable@lists.freedesktop.org
      Reviewed-by: Francisco Jerez <currojerez@riseup.net>
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  7. 22 Dec, 2015 1 commit
    • i965: Add tessellation control shaders. · 24be658d
      Kenneth Graunke authored
      The TCS is the first tessellation shader stage, and the most
      complicated.  It has access to each of the control points in the input
      patch, and computes a new output patch.  There is one logical invocation
      per output control point; all invocations run in parallel, and can
      communicate by reading and writing output variables.
      
      One of the main responsibilities of the TCS is to write the special
      gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
      control how much new geometry the hardware tessellation engine will
      produce.  Otherwise, it simply writes outputs that are passed along
      to the TES.
      
      We run in SIMD4x2 mode, handling two logical invocations per EU thread.
      The hardware doesn't properly manage the dispatch mask for us; it always
      initializes it to 0xFF.  We wrap the whole program in an IF..ENDIF block
      to handle an odd number of invocations, essentially falling back to
      SIMD4x1 on the last thread.
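
      The thread arithmetic behind the odd-invocation case can be sketched
      as follows. Both helper names are hypothetical, and the sketch only
      counts invocations; it does not model the IF..ENDIF block or the
      dispatch mask itself.

```cpp
// Two logical invocations share one EU thread in SIMD4x2 mode, so the
// number of threads is the invocation count divided by two, rounded up.
constexpr unsigned tcs_thread_count(unsigned invocations)
{
    return (invocations + 1) / 2;
}

// Thread t handles invocations 2t and 2t + 1. The second slot is only
// live when 2t + 1 is still a valid invocation index.
// Precondition: thread < tcs_thread_count(invocations).
constexpr unsigned active_invocations(unsigned thread, unsigned invocations)
{
    return (2 * thread + 1 < invocations) ? 2 : 1;
}
```

      For three output control points this gives two threads, and the
      second thread has a single live invocation, which is the case the
      IF..ENDIF wrapper guards.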
      
      v2: Update comments (requested by Jordan Justen).
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
  8. 13 Nov, 2015 3 commits
  9. 06 Aug, 2015 1 commit
  10. 09 Jun, 2015 1 commit
  11. 04 May, 2015 2 commits
  12. 23 Mar, 2015 4 commits
  13. 15 Mar, 2015 1 commit
  14. 10 Feb, 2015 1 commit
    • i965: Factor out virtual GRF allocation to a separate object. · 447879eb
      Francisco Jerez authored
      Right now virtual GRF book-keeping and allocation is performed in each
      visitor class separately (among a hundred other things),
      leading to duplicated logic in each visitor and preventing layering as
      it forces any code that manipulates i965 IR and needs to allocate
      virtual registers to depend on the specific visitor that happens to be
      used to translate from GLSL IR.
      
      v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
      Reviewed-by: Matt Turner <mattst88@gmail.com>
  15. 15 Jan, 2015 1 commit
    • i965: Don't consider null dst instructions as matching non-null dst. · f0aec4ee
      Matt Turner authored
      When performing common subexpression elimination on instructions with
      non-null destinations we emit a MOV to copy the result to a new
      register that must have no other uses. In the case of:
      
         cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
         ...
         cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f
      
      we put the first instruction in the AEB and decided that we could reuse
      its result when we found the second. Unfortunately, that meant that we'd
      emit a MOV from the first's destination, which is null.
      
      Don't do anything if the entry's destination is null and the
      instruction's destination is non-null.
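
      The added guard amounts to one extra condition before reusing an AEB
      entry. The helper below is a hypothetical reduction of that check to
      two booleans, not the actual CSE code.

```cpp
// Hypothetical reduction of the guard: an AEB entry whose destination is
// null has no register to MOV from, so it cannot serve an instruction
// that needs its result written to a real destination.
constexpr bool can_reuse_entry(bool entry_dst_is_null, bool inst_dst_is_null)
{
    return !(entry_dst_is_null && !inst_dst_is_null);
}
```

      In the cmp example above the entry has a null destination and the
      second instruction writes vgrf113, so the entry is skipped.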
      Tested-by: Tapani Pälli <tapani.palli@intel.com>
  16. 08 Jan, 2015 1 commit
  17. 29 Dec, 2014 1 commit
  18. 05 Dec, 2014 1 commit
    • i965/vec4: Allow CSE on uniform-vec4 expansion MOVs. · 0d3cc01b
      Matt Turner authored
      Three source instructions cannot directly source a packed vec4 (<0,4,1>
      regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to
      both halves of a register.
      
      If these uniform values are used by multiple three-source instructions,
      we'll emit multiple expansion moves, which we cannot combine in CSE
      (because CSE emits moves itself).
      
      So emit a virtual instruction that we can CSE.
      
      Sometimes we demote a uniform to a pull constant after emitting an
      expansion move for it. In that case, recognize in opt_algebraic that if
      the .file of the new instruction is GRF then it's just a real move that
      we can copy propagate and such.
      
      total instructions in shared programs: 5822418 -> 5812335 (-0.17%)
      instructions in affected programs:     351841 -> 341758 (-2.87%)
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  19. 30 Oct, 2014 1 commit
  20. 24 Sep, 2014 2 commits
  21. 11 Sep, 2014 1 commit
  22. 22 Aug, 2014 1 commit
  23. 19 Aug, 2014 1 commit
  24. 11 Aug, 2014 2 commits
  25. 21 Jul, 2014 2 commits
  26. 14 Jul, 2014 2 commits
  27. 07 Jul, 2014 1 commit