1. 22 Jan, 2018 1 commit
  2. 11 Jan, 2018 1 commit
  3. 31 Dec, 2017 1 commit
  4. 23 Dec, 2017 1 commit
  5. 06 Dec, 2017 3 commits
  6. 01 Dec, 2017 1 commit
  7. 21 Nov, 2017 1 commit
  8. 17 Nov, 2017 2 commits
  9. 14 Nov, 2017 1 commit
  10. 07 Nov, 2017 5 commits
  11. 25 Oct, 2017 1 commit
    • intel/eu: Use EXECUTE_1 for JMPI · 562b8d45
      Jason Ekstrand authored
      The PRM says "The execution size must be 1."  In 73137997, the
      execution size was set to 1 when it should have been BRW_EXECUTE_1
      (which maps to 0).  Later, in dc2d3a7f, JMPI was used for
      line AA on gen6 and earlier and we started manually stomping the
      execution size to BRW_EXECUTE_1 in the generator.  This commit fixes the
      original bug and makes brw_JMPI just do the right thing.
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Fixes: 73137997
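The bug comes from the execution-size field being log2-encoded: the field value 1 requests two channels, and a single channel is encoded as 0. The sketch below shows that mapping. The enum names only echo BRW_EXECUTE_*. They and the helper functions are hypothetical and are not Mesa's code.

```c
#include <assert.h>

/* Hypothetical mirror of the log2-encoded execution-size field; the names
 * echo Mesa's BRW_EXECUTE_* values but are assumptions for this sketch. */
enum exec_size_enc {
   EXECUTE_1  = 0,
   EXECUTE_2  = 1,
   EXECUTE_4  = 2,
   EXECUTE_8  = 3,
   EXECUTE_16 = 4,
   EXECUTE_32 = 5,
};

/* Encode a channel count (a power of two from 1 to 32) as its log2. */
static unsigned encode_exec_size(unsigned channels)
{
   assert(channels >= 1 && channels <= 32);
   assert((channels & (channels - 1)) == 0);
   unsigned enc = 0;
   while ((1u << enc) < channels)
      enc++;
   return enc;
}

/* Decode a field value back to the number of channels it requests. */
static unsigned decode_exec_size(unsigned enc)
{
   assert(enc <= EXECUTE_32);
   return 1u << enc;
}
```

For example, encode_exec_size(1) yields 0 (EXECUTE_1). decode_exec_size(1) yields 2, which is what writing a literal 1 into the field actually asked for.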
  12. 20 Oct, 2017 1 commit
  13. 04 Oct, 2017 1 commit
  14. 01 Oct, 2017 1 commit
    • i965: Normalize types for FBL, FBH, etc · 3cfd6ad0
      Matt Turner authored
      Allows the instructions to be compacted. The documentation claims that
      some of these only accept UD types, even though the type doesn't change
      the operation performed. Just normalize the types to ensure we get
      instruction compaction.
      
      The only functional changes are for FBL and CBIT (always use UD types)
      and FBH (always use the same types).
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
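The sketch below illustrates why normalizing the types is safe for FBL and CBIT. Both look only at the raw 32 bits, so reading the source as D or UD gives the same answer. It also shows the normalization rule as the message states it. The opcode and type enums, the struct, and the reading of "same types" for FBH (the destination takes the source's type) are simplifying assumptions, not Mesa's actual API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified opcodes and register types for this sketch. */
enum reg_type { TYPE_D, TYPE_UD };
enum opcode { OP_FBL, OP_FBH, OP_CBIT };

struct alu1 {
   enum opcode op;
   enum reg_type dst_type;
   enum reg_type src_type;
};

/* FBL and CBIT always use UD; FBH is assumed here to give the
 * destination the same type as the source. */
static void normalize_types(struct alu1 *inst)
{
   switch (inst->op) {
   case OP_FBL:
   case OP_CBIT:
      inst->dst_type = TYPE_UD;
      inst->src_type = TYPE_UD;
      break;
   case OP_FBH:
      inst->dst_type = inst->src_type;
      break;
   }
}

/* Count set bits; depends only on the bit pattern, not on D vs UD. */
static uint32_t cbit(uint32_t bits)
{
   uint32_t n = 0;
   while (bits) {
      n += bits & 1u;
      bits >>= 1;
   }
   return n;
}

/* Index of the lowest set bit; this sketch returns 0xffffffff if none. */
static uint32_t fbl(uint32_t bits)
{
   for (uint32_t i = 0; i < 32; i++)
      if (bits & (1u << i))
         return i;
   return 0xffffffffu;
}
```

For a D source of -8 (bit pattern 0xfffffff8), fbl returns 3 and cbit returns 29. Those are the same values the UD interpretation gives.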
  15. 23 Aug, 2017 1 commit
  16. 15 May, 2017 1 commit
  17. 14 Apr, 2017 4 commits
  18. 13 Mar, 2017 2 commits
    • i965: Move the back-end compiler to src/intel/compiler · 700bebb9
      Jason Ekstrand authored
      Mostly a plain git mv with a couple of noticeable parts:
       - With the earlier header cleanups, nothing in src/intel depends on
      files from src/mesa/drivers/dri/i965/
       - Both Autoconf and Android builds are addressed. Thanks to Mauro and
      Tapani for the fixups in the latter.
       - brw_util.[ch] is not really compiler specific, so it's moved to i965.
      
      v2:
       - move brw_eu_defines.h instead of brw_defines.h
       - remove no-longer applicable includes
       - add missing vulkan/ prefix in the Android build (thanks Tapani)
      
      v3:
       - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
       - rebase on top of the oa patches
      
      [Emil Velikov: commit message, various small fixes throughout]
      Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • i965: remove unused brw_program.h include · 7784b3c8
      Emil Velikov authored
      Neither of the changed files requires the brw_program.h include. Since
      we're about to move them [to src/intel/compiler] in the next commit,
      there's no point in keeping the include.
      
      Not to mention the very confusing compiler include directive
      [-I${top_srcdir}/src/mesa/drivers/dri/i965/] that one would otherwise have to use.
      Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  19. 01 Mar, 2017 1 commit
  20. 26 Jan, 2017 1 commit
  21. 23 Dec, 2016 1 commit
  22. 15 Dec, 2016 4 commits
    • i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode. · 23caf751
      Francisco Jerez authored
      Not used anymore.  It was just a scalar MOV.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
    • i965/fs: Expose arbitrary pull constant load sizes to the IR. · 9b22a0d2
      Francisco Jerez authored
      Change the FS generator to ask the dataport for enough owords worth of
      constants to fill the execution size of the instruction.  This means
      that the visitor now needs to set the execution size correctly for
      uniform pull constant load instructions, which we had largely
      neglected until now.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
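"Enough owords to fill the execution size" reduces to simple arithmetic. An oword is 16 bytes, so a SIMD8 load of 32-bit constants needs 32 bytes, or 2 owords. The sketch below assumes one component of type_size bytes per channel and rounds partial owords up. The helper name is hypothetical.

```c
#include <assert.h>

/* One oword is 16 bytes. */
#define OWORD_BYTES 16u

/* Hypothetical helper: owords needed for a block read to supply one
 * type_size-byte component to each of exec_size channels, rounded up
 * to a whole oword. */
static unsigned pull_load_owords(unsigned exec_size, unsigned type_size)
{
   assert(exec_size >= 1 && type_size >= 1);
   unsigned bytes = exec_size * type_size;
   return (bytes + OWORD_BYTES - 1) / OWORD_BYTES;
}
```

With 32-bit constants, SIMD8, SIMD16 and SIMD32 give 2, 4 and 8 owords. 8 owords is the 128B that the following commit cites as the most a single oword block read can fetch.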
    • i965/fs: Switch to the constant cache for uniform pull constants. · ad38ba11
      Francisco Jerez authored
      This reverts to using the oword block read messages for uniform pull
      constant loads, as used to be the case until
      4c1fdae0.  There are two important differences
      though: Now the L3 cacheability bits are set up correctly for UBOs
      (since 11f5d8a5), and we target the
      constant cache instead of the data cache.  The latter used to get no
      L3 way allocation on boot on all platforms that existed at the time,
      so oword read messages wouldn't get cached on L3 regardless of the
      MOCS bits, which probably explains the apparent slowness of oword
      fetches.
      
      Constant cache loads seem to perform better than SIMD4x2 sampler loads
      in a number of cases: they alleviate some of the cache thrashing
      caused by competition with textures for the L1/L2 sampler caches,
      and they allow fetching up to 128B worth of constants with a single
      oword fetch message.
      
      Note that IVB devices suffer from a hardware bug that serializes L3
      read requests overlapping the same cacheline, as a result of the
      mechanism the L3 uses to preserve coherency (which is buggy on IVB).
      Since read requests for matching cachelines from any L3 client are not
      pipelined, throughput may decrease when there are no non-overlapping
      requests left in the queue that can be processed between them.
      
      This situation should be relatively uncommon as long as we make sure
      that we don't use the 1/2 oword messages in cases where the shader
      intends to read from any other location of the same cacheline at some
      other point.  This is generally a good idea anyway on all generations,
      because the 1 and 2 oword messages are expected to waste bandwidth:
      the minimum L3 request size for the DC is exactly 4 owords (i.e. one
      cacheline).  A future commit will make this change.
      I haven't been able to find any real-world example where this would
      still result in a regression on IVB, but if someone happens to find
      one it shouldn't be too difficult to add an IVB-specific check to have
      it fall back to the sampler cache for pull constant loads.
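The bandwidth argument and the IVB hazard both come down to 64-byte cachelines (4 owords, per the message above). The sketch below computes how much of a full-line request a 1, 2 or 4 oword read actually uses. It also checks whether two byte offsets share a line, which is the overlapping case IVB serializes. It assumes the read does not straddle a line, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define OWORD_BYTES     16u
#define CACHELINE_BYTES 64u /* 4 owords, the minimum DC request to L3 */

/* Fraction of a full-cacheline L3 request that an n-oword block read
 * uses, assuming the read stays within one cacheline. */
static double request_efficiency(unsigned owords)
{
   assert(owords >= 1 && owords <= 4);
   return (double)(owords * OWORD_BYTES) / CACHELINE_BYTES;
}

/* True if two byte offsets fall in the same cacheline, i.e. the kind of
 * overlapping read requests that are not pipelined on IVB. */
static bool same_cacheline(unsigned a, unsigned b)
{
   return a / CACHELINE_BYTES == b / CACHELINE_BYTES;
}
```

A 1 oword read uses a quarter of the line the DC fetches anyway, and a 2 oword read uses half. Offsets 0 and 48 share a line, while 48 and 64 do not.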
      
      Note that on SKL+ this change has the additional benefit of reducing
      the register footprint of pull constant loads.  The following table
      summarizes the effect of the whole series on several shader-db stats:
      
           Total instructions          Total cycles
      BWR: 4571248 -> 4568342 (-0.06%) 123375740 -> 123373296 (-0.00%)
      ELK: 3989020 -> 3985402 (-0.09%)  98757068 -> 98754058 (-0.00%)
      ILK: 6383591 -> 6376787 (-0.11%) 143649910 -> 143648914 (-0.00%)
      SNB: 7528395 -> 7501446 (-0.36%) 103503796 -> 102460370 (-1.01%)
      IVB: 6949221 -> 6943317 (-0.08%)  60592262 -> 60584422 (-0.01%)
      HSW: 6409753 -> 6403702 (-0.09%)  60609070 -> 60604414 (-0.01%)
      BDW: 8043467 -> 7976364 (-0.83%)  68427730 -> 68483042 (0.08%)
      CHV: 8045019 -> 7977916 (-0.83%)  68297426 -> 68352756 (0.08%)
      SKL: 8204037 -> 7939086 (-3.23%)  66583900 -> 65624378 (-1.44%)
      
           Lost->Gained Total spills          Total fills
      BWR:  5 ->   5    1488 -> 1488 (0.00%)  1957 -> 1957 (0.00%)
      ELK:  5 ->   5    1489 -> 1489 (0.00%)  1958 -> 1958 (0.00%)
      ILK:  1 ->   4    1449 -> 1449 (0.00%)  1921 -> 1921 (0.00%)
      SNB:  0 ->   0     549 -> 549 (0.00%)     52 -> 52 (0.00%)
      IVB: 13 ->   3    1271 -> 1271 (0.00%)  1162 -> 1162 (0.00%)
      HSW: 11 ->   0    1271 -> 1271 (0.00%)  1162 -> 1162 (0.00%)
      BDW: 12 ->   0    1340 -> 1340 (0.00%)  1452 -> 1452 (0.00%)
      CHV: 12 ->   0    1340 -> 1340 (0.00%)  1452 -> 1452 (0.00%)
      SKL:  0 -> 120    1269 -> 375 (-70.45%) 1563 -> 690 (-55.85%)
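The percentages in the tables are relative changes, (after - before) / before. The helper below reproduces them as a quick consistency check. It is a hypothetical sketch, not part of shader-db.

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
static double percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For the SKL rows, percent_change(1269, 375) is about -70.45 for spills, and percent_change(1563, 690) is about -55.85 for fills. percent_change(8204037, 7939086) is about -3.23 for instructions.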
      
      v3: Non-trivial rebase.
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  23. 13 Dec, 2016 1 commit
  24. 29 Oct, 2016 1 commit
  25. 06 Oct, 2016 1 commit
  26. 21 Sep, 2016 1 commit