1. 17 Sep, 2019 1 commit
  2. 24 Jul, 2019 1 commit
  3. 17 Jul, 2019 1 commit
  4. 29 Jun, 2019 4 commits
    • nir/search: Increase maximum commutative expressions from 4 to 8 · 02c6cd84
      Ian Romanick authored
      
      
      No shader-db change on any Intel platform.  No shader-db run-time
      difference on a certain 36-core / 72-thread system at 95% confidence
      (n=20).
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
    • nir/algebraic: Don't mark expression with duplicate sources as commutative · 1a43cf9a
      Ian Romanick authored
      
      
      There is no reason to mark the fmul in the expression
      
          ('fmul', ('fadd', a, b), ('fadd', a, b))
      
      as commutative.  If a source of an instruction doesn't match one of the
      ('fadd', a, b) patterns, it won't match the other either.
      
      This change is enough to make this pattern work:
      
          ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
                                ('fadd', 1.0, ('fneg', a))),
                       ('fmul', ('flrp', a, 1.0, a), b))
      
      This pattern has 5 commutative expressions (versus a limit of 4), but
      the first fmul does not need to be commutative.
      
      No shader-db change on any Intel platform.  No shader-db run-time
      difference on a certain 36-core / 72-thread system at 95% confidence
      (n=20).
      
      There are more subpatterns that could be marked as non-commutative, but
      detecting these is more challenging.  For example, this fadd:
      
          ('fadd', ('fmul', a, b), ('fmul', a, c))
      
      The first fadd:
      
          ('fmul', ('fadd', a, b), ('fadd', a, b))
      
      And this fadd:
      
          ('flt', ('fadd', a, b), 0.0)
      
      This last case may be easier to detect.  If all sources are variables
      and they are the only instances of those variables, then the pattern can
      be marked as non-commutative.  It's probably not worth the effort now,
      but if we end up with some patterns that bump up on the limit again, it
      may be worth revisiting.
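
      To picture the pruning (a simplified sketch in the tuple syntax above,
      not the actual nir_algebraic code), a weak structural-equivalence check
      is enough to skip the commutative flag when both sources of an op are
      the same pattern:

          # Hypothetical helpers; real patterns also carry bit sizes, inexact
          # flags, etc., which are ignored here.
          COMMUTATIVE_OPS = {'fadd', 'fmul', 'iadd', 'imul'}

          def equivalent(a, b):
              """Weak structural equivalence between two source patterns."""
              if isinstance(a, tuple) and isinstance(b, tuple):
                  return (a[0] == b[0] and len(a) == len(b) and
                          all(equivalent(x, y) for x, y in zip(a[1:], b[1:])))
              return a == b

          def count_commutative(expr):
              """Count only the ops that actually need commutative handling."""
              if not isinstance(expr, tuple):
                  return 0
              n = sum(count_commutative(src) for src in expr[1:])
              if expr[0] in COMMUTATIVE_OPS and not equivalent(expr[1], expr[2]):
                  n += 1
              return n

          # The outer fmul contributes nothing; only the two fadds count:
          print(count_commutative(('fmul', ('fadd', 'a', 'b'), ('fadd', 'a', 'b'))))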
      
      v2: Update the comment about the explicit "len(self.sources)" check to
      be more clear about why it is necessary.  Requested by Connor.  Many
      Python style / idiom fixes suggested by Dylan.  Add missing (!!!)
      opcode check in Expression::__eq__ method.  This bug is the reason the
      expected number of commutative expressions in the bitfield_reverse
      pattern changed from 61 to 45 in the first version of this patch.
      
      v3: Use all() in Expression::__eq__ method.  Suggested by Connor.
      Revert away from using __eq__ overloads.  The "equality" implementation
      of Constant and Variable needed for commutativity pruning is weaker than
      the one needed for propagating and validating bit sizes.  Using actual
      equality caused the pruning to fail for my ('fmul', ('fadd', 1, a),
      ('fadd', 1, a)) case.  I changed the name to "equivalent" rather than
      the previous "same_as" to further differentiate it from __eq__.
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
    • nir/algebraic: Fail build when too many commutative expressions are used · 8d6b35ff
      Ian Romanick authored
      
      
      Search patterns that are expected to have too many (e.g., the giant
      bitfield_reverse pattern) can be added to a white list.
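
      Roughly (an illustrative sketch with made-up names, not the real
      nir_algebraic.py code), the build-time check compares the counted
      commutative expressions against the limit and treats a per-pattern
      annotation as an expected failure:

          MAX_COMMUTATIVE_EXPRS = 4   # the limit at the time of this change

          def check_pattern(name, num_commutative, expect_too_many=False):
              too_many = num_commutative > MAX_COMMUTATIVE_EXPRS
              if too_many != expect_too_many:
                  raise AssertionError(
                      '%s: %d commutative expressions (limit %d)' %
                      (name, num_commutative, MAX_COMMUTATIVE_EXPRS))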
      
      This would have saved me a few hours debugging. :(
      
      v2: Implement the expected-failure annotation as a property of the
      search-replace pattern instead of as a property of the whole list of
      patterns.  Suggested by Connor.
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
    • nir/algebraic: Fix whitespace error · 57704b8d
      Ian Romanick authored
      
      
      Trivial
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
  5. 14 May, 2019 2 commits
    • nir: Add support for 2src_commutative ops that have 3 sources · e049a9c9
      Ian Romanick authored
      
      
      v2: Instead of handling 3 sources as a special case, generalize with
      loops to N sources.  Suggested by Jason.
      
      v3: Further generalize by only checking that number of sources is >= 2.
      Suggested by Jason.
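
      As a rough Python illustration of what "the first two sources are
      commutative" means for an op with three sources (the real handling
      lives in the C matching code and the opcode generators), a hypothetical
      canonicalization helper only ever reorders the first two sources:

          def canonical_sources(op, srcs, is_2src_commutative):
              srcs = list(srcs)
              if is_2src_commutative and len(srcs) >= 2:   # the ">= 2" check
                  srcs[:2] = sorted(srcs[:2])
              return (op, tuple(srcs))

          # ffma(b, a, c) and ffma(a, b, c) canonicalize identically:
          assert (canonical_sources('ffma', ('b', 'a', 'c'), True) ==
                  canonical_sources('ffma', ('a', 'b', 'c'), True))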
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    • nir: Rename commutative to 2src_commutative · ede45bf9
      Ian Romanick authored
      
      
      The meaning of the new name is that the first two sources are
      commutative.  Since this is only currently applied to two-source
      operations, there is no change.
      
      A future change will mark ffma as 2src_commutative.
      
      It is also possible that future work will add 3src_commutative for
      opcodes like fmin3.
      
      v2: s/commutative_2src/2src_commutative/g.  I had originally considered
      this, but I discarded it because I didn't want to deal with identifiers
      that (should) start with 2.  Jason suggested it in review, so we decided
      that _2src_commutative would be used in nir_opcodes.py.  Also add some
      comments documenting what 2src_commutative means.  Also suggested by
      Jason.
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  6. 03 May, 2019 1 commit
  7. 02 May, 2019 1 commit
    • nir/search: Add automaton-based pre-searching · 7ce86e69
      Connor Abbott authored
      
      
      nir_opt_algebraic is currently one of the most expensive NIR passes,
      because of the many different patterns we've added over the years. Even
      though patterns are already sorted by opcode, there are still way too
      many patterns for common opcodes like bcsel and fadd, which means that
      many patterns are tried but only a few actually match. One way to fix
      this is to add a pre-pass over the code that scans it using an automaton
      constructed beforehand, similar to the automatons produced by lex and
      yacc for parsing source code. This automaton has to walk the SSA graph
      and recognize possible pattern matches.
      
      It turns out that the theory to do this is quite mature already, having
      been developed for instruction selection as well as other non-compiler
      things. I followed the presentation in the dissertation cited in the
      code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep
      the naming similar. To create the automaton, we have to perform
      something like the classical NFA to DFA subset construction used by lex,
      but it turns out that actually computing the transition table for all
      possible states would be way too expensive, with the dissertation
      reporting times of almost half an hour for an example of size similar to
      nir_opt_algebraic. Instead, we adopt one of the "filter" approaches
      explained in the dissertation, which trade much faster table generation
      and table size for a few more table lookups per instruction at runtime.
      I chose the filter which resulted in the fastest table generation time,
      with medium table size. Right now, the table generation takes around .5
      seconds, despite being implemented in pure Python, which I think is good
      enough. Based on the numbers in the dissertation, the other choice might
      make table compilation time 25x slower to get 4x smaller table size, but
      I don't think that's worth it. As of now, we get the following binary
      size before and after this patch:
      
              text    data     bss       dec     hex  filename
          11979455  464720  730864  13175039  c908ff  before i965_dri.so
          12037835  616244  791792  13445871  cd2aef  after  i965_dri.so
      
      There are a number of places where I've simplified the automaton by
      getting rid of details in the LHS patterns rather than complicate things
      to deal with them. For example, right now the automaton doesn't
      distinguish between constants with different values. This means that it
      isn't as precise as it could be, but the decrease in compile time is
      still worth it -- these are the compilation time numbers for a shader-db
      run with my (admittedly old) database on Intel skylake:
      
      Difference at 95.0% confidence
      	-42.3485 +/- 1.375
      	-7.20383% +/- 0.229926%
      	(Student's t, pooled s = 1.69843)
      
      We can always experiment with making it more precise later.
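
      A toy model of the pre-pass (heavily simplified relative to the real
      nir_search machinery; all names below are made up): walk instructions
      in SSA order, derive a state for each one from its opcode and its
      sources' states via a precomputed transition table, and only attempt
      the full matches associated with that state.

          # Hypothetical tables; the real ones are generated offline in Python.
          transitions = {('fadd', ('any', 'any')): 'fadd_xx'}
          patterns_for_state = {'fadd_xx': ['fadd-related patterns']}

          def assign_states(instructions):
              state = {}
              for name, op, srcs in instructions:          # in SSA order
                  src_states = tuple(state.get(s, 'any') for s in srcs)
                  state[name] = transitions.get((op, src_states), 'any')
                  candidates = patterns_for_state.get(state[name], [])
                  # ...only these candidates get the full (expensive) match...
              return state

          print(assign_states([('ssa_2', 'fadd', ('ssa_0', 'ssa_1'))]))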
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  8. 16 Apr, 2019 2 commits
  9. 09 Apr, 2019 1 commit
  10. 08 Apr, 2019 1 commit
    • nir/search: Search for all combinations of commutative ops · 50f3535d
      Jason Ekstrand authored and committed
      
      
      Consider the following search expression and NIR sequence:
      
          ('iadd', ('imul', a, b), b)
      
          ssa_2 = imul ssa_0, ssa_1
          ssa_3 = iadd ssa_2, ssa_0
      
      The current algorithm is greedy and, the moment the imul finds a match,
      it commits those variable names and returns success.  In the above
      example, it maps a -> ssa_0 and b -> ssa_1.  When we then try to match
      the iadd, it sees that ssa_0 is not b and fails to match.  The iadd
      match will attempt to flip itself and try again (which won't work) but
      it cannot ask the imul to try a flipped match.
      
      This commit instead counts the number of commutative ops in each
      expression and assigns an index to each.  It then loops
      over the full combinatorial matrix of commutative operations.  In order
      to keep things sane, we limit it to at most 4 commutative operations (16
      combinations).  There is only one optimization in opt_algebraic that
      goes over this limit and it's the bitfieldReverse detection for some UE4
      demo.
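
      The combinatorial walk can be pictured like this (an illustrative
      sketch, not the actual C implementation in nir_search; try_match is an
      assumed callback): number the commutative ops in the pattern, then
      treat each bit of a counter as "swap the two sources of op i or not".

          MAX_COMM_OPS = 4

          def match_with_commutative_ops(try_match, num_comm_ops):
              assert num_comm_ops <= MAX_COMM_OPS
              for combo in range(1 << num_comm_ops):       # at most 16 combos
                  swaps = tuple(bool(combo & (1 << i))
                                for i in range(num_comm_ops))
                  if try_match(swaps):
                      return True
              return False

          # In the example above, the combination that flips the imul (so b
          # maps to ssa_0) is the one that lets the whole pattern match.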
      
      Shader-db results on Kaby Lake:
      
          total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
          instructions in affected programs: 1797123 -> 1789467 (-0.43%)
          helped: 6751
          HURT: 2264
      
          total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
          cycles in affected programs: 15931005 -> 15786914 (-0.90%)
          helped: 6024
          HURT: 3436
      
          total loops in shared programs: 4360 -> 4360 (0.00%)
          loops in affected programs: 0 -> 0
          helped: 0
          HURT: 0
      
          total spills in shared programs: 23675 -> 23666 (-0.04%)
          spills in affected programs: 235 -> 226 (-3.83%)
          helped: 5
          HURT: 1
      
          total fills in shared programs: 32040 -> 32032 (-0.02%)
          fills in affected programs: 190 -> 182 (-4.21%)
          helped: 6
          HURT: 2
      
          LOST:   18
          GAINED: 5
      
      Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
  11. 10 Jan, 2019 1 commit
    • nir: Unset metadata debug bit if no progress made · 26236531
      Matt Turner authored
      
      
      NIR metadata validation verifies that the debug bit was unset (by a call
      to nir_metadata_preserve) if a NIR optimization pass made progress on
      the shader. With the expectation that the NIR shader consists of only a
      single main function, it has been safe to call nir_metadata_preserve()
      iff progress was made.
      
      However, most optimization passes calculate progress per-function and
      then return the union of those calculations. In the case that an
      optimization pass makes progress only on a subset of the functions in
      the shader, metadata validation will detect that the debug bit is still
      set on any unchanged functions, resulting in a failed assertion.
      
      This patch offers a quick solution (short of a larger scale refactoring
      which I do not wish to undertake as part of this series) that simply
      unsets the debug bit on unchanged functions.
      
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
  12. 19 Dec, 2018 1 commit
  13. 16 Dec, 2018 2 commits
    • 3b308147 (Jason Ekstrand)
    • nir: Rename Boolean-related opcodes to include 32 in the name · 80e8dfe9
      Jason Ekstrand authored and committed
      
      
      This is a squash of a bunch of individual changes:
      
          nir/builder: Generate 32-bit bool opcodes transparently
      
          nir/algebraic: Remap Boolean opcodes to the 32-bit variant
      
          Use 32-bit opcodes in the NIR producers and optimizations
      
              Generated with a little hand-editing and the following sed commands:
      
              sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
              sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
              sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
              sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
              sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
              sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
      
           Use 32-bit opcodes in the NIR back-ends
      
              Generated with a little hand-editing and the following sed commands:
      
              sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
              sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
              sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
              sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
              sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
              sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
              sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
      
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
      Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  14. 05 Dec, 2018 5 commits
    • nir: Make boolean conversions sized just like the others · dca6cd9c
      Jason Ekstrand authored
      
      
      Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
      one of 8, 16, 32, or 64.  This leads to having a few more opcodes but
      now everything is consistent and booleans aren't a weird special case
      anymore.
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
    • nir/algebraic: Add support for unsized conversion opcodes · 05af952a
      Jason Ekstrand authored
      
      
      All conversion opcodes require a destination size but this makes
      constructing certain algebraic expressions rather cumbersome.  This
      commit adds support to nir_search and nir_algebraic for writing
      conversion opcodes without a size.  These meta-opcodes match any
      conversion of that type regardless of destination size and the size gets
      inferred from the sizes of the things being matched or from other
      opcodes in the expression.
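
      For instance (an illustrative rule in the tuple syntax used elsewhere
      in this log, with the unsized 'b2f' standing in for any of its sized
      variants), a single pattern can now cover every destination size:

          (('fabs', ('b2f', a)), ('b2f', a))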
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
    • nir/algebraic: Refactor codegen a bit · 4925290a
      Jason Ekstrand authored
      
      
      Instead of using an OrderedDict, just have a (necessarily sorted) array
      of transforms and a set of opcodes.
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
    • nir/algebraic: Clean up some __str__ cruft · d6aac618
      Jason Ekstrand authored
      
      
      Both of these things are already handled in the Value base class so we
      don't need to handle them explicitly in Constant.
      
      Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
    • nir/algebraic: Rewrite bit-size inference · 29a1450e
      Connor Abbott authored
      
      
      Before this commit, there were two copies of the algorithm: one in C,
      that we would use to figure out what bit-size to give the replacement
      expression, and one in Python, that emulated the C one and tried to
      prove that the C algorithm would never fail to correctly assign
      bit-sizes. That seemed pretty fragile, and likely to fall over if we
      make any changes. Furthermore, the C code was really just recomputing
      more-or-less the same thing as the Python code every time. Instead, we
      can just store the results of the Python algorithm in the C
      datastructure, and consult it to compute the bitsize of each value,
      moving the "brains" entirely into Python. Since the Python algorithm no
      longer has to match C, it's also a lot easier to change it to something
      more closely approximating an actual type-inference algorithm. The
      algorithm used is based on Hindley-Milner, although deliberately
      weakened a little. It's a few more lines than the old one, judging by
      the diffstat, but I think it's easier to verify that it's correct while
      being as general as possible.
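
      The flavor of the new algorithm (a rough union-find sketch, not the
      actual nir_algebraic code) is unification over bit-size classes: every
      value gets a class, classes are merged where an opcode requires
      matching sizes, and a class tied to a concrete size fixes all of its
      members.

          parent = {}     # union-find forest over values
          size_of = {}    # known concrete size of a class root, if any

          def find(v):
              parent.setdefault(v, v)
              if parent[v] != v:
                  parent[v] = find(parent[v])
              return parent[v]

          def unify(a, b):
              ra, rb = find(a), find(b)
              if ra == rb:
                  return
              sa, sb = size_of.get(ra), size_of.get(rb)
              if sa is not None and sb is not None and sa != sb:
                  raise ValueError('conflicting bit sizes: %d vs %d' % (sa, sb))
              parent[ra] = rb
              if sb is None and sa is not None:
                  size_of[rb] = sa

          # e.g. a 32-bit source forces its whole class to 32 bits:
          size_of[find('src0')] = 32
          unify('src0', 'dest')
          assert size_of[find('dest')] == 32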
      
      We could split this up into two changes, first making the C code use the
      results of the Python code and then rewriting the Python algorithm, but
      since the old algorithm never tracked which variable belonged to each
      equivalence class, it would mean we'd have to add some non-trivial code
      which would then get thrown away. I think it's better to see the final
      state all at once, although I could also try splitting it up.
      
      v2:
      - Replace instances of "== None" and "!= None" with "is None" and
      "is not None".
      - Rename first_src to first_unsized_src
      - Only merge the destination with the first unsized source, since the
      sources have already been merged.
      - Add a comment explaining what nir_search_value::bit_size now means.
      v3:
      - Fix one last instance to use "is not" instead of !=
      - Don't try to be so clever when choosing which error message to print
      based on whether we're in the search or replace expression.
      - Fix trailing whitespace.
      
      Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
  15. 26 Oct, 2018 1 commit
  16. 23 Oct, 2018 1 commit
  17. 22 Oct, 2018 5 commits
  18. 09 Aug, 2018 2 commits
  19. 01 Aug, 2018 2 commits
    • python: Explicitly add the 'L' suffix on Python 3 · ad363913
      Mathieu Bridon authored and Eric Engestrom committed
      
      
      Python 2 had two integer types: int and long. Python 3 dropped the
      latter, as it made the int type automatically support bigger numbers.
      
      As a result, Python 3 lost the 'L' suffix on integer literals.
      
      This probably doesn't make much difference when compiling the generated
      C code, but adding it explicitly means that both Python 2 and 3 generate
      the exact same C code anyway, which makes it easier to compare and check
      for discrepancies when moving to Python 3.
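
      One way to picture the workaround (illustrative only; the real helper
      lives in the NIR build scripts):

          def c_hex(value):
              # Emit the 'L' by hand: Python 2 already printed it for longs,
              # and writing it explicitly makes Python 3 produce byte-identical
              # generated code.
              return '0x%xL' % value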
      
      Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
    • python: Don't abuse hex() · e40200e0
      Mathieu Bridon authored and Eric Engestrom committed
      
      
      The hex() builtin returns a string containing the hexadecimal
      representation of an integer.
      
      When the argument is not an integer, then the function calls that
      object's __hex__() method, if one is defined. That method is supposed to
      return a string.
      
      While that's not explicitly documented, that string is supposed to be a
      valid hexadecimal representation of a number. Python 2 doesn't enforce
      this though, which is why we got away with returning things like
      'NIR_TRUE' which are not numbers.
      
      In Python 3, the hex() builtin instead calls an object's __index__()
      method, which itself must return an integer. That integer is then
      automatically converted to a string with its hexadecimal representation
      by the rest of the hex() function.
      
      As a result, we really can't make this compatible with Python 3 as it
      is.
      
      The solution is to stop using the hex() builtin, and instead use a hex()
      object method, which can return whatever we want, in Python 2 and 3.
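
      A simplified sketch of the approach (the real class in nir_algebraic.py
      carries more state; NIR_FALSE here is an assumed counterpart of the
      NIR_TRUE mentioned above):

          class Constant(object):
              def __init__(self, value):
                  self.value = value

              def hex(self):
                  # A plain method may return non-numeric strings such as
                  # 'NIR_TRUE', which the hex() builtin no longer tolerates
                  # on Python 3.
                  if isinstance(self.value, bool):
                      return 'NIR_TRUE' if self.value else 'NIR_FALSE'
                  return hex(self.value)

          # Call sites change from hex(constant) to constant.hex().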
      
      Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
      Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
      Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
  20. 24 Jul, 2018 2 commits
  21. 05 Jul, 2018 1 commit
    • python: Stabilize some script outputs · fe8a1536
      Mathieu Bridon authored and Eric Engestrom committed
      
      
      In Python, dictionaries and sets are unordered, and as a result there
      is no guarantee that running this script twice will produce the same
      output.
      
      Using ordered dicts and explicitly sorting items makes the build more
      reproducible, and will make it possible to verify that we're not
      breaking anything when we move the build scripts to Python 3.
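
      The pattern is straightforward (illustrative, not the exact scripts):

          from collections import OrderedDict

          opcodes = OrderedDict()     # insertion-ordered on Python 2 and 3
          opcodes['fadd'] = '...'
          opcodes['fmul'] = '...'

          # Iterate in a deterministic order so the generated C never depends
          # on hash randomization.
          for name in sorted(opcodes):
              print(name, opcodes[name])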
      
      Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
  22. 10 Mar, 2017 1 commit
  23. 20 Jan, 2017 1 commit