WIP: NIR: intel/compiler: A code-generator generator
There is a lot here that is still WIP. Some things need to be finished or have unit tests added (e.g., the first real patch in the series). Others need some agreement on the direction to take. That said, the series is definitely ready for review. Ideally, I'd like to be able to land this series before the end of the year... which is approaching rapidly.
Some notes about the series...
The first four patches are from MR !1359 (merged).
WIP: nir/search: Slightly relax restriction that all sources must be SSA
nir/search: Allow non-SSA destination
nir/search: Conditionally allow destination saturate on root of expression tree
nir/search: Track which entry in nir_alu_instr::src each variable was
nir/search: Add public function to just compare an instr w/a search expr
nir/algebraic: Don't cross validate sizes when replace isn't a Value
nir/algebraic: Type of the transformation is a parameter to AlgebraicPass ctor
The first seven patches here make some changes to the nir_search infrastructure to enable its use for the code-generator generator. The most significant changes are the first two, which enable limited use of nir_search on non-SSA sources and destinations.
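To make the matching idea concrete, here is a minimal Python sketch of tree-pattern matching with variable binding. It is not nir_search, which is C and works on real NIR instructions. The Instr, Var, Expr and match names are hypothetical. The sketch records which source index each variable came from. As an assumption loosely modeled on the restriction described here, it accepts a non-SSA destination only on the root of the matched tree.

```python
# Hypothetical toy model for illustration only; this is not the nir_search API.
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Instr:
    """Toy ALU instruction: opcode, sources, and whether the destination is SSA."""
    op: str
    srcs: Tuple["Operand", ...]
    dest_is_ssa: bool = True


# A plain string stands for a leaf value, e.g. a register or SSA def name.
Operand = Union[Instr, str]


@dataclass(frozen=True)
class Var:
    """Pattern variable: matches any operand."""
    name: str


@dataclass(frozen=True)
class Expr:
    """Pattern node: matches an Instr with the same opcode and source count."""
    op: str
    srcs: Tuple[Union["Expr", Var], ...]


# Variable name -> (matched operand, index of the source it was found in).
Bindings = Dict[str, Tuple[Operand, int]]


def match(pattern, value, bindings: Bindings, src_index: int = -1,
          is_root: bool = True) -> bool:
    """Return True if `value` matches `pattern`, filling `bindings` on the way."""
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            # A variable used twice must match the same operand both times.
            return bindings[pattern.name][0] == value
        bindings[pattern.name] = (value, src_index)
        return True
    if not isinstance(value, Instr) or value.op != pattern.op:
        return False
    # In this sketch only the root may have a non-SSA destination: an inner
    # non-SSA value might be overwritten before its use, so it is not folded in.
    if not value.dest_is_ssa and not is_root:
        return False
    if len(value.srcs) != len(pattern.srcs):
        return False
    return all(match(p, v, bindings, i, False)
               for i, (p, v) in enumerate(zip(pattern.srcs, value.srcs)))
```

For example, the pattern fadd(fmul(x, y), z) matches a tree whose root fadd writes a non-SSA register. The matcher binds z to "c" and records that it came from source 1. A caller would use a fresh bindings dict for each pattern it tries, because a failed attempt can leave partial bindings behind.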
intel/compiler: Add a code builder that consumes a bytecode
intel/compiler: Add code generator generator
The next two patches implement the code-generator generator. The first adds a simple bytecode interpreter that generates Gen assembly instructions from a compact binary description of the instructions. The second builds on nir_algebraic to process transformations from trees of NIR to sequences of assembly instructions. The generated code-generator does not use the TreeAutomata because the automaton is built on the assumption that everything will be SSA, and I did not see a way to extend it. Since the code-generator only executes once per shader, I don't think the performance benefit of the automaton for matching would be significant.
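The bytecode format itself is not described here, so the following is only a hypothetical Python sketch of the general idea. Each instruction is packed as a few integers, and a small interpreter expands the program into assembly text using the values bound by the pattern matcher. The encoding, opcode numbers, slot conventions and the run function are all invented for illustration.

```python
# Hypothetical bytecode interpreter sketch; the encoding is invented for illustration.
from typing import List, Sequence

# Hypothetical opcode numbering used only by this sketch.
MNEMONICS = {0: "mov", 1: "add", 2: "mul"}

DEST_SLOT = -1   # the destination of the NIR instruction being translated
TEMP_BASE = 100  # slots >= TEMP_BASE name temporaries t0, t1, ...


def run(bytecode: Sequence[int], operands: Sequence[str], dest: str) -> List[str]:
    """Expand a compact bytecode program into assembly text.

    Each instruction is encoded as: opcode, dst_slot, num_srcs, followed by
    num_srcs source slots. Slots 0..TEMP_BASE-1 index `operands`, the values
    bound when the pattern matched.
    """
    def name(slot: int) -> str:
        if slot == DEST_SLOT:
            return dest
        if slot >= TEMP_BASE:
            return f"t{slot - TEMP_BASE}"
        if 0 <= slot < len(operands):
            return operands[slot]
        raise ValueError(f"bad operand slot {slot}")

    out = []
    pc = 0
    while pc < len(bytecode):
        if pc + 3 > len(bytecode):
            raise ValueError("truncated instruction header")
        op, dst, nsrc = bytecode[pc:pc + 3]
        srcs = bytecode[pc + 3:pc + 3 + nsrc]
        if len(srcs) != nsrc:
            raise ValueError("truncated source list")
        out.append(f"{MNEMONICS[op]} {name(dst)}, "
                   + ", ".join(name(s) for s in srcs))
        pc += 3 + nsrc
    return out
```

For example, a rule that expands fadd(fmul(x, y), z) into a MUL into a temporary followed by an ADD could be encoded as [2, 100, 2, 0, 1, 1, -1, 2, 100, 2]. With operands g2, g3, g4 and destination g5, run produces "mul t0, g2, g3" and "add g5, t0, g4". Storing rules as flat integer lists keeps the generated tables compact. The cost is that the encoding has to be decoded at code-generation time.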
intel/compiler: Import Gen8 / Gen9 ALU machine description
intel/compiler: Begin using code generator generator for Gen8 and Gen9
WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place
intel/compiler: Import and use Gen12 ALU machine description
intel/compiler: Import and use Gen11 ALU machine description
intel/compiler: Import and use Gen7 ALU machine description
intel/compiler: Remove support for all Gen7+ ALU operations and optimizations
intel/compiler: Import and use Gen6 ALU machine description
intel/compiler: Remove support for all Gen6+ ALU operations and optimizations
intel/compiler: Import and use Gen4 / Gen5 ALU machine description
intel/compiler: Remove support for all Gen4+ ALU operations and optimizations
The next 11 patches gradually add support for code generation for Gen4 through Gen12. After a sequence of generations is fully supported in the generated code-generator, all support for those generations is removed from brw_fs_nir.cpp. At each transition there were no changes in shader-db. The only CI changes were for Gen7, and these are noted in the commit message.
The machine description file for each generation is free-standing, so it is very easy to see the differences from Gen7 to Gen8 with:
    diff --side-by-side -W200 src/intel/compiler/gen7_md.py src/intel/compiler/gen8_md.py | less
This was very useful while developing the series, and I think it will also be helpful when new generations are added. The disadvantage is that optimizations and bug fixes may have to be added to multiple md files.
The third of these patches, "WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place", shows how it might look if common parts were factored out. As generations get farther apart, a smaller percentage of the whole is common. Extracting things to a common place that aren't used by all generations (e.g., if 64-bit float support were extracted) makes it more difficult to diff the md files to see the differences between generations.
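A toy illustration of the trade-off follows. It assumes each machine description can be viewed as a plain list of rules. The rule strings are invented placeholders, not entries from gen7_md.py or gen8_md.py, and they make no claim about which generation supports what. The hypothetical compare helper splits two free-standing lists into the rules unique to each and the rules they share. The shared list is the part a common file could hold.

```python
# Hypothetical sketch: rule strings are placeholders, not real md file entries.
from typing import List, Sequence, Tuple

GEN7_RULES = [
    "fadd(a, b) => ADD",
    "fmul(a, b) => MUL",
    "ffma(a, b, c) => MAD",
]
GEN8_RULES = [
    "fadd(a, b) => ADD",
    "fmul(a, b) => MUL",
    "ffma(a, b, c) => MAD",
    "fadd@64(a, b) => ADD:DF",
]


def compare(old: Sequence[str],
            new: Sequence[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split two free-standing rule lists into (only_old, only_new, shared).

    Order within each result follows the input lists.
    """
    old_set, new_set = set(old), set(new)
    only_old = [r for r in old if r not in new_set]
    only_new = [r for r in new if r not in old_set]
    shared = [r for r in old if r in new_set]
    return only_old, only_new, shared
```

With free-standing lists, the only_old and only_new results are exactly what a side-by-side diff shows. If the shared rules moved to a common file instead, each per-generation file would contain only its leftovers. As generations diverge, the shared list shrinks and the common file covers less.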
I'm slightly leaning towards keeping the generations free-standing (i.e., dropping "WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place").
intel/compiler: Remove dead code from fs_visitor::prepare_alu_destination_and_sources
intel/compiler: Final clean up of fs_visitor::nir_emit_alu
The next two patches just clean up some dangling bits in brw_fs_nir.cpp.
WIP: intel/gen8/compiler: Optimize arithmetic with type conversions
intel/compiler: Only GE and L modifiers are commutative for SEL
WIP: intel/gen8/compiler: Emit SEL for some bcsel patterns
intel/compiler: CSEL can do saturate
WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns
WIP: intel/compiler: Remove opt_peephole_csel pass
nir/search: Rearrange the type check for load_front_face and load_helper_invocation
nir/search: Match types for intrinsics that have a type set
WIP: intel/compiler: Add Boolean logic optimizations for Gen6 and Gen7
The final nine patches add four optimizations that pattern-based code generation makes easy. I don't intend to land the last optimization, as it actually hurts Gen6 and Gen7 due to poor scheduling; it is included for instructional purposes. The generator pattern in "WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns" does not suffer from at least some of the bugs in the old opt_peephole_csel pass. See MR !2592 (merged) for more details.
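As a rough, hypothetical illustration of why such optimizations are easy to express as patterns, consider bcsel(flt(x, y), x, y). This is min(x, y), which could map to a single SEL with the .l conditional modifier. Likewise, bcsel(fge(x, y), x, y) is max(x, y), which could map to SEL with .ge. The emit_bcsel function, the tuple tree format and the mnemonic spelling are invented for this sketch. It also ignores NaN behaviour, which a real rule would have to check against SEL's semantics.

```python
# Hypothetical sketch of a bcsel -> SEL rule; ignores NaN handling entirely.
from typing import Optional

# bcsel(flt(x, y), x, y) is min(x, y); bcsel(fge(x, y), x, y) is max(x, y).
# Map each comparison to the SEL conditional modifier used in this sketch.
CMOD_FOR_CMP = {"flt": "l", "fge": "ge"}


def emit_bcsel(tree: tuple, dest: str) -> Optional[str]:
    """Return one SEL for bcsel(cmp(x, y), x, y), or None if no rule applies.

    Trees are nested tuples: (opcode, src0, src1, ...), with strings as leaves.
    """
    if len(tree) != 4 or tree[0] != "bcsel":
        return None
    _, cond, then_val, else_val = tree
    if not isinstance(cond, tuple) or len(cond) != 3:
        return None
    cmp_op, x, y = cond
    cmod = CMOD_FOR_CMP.get(cmp_op)
    # The selected values must be the compared values, in the same order.
    if cmod is None or (then_val, else_val) != (x, y):
        return None
    return f"sel.{cmod} {dest}, {x}, {y}"
```

When emit_bcsel returns None in this sketch, a more general bcsel pattern would have to be tried instead. Each special case stays a small, independent rule rather than a separate peephole pass.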