WIP: NIR: intel/compiler: A code-generator generator
There is a lot here that is still WIP. Some things need to be finished or have unit tests added (e.g., the first real patch in the series). Others need some agreement on the direction to take. That said, the series is definitely ready for review. Ideally, I'd like to be able to land this series before the end of the year... which is approaching rapidly.
Some notes about the series...
The first four patches are from MR !1359 (merged).
WIP: nir/search: Slightly relax restriction that all sources must be SSA
nir/search: Allow non-SSA destination
nir/search: Conditionally allow destination saturate on root of expression tree
nir/search: Track which entry in nir_alu_instr::src each variable was
nir/search: Add public function to just compare an instr w/a search expr
nir/algebraic: Don't cross validate sizes when replace isn't a Value
nir/algebraic: Type of the transformation is a parameter to AlgebraicPass ctor
The first seven patches here make some changes to the nir_search infrastructure to enable its use for the code-generator generator. The most significant changes are the first two. These enable limited use of nir_search on non-SSA sources and destinations.
intel/compiler: Add a code builder that consumes a bytecode
intel/compiler: Add code generator generator
The next two patches implement the code-generator generator. The first adds a simple bytecode interpreter that generates Gen assembly instructions from a compact binary description of the instructions. The second builds on nir_algebraic to process transformations from trees of NIR to sequences of assembly instructions. The generated code-generator does not use the TreeAutomata because the automaton is built on the assumption that everything will be SSA. I did not see a way to extend it. Since the code-generator only executes once per shader, I don't think the performance benefit of the automaton for matching would be significant.
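To make the two pieces concrete, here is a minimal toy sketch in Python of the general idea: a table of tree patterns, each paired with a compact bytecode, and a small interpreter that expands the bytecode into instructions. Everything in it is an assumption made for illustration (the opcode bytes, the DST marker, the tuple pattern syntax, the register names, and the functions match, run_bytecode, and generate). It is not the encoding or the API used by the patches.

```python
from itertools import count
from typing import NamedTuple

class Instr(NamedTuple):
    opcode: str
    dst: str
    srcs: tuple

# Hypothetical encoding: opcode byte, destination byte, then one byte per source.
OPCODES = {0x01: ("ADD", 2), 0x02: ("MUL", 2), 0x03: ("MAD", 3)}
DST = 0x80  # operand byte meaning "destination of the matched expression"

# (search tree, variable order, bytecode). Operand bytes below DST index into
# the variable order. More specific patterns are listed first.
PATTERNS = [
    (("fadd", ("fmul", "a", "b"), "c"), ("a", "b", "c"),
     bytes([0x03, DST, 2, 0, 1])),
    (("fadd", "a", "b"), ("a", "b"), bytes([0x01, DST, 0, 1])),
    (("fmul", "a", "b"), ("a", "b"), bytes([0x02, DST, 0, 1])),
]

def match(pattern, expr, bindings):
    """Match a pattern tree against an expression tree, filling bindings."""
    if isinstance(pattern, str):
        # A pattern variable binds to any operand; a repeated variable must
        # bind to the same operand every time.
        if pattern in bindings:
            return bindings[pattern] == expr
        bindings[pattern] = expr
        return True
    return (isinstance(expr, tuple) and len(expr) == len(pattern)
            and expr[0] == pattern[0]
            and all(match(p, e, bindings)
                    for p, e in zip(pattern[1:], expr[1:])))

def run_bytecode(code, operands, dst):
    """Expand a bytecode string into a list of Instr."""
    out, pc = [], 0
    while pc < len(code):
        name, nsrc = OPCODES[code[pc]]
        ops = [dst if b == DST else operands[b]
               for b in code[pc + 1:pc + 2 + nsrc]]
        out.append(Instr(name, ops[0], tuple(ops[1:])))
        pc += 2 + nsrc
    return out

def generate(expr):
    """Emit instructions for an expression tree; leaves are register names."""
    out, temps = [], count()

    def emit(e):
        if isinstance(e, str):
            return e  # already lives in a register
        for pattern, order, code in PATTERNS:
            bindings = {}
            if match(pattern, e, bindings):
                # Operands that are themselves subtrees are generated first.
                operands = [emit(bindings[v]) for v in order]
                dst = f"t{next(temps)}"
                out.extend(run_bytecode(code, operands, dst))
                return dst
        raise ValueError(f"no pattern matches {e!r}")

    emit(expr)
    return out
```

With these toy tables, generate(("fadd", ("fmul", "r1", "r2"), "r3")) yields a single MAD, while generate(("fmul", ("fadd", "r1", "r2"), "r3")) yields an ADD into a temporary followed by a MUL. The matcher here is a plain linear scan over the pattern list, in the spirit of the point above that matching speed is not critical when the generator runs once per shader.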
intel/compiler: Import Gen8 / Gen9 ALU machine description
intel/compiler: Begin using code generator generator for Gen8 and Gen9
WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place
intel/compiler: Import and use Gen12 ALU machine description
intel/compiler: Import and use Gen11 ALU machine description
intel/compiler: Import and use Gen7 ALU machine description
intel/compiler: Remove support for all Gen7+ ALU operations and optimizations
intel/compiler: Import and use Gen6 ALU machine description
intel/compiler: Remove support for all Gen6+ ALU operations and optimizations
intel/compiler: Import and use Gen4 / Gen5 ALU machine description
intel/compiler: Remove support for all Gen4+ ALU operations and optimizations
The next 11 patches gradually add support for code generation for Gen4 through Gen12. After a sequence of generations is fully supported in the generated code-generator, all support for those generations is removed from brw_fs_nir.cpp. At each transition there were no changes in shader-db. The only CI changes were for Gen7, and these are noted in the commit message.
The machine description file for each generation is free standing. It is very easy to run
diff --side-by-side -W200 src/intel/compiler/gen7_md.py src/intel/compiler/gen8_md.py | less
to see the differences from Gen7 to Gen8. This was very useful while developing the series. I think it will also be helpful when new generations are added. This has the disadvantage that optimizations and bug fixes may have to be added to multiple files.
The third patch, WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place, shows how it might look if common parts were factored out. As generations get farther apart, a smaller percentage of the whole is common. Having some things extracted to a common place that aren't used by all generations (e.g., if 64-bit float support was extracted) makes it more difficult to diff the md files to see the difference between generations.
I'm slightly leaning towards having the generations be free standing (i.e., dropping WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place).
intel/compiler: Remove dead code from fs_visitor::prepare_alu_destination_and_sources
intel/compiler: Final clean up of fs_visitor::nir_emit_alu
The next two patches just clean up some dangling bits in
WIP: intel/gen8/compiler: Optimize arithmetic with type conversions
intel/compiler: Only GE and L modifiers are commutative for SEL
WIP: intel/gen8/compiler: Emit SEL for some bcsel patterns
intel/compiler: CSEL can do saturate
WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns
WIP: intel/compiler: Remove opt_peephole_csel pass
nir/search: Rearrange the type check for load_front_face and load_helper_invocation
nir/search: Match types for intrinsics that have a type set
WIP: intel/compiler: Add Boolean logic optimizations for Gen6 and Gen7
The final nine patches add four optimizations that pattern-based code generation makes easy. I don't intend to land the last optimization as it actually hurts Gen6 and Gen7 due to poor scheduling. It is included for instructional purposes. The generator pattern in WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns does not suffer from at least some of the bugs in the old optimization pass. See MR !2592 (merged) for more details.
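As a rough illustration of why this kind of optimization is a natural fit for pattern matching, the sketch below recognizes a bcsel whose condition compares a value against 0.0 and rewrites it as one CSEL, which selects between its first two sources based on its third source compared to zero. This is a hypothetical Python toy, not the pattern from the series: the tuple syntax, the COND_MODS table, and the bcsel_to_csel helper are all assumptions, and it ignores source modifiers, types, and the NaN subtleties a real pattern has to handle.

```python
# Illustrative subset: NIR comparison opcode -> Gen conditional modifier.
COND_MODS = {"flt": "l", "fge": "ge"}

def bcsel_to_csel(expr, dst="dst"):
    """Return a CSEL tuple for bcsel(cmp(a, 0.0), b, c), or None if no match."""
    if not (isinstance(expr, tuple) and len(expr) == 4 and expr[0] == "bcsel"):
        return None
    _, cond, b, c = expr
    if not (isinstance(cond, tuple) and len(cond) == 3
            and cond[0] in COND_MODS):
        return None
    _, a, rhs = cond
    # Only a comparison against the constant 0.0 maps onto CSEL here.
    if not (isinstance(rhs, float) and rhs == 0.0):
        return None
    # CSEL.cmod dst, src0, src1, src2: dst = cmod(src2, 0) ? src0 : src1
    return ("CSEL." + COND_MODS[cond[0]], dst, b, c, a)
```

For example, bcsel_to_csel(("bcsel", ("flt", "r1", 0.0), "r2", "r3")) returns ("CSEL.l", "dst", "r2", "r3", "r1"), while a comparison against another register returns None and would fall through to a more general pattern.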