
WIP: NIR: intel/compiler: A code-generator generator

Ian Romanick requested to merge idr/mesa:review/generate-code-generator into main

There is a lot here that is still WIP. Some things need to be finished or have unit tests added (e.g., the first real patch in the series). Others need some agreement on the direction to take. That said, the series is definitely ready for review. Ideally, I'd like to be able to land this series before the end of the year... which is approaching rapidly.

Some notes about the series...

The first four patches are from MR !1359 (merged).

    WIP: nir/search: Slightly relax restriction that all sources must be SSA
    nir/search: Allow non-SSA destination
    nir/search: Conditionally allow destination saturate on root of expression tree
    nir/search: Track which entry in nir_alu_instr::src each variable was
    nir/search: Add public function to just compare an instr w/a search expr
    nir/algebraic: Don't cross validate sizes when replace isn't a Value
    nir/algebraic: Type of the transformation is a parameter to AlgebraicPass ctor

The first seven patches here make some changes to the nir_search infrastructure to enable its use for the code-generator generator. The most significant changes are the first two. These enable limited use of nir_search on non-SSA sources and destinations.

    intel/compiler: Add a code builder that consumes a bytecode
    intel/compiler: Add code generator generator

The next two patches implement the code-generator generator. The first adds a simple bytecode interpreter that generates Gen assembly instructions from a compact binary description of the instructions. The second builds on nir_algebraic to process transformations from trees of NIR to sequences of assembly instructions. The generated code-generator does not use the TreeAutomata because the automaton is built on the assumption that everything will be SSA. I did not see a way to extend it. Since the code-generator only executes once per shader, I don't think the performance benefit of the automaton for matching would be significant.
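To make the bytecode idea concrete, here is a minimal Python sketch of an interpreter that turns a compact byte stream into a list of instructions. It is not the code from the series: the opcode numbering, the `[opcode, dst, num_srcs, srcs...]` encoding, and the names `OPCODES`, `Instruction`, and `build` are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical opcode numbering; the real encoding is not shown in this MR text.
OPCODES = {0x01: "MOV", 0x02: "ADD", 0x03: "MUL"}
END = 0x00


@dataclass
class Instruction:
    opcode: str
    dst: str
    srcs: tuple


def build(bytecode: bytes, operands: list) -> list:
    """Decode records of the form [opcode, dst_index, num_srcs, src_index...]
    until END is reached. Indices refer to entries in `operands`."""
    out = []
    pc = 0
    while bytecode[pc] != END:
        op = OPCODES[bytecode[pc]]
        dst = operands[bytecode[pc + 1]]
        n = bytecode[pc + 2]
        srcs = tuple(operands[i] for i in bytecode[pc + 3:pc + 3 + n])
        out.append(Instruction(op, dst, srcs))
        pc += 3 + n  # skip opcode, dst, count, and the n source indices
    return out


# Assumed example program: t = a * b; r = t + c
program = bytes([0x03, 3, 2, 0, 1,
                 0x02, 4, 2, 3, 2,
                 END])
insts = build(program, ["a", "b", "c", "t", "r"])
```

The point of the sketch is only that the per-pattern output can be data rather than generated C++ code: each matched pattern selects a small byte string, and one shared interpreter emits the instructions.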

    intel/compiler: Import Gen8 / Gen9 ALU machine description
    intel/compiler: Begin using code generator generator for Gen8 and Gen9
    WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place
    intel/compiler: Import and use Gen12 ALU machine description
    intel/compiler: Import and use Gen11 ALU machine description
    intel/compiler: Import and use Gen7 ALU machine description
    intel/compiler: Remove support for all Gen7+ ALU operations and optimizations
    intel/compiler: Import and use Gen6 ALU machine description
    intel/compiler: Remove support for all Gen6+ ALU operations and optimizations
    intel/compiler: Import and use Gen4 / Gen5 ALU machine description
    intel/compiler: Remove support for all Gen4+ ALU operations and optimizations

The next 11 patches gradually add support for code generation for Gen4 through Gen12. After a sequence of generations is fully supported in the generated code-generator, all support for those generations is removed from brw_fs_nir.cpp. At each transition there were no changes in shader-db. The only CI changes were for Gen7, and these are noted in the commit message.

The machine description file for each generation is free standing. It is very easy to diff --side-by-side -W200 src/intel/compiler/gen7_md.py src/intel/compiler/gen8_md.py | less to see the differences from Gen7 to Gen8. This was very useful while developing the series. I think it will also be helpful when new generations are added. This has the disadvantage that optimizations and bug fixes may have to be added to multiple md files.

The third patch of this group, WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place, shows how it might look if common parts were factored out. As generations get farther apart, a smaller percentage of the whole is common. Having some things extracted to a common place that aren't used by all generations (e.g., if 64-bit float support was extracted) makes it more difficult to diff the md files to see the difference between generations.
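The shrinking-common-part argument can be illustrated with a small Python sketch. The pattern identifiers and the per-generation sets below are invented placeholders, not the contents of the real md files; only the set arithmetic is the point.

```python
# Hypothetical pattern identifiers standing in for machine description entries.
MD = {
    "gen7": {"p_add", "p_mul", "p_sel", "p_old_math"},
    "gen8": {"p_add", "p_mul", "p_sel", "p_csel"},
    "gen12": {"p_add", "p_mul", "p_csel", "p_new_op"},
}


def common_patterns(md: dict, gens: list) -> set:
    """Patterns present in every listed generation."""
    return set.intersection(*(md[g] for g in gens))


def common_fraction(md: dict, gens: list) -> dict:
    """Fraction of each generation's patterns that could be factored out."""
    common = common_patterns(md, gens)
    return {g: len(common) / len(md[g]) for g in gens}


near = common_fraction(MD, ["gen7", "gen8"])
far = common_fraction(MD, ["gen7", "gen8", "gen12"])
```

With these made-up sets, adjacent generations share three of four patterns, while the set common to all three generations drops to two of four. Anything shared by only some generations (here `p_sel` and `p_csel`) either stays duplicated or ends up in a common file that not every generation uses.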

I'm slightly leaning towards having the generations be free standing (i.e., dropping WIP: intel/compiler: Refactor a bunch of patterns from Gen8 to a common place).

    intel/compiler: Remove dead code from fs_visitor::prepare_alu_destination_and_sources
    intel/compiler: Final clean up of fs_visitor::nir_emit_alu

The next two patches just clean up some dangling bits in brw_fs_nir.cpp.

    WIP: intel/gen8/compiler: Optimize arithmetic with type conversions
    intel/compiler: Only GE and L modifiers are commutative for SEL
    WIP: intel/gen8/compiler: Emit SEL for some bcsel patterns
    intel/compiler: CSEL can do saturate
    WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns
    WIP: intel/compiler: Remove opt_peephole_csel pass
    nir/search: Rearrange the type check for load_front_face and load_helper_invocation
    nir/search: Match types for intrinsics that have a type set
    WIP: intel/compiler: Add Boolean logic optimizations for Gen6 and Gen7

The final nine patches add four optimizations that pattern-based code generation makes easy. I don't intend to land the last optimization as it actually hurts Gen6 and Gen7 due to poor scheduling. It is included for instructional purposes. The generator pattern in WIP: intel/gen8/compiler: Emit CSEL for some bcsel patterns does not suffer from at least some of the bugs in the old optimization pass. See MR !2592 (merged) for more details.
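As a rough illustration of why a pattern makes this kind of optimization easy, here is a minimal Python sketch of matching a bcsel of a compare-with-zero and emitting a single CSEL. It is not the pattern from the series: the nested-tuple syntax, the `match` and `select_csel` helpers, the `CMOD` table, and the emitted tuple layout are all assumptions, including the assumption that CSEL applies its conditional modifier to the third source compared against zero.

```python
# Hypothetical mapping from NIR comparison opcodes to conditional modifiers.
CMOD = {"flt": "l", "fge": "ge"}


def match(pattern, expr, bindings: dict) -> bool:
    """Match a nested-tuple pattern against a nested-tuple expression.
    A bare string in the pattern is a variable; element 0 of a tuple is an
    opcode; anything else is a constant that must compare equal."""
    if isinstance(pattern, str):
        if pattern in bindings:
            return bindings[pattern] == expr
        bindings[pattern] = expr
        return True
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(expr) != len(pattern):
            return False
        if pattern[0] != expr[0]:
            return False
        return all(match(p, e, bindings)
                   for p, e in zip(pattern[1:], expr[1:]))
    return pattern == expr


def select_csel(expr):
    """Return [("CSEL", cmod, src0, src1, src2)] for bcsel(cmp(a, 0), b, c),
    or None when the expression does not have that shape."""
    for cmp_op, cmod in CMOD.items():
        b = {}
        if match(("bcsel", (cmp_op, "a", 0.0), "b", "c"), expr, b):
            return [("CSEL", cmod, b["b"], b["c"], b["a"])]
    return None


emitted = select_csel(("bcsel", ("flt", "x", 0.0), "y", "z"))
```

Because the whole tree is matched at once, the comparison and the select are known to belong together, which is harder to guarantee in a peephole pass that runs over already-emitted instructions.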
