
intel: Implement dual-SIMD8 PS dispatch on Gfx12 platforms

Francisco Jerez requested to merge currojerez/mesa:intel-xe-multipoly into main

This MR implements multipolygon pixel shader dispatch, which is supported by TGL and later hardware. On Gfx12.x hardware, multipolygon PS dispatch is limited to 2 polygons per SIMD thread. In theory it can allow better ALU utilization than either plain SIMD8 or SIMD16 when rendering a large number of small polygons that can't use the ALUs efficiently in SIMD16 dispatch mode.
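As a rough illustration of the utilization argument, the toy model below counts how many of a thread's 16 lanes carry live pixels when small polygons are dispatched one per SIMD16 thread versus two per dual-SIMD8 thread. It is only a sketch: the struct and function names are hypothetical, it assumes each polygon in a dual-SIMD8 thread gets its own 8-lane half, and it does not attempt to model the comparison against plain SIMD8 or any real hardware timing.

```c
#include <assert.h>

/* Toy model, not driver code: lanes a thread occupies vs. lanes doing
 * useful work. All names here are hypothetical. */
struct lane_usage {
    unsigned active;   /* lanes covered by live pixels */
    unsigned total;    /* lanes the thread occupies */
};

/* Plain SIMD16: a single polygon's pixels fill (part of) 16 lanes. */
static struct lane_usage simd16_single(unsigned pixels)
{
    assert(pixels <= 16);
    return (struct lane_usage){ pixels, 16 };
}

/* Dual-SIMD8: two polygons share one thread, assumed here to occupy
 * one 8-lane half each. */
static struct lane_usage dual_simd8(unsigned pixels_a, unsigned pixels_b)
{
    assert(pixels_a <= 8 && pixels_b <= 8);
    return (struct lane_usage){ pixels_a + pixels_b, 16 };
}

/* Integer percentage of occupied lanes that are active. */
static unsigned percent(struct lane_usage u)
{
    return 100 * u.active / u.total;
}
```

Under these assumptions, two polygons that each cover a single 2x2 subspan (4 pixels) leave a plain SIMD16 thread with 4 of 16 lanes active, while a dual-SIMD8 thread holding both has 8 of 16 lanes active.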

The first few patches in this MR introduce some basic plumbing that allows the driver and compiler to negotiate the number of polygons processed per thread (with future hardware in mind that will allow processing a larger number of polygons per thread):

intel/compiler: Add max_polygons FS compilation parameter.
intel/compiler: Add multipolygon dispatch fields to brw_wm_prog_data.
intel/compiler: Add polygon count statistic to brw_compile_stats.
intel/fs: Add separate constructor of fs_visitor for fragment shaders.
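The negotiation described above can be pictured roughly as follows: the driver states the largest polygon count it can accept, and the compiler reports the count it actually built the shader for. This is a minimal sketch under stated assumptions, not the code in these patches. The struct names, the `hw_limit` parameter, and the `shader_is_eligible` flag are all hypothetical stand-ins; only the idea of a `max_polygons` input comes from the patch titles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the actual Mesa structures. */
struct fs_compile_request {
    unsigned max_polygons;   /* upper bound the driver is willing to accept */
};

struct fs_compile_result {
    unsigned polygons;       /* polygons per thread the shader was built for */
};

/* Pick the polygon count: never more than the driver asked for, never more
 * than the (hypothetical) hardware limit, and fall back to single-polygon
 * dispatch when the shader cannot be built as a multipolygon shader. */
static struct fs_compile_result
negotiate_polygons(struct fs_compile_request req, unsigned hw_limit,
                   bool shader_is_eligible)
{
    assert(req.max_polygons >= 1 && hw_limit >= 1);

    unsigned n = req.max_polygons < hw_limit ? req.max_polygons : hw_limit;
    if (!shader_is_eligible)
        n = 1;

    return (struct fs_compile_result){ n };
}
```

Returning the chosen count lets the driver program its dispatch state from the result instead of assuming its request was honored.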

These are followed by the patches below, which rework the layout of the ATTR register file so that it can represent the inputs from multiple primitives in the same SIMD thread. These are likely the most complicated patches of this series:

intel/fs: Map all GS input attributes to ATTR register number 0.
intel/fs: Map all VS input attributes to ATTR register number 0.
intel/fs: Map all TES input attributes to ATTR register number 0.
intel/fs: Assert fs_reg::nr is always zero for ATTR registers in geometry stages.
intel/fs: Consider ATTR registers with different fs_reg::nr as belonging to disjoint register spaces.
intel/fs: Provide component index explicitly to interp_reg().
intel/fs: Pass builder to per_primitive_reg().
intel/fs: Fix fs_reg::component_size() to handle two-dimensional register regions.
intel/fs: Rework layout of FS vertex setup data in ATTR file to support multi-polygon dispatch.

The following patches include a handful of optimizer fixes to make sure the optimizer behaves correctly when multipolygon dispatch is enabled:

intel/compiler: Pass max_polygons to copy-prop from fs_visitor.
intel/fs: Don't copy-propagate ATTR registers in multi-polygon FS shaders when invalid.
intel/compiler: Don't change types for copies from ATTR file.
intel/fs/gfx12+: Don't set nir_divergence_single_prim_per_subgroup option for fragment shaders.
intel/fs/gfx12: Don't consider multipolygon PS to have packed dispatch.
intel/fs: No need to copy null destinations in lower_simd_width.

The next few patches implement relevant changes to the PS thread payload format:

intel/fs: Fix PS thread payload setup for depth_w_coef_reg.
intel/fs/gfx12: Implement multi-polygon format of back/front-facing flag in PS payload.
intel/fs/gfx12: Implement multi-polygon format of render target array index in PS payload.

Finally, the brw_compile_fs() entry point is updated to build a multipolygon shader when possible, and state setup in the drivers is implemented so dual-SIMD8 can be enabled.

intel: Add debug flag for enabling dual-SIMD8 fragment shader dispatch.
intel/compiler: Attempt to build dual-SIMD8 variant of fragment shaders on gfx12+ platforms.
intel/genxml: Add 3DSTATE_PS definitions needed for dual-SIMD8 dispatch on Gfx12+.
intel/gfx12: Enable SIMD8 dispatch in 3DSTATE_PS for FS multipolygon dispatch.
iris/gfx12: Hook up dual-SIMD8 fragment shader dispatch.
anv/gfx12: Hook up dual-SIMD8 fragment shader dispatch.

Note that since no major performance changes have been observed on Gfx12, this series doesn't enable dual-SIMD8 by default; that decision is deferred until further performance evaluation is completed. It can, however, be enabled manually via the INTEL_SIMD_DEBUG=fs2x8 environment variable. The main motivation for this series right now is to prepare the compiler for the additional multipolygon modes available on Gfx20+ platforms, which are likely to get a greater benefit from multipolygon dispatch than Gfx12 due to their doubled ALU vector width and larger variety of multipolygon dispatch modes.
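For readers unfamiliar with this kind of debug switch, the sketch below shows one way a program could check whether `fs2x8` was requested through INTEL_SIMD_DEBUG. It is not the parser used by the driver: the helper names are hypothetical, and it assumes the variable holds a comma-separated list of flag names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if `token` appears as a whole entry in the comma-separated
 * string `list`. A NULL list (variable unset) never matches. */
static bool list_has_token(const char *list, const char *token)
{
    if (list == NULL)
        return false;

    const size_t token_len = strlen(token);
    while (*list != '\0') {
        /* Length of the current entry, up to the next comma or the end. */
        const size_t len = strcspn(list, ",");
        if (len == token_len && strncmp(list, token, len) == 0)
            return true;
        list += len;
        if (*list == ',')
            list++;
    }
    return false;
}

/* Hypothetical helper: was dual-SIMD8 FS dispatch requested by the user? */
static bool dual_simd8_requested(void)
{
    return list_has_token(getenv("INTEL_SIMD_DEBUG"), "fs2x8");
}
```

Matching whole entries rather than substrings keeps a longer flag name that merely contains `fs2x8` from enabling the mode by accident.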
