    intel/ir/gen12+: Work around FS performance regressions due to SIMD32 discard divergence. · 797ed40a
    Francisco Jerez authored and Eric Engestrom committed

    This avoids some performance regressions on Gen12 platforms caused by
    SIMD32 fragment shaders, as reported in titles like Dota2, TF2, Xonotic,
    and the GFXBench5 Car Chase and Aztec Ruins benchmarks.
    
    The most obvious pattern I identified among the regressing shaders in
    these workloads is that they all contain non-uniform discard
    statements, which the current IR performance analysis pass handles
    rather optimistically: for other control flow instructions we apply
    different branching weights to the SIMD32 variant of the shader to
    account for its greater likelihood of divergence, but discard
    statements currently incur no such penalty.
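
    To make the asymmetry concrete, here is a minimal Python sketch of a
    weighted cycle estimate. It is not the pass's actual code: the `Instr`
    type, the opcode names, `branch_weight()`, `DISCARD_WEIGHT` and every
    numeric weight are hypothetical. Instructions inside an IF are scaled by
    a width-dependent branch weight, while a discard scales the remaining
    code by the same factor at every dispatch width, which corresponds to
    the optimistic treatment described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instr:
    op: str      # hypothetical opcode names: "alu", "if", "endif", "discard"
    cycles: int  # hypothetical issue cost in cycles


def branch_weight(dispatch_width: int) -> float:
    """Hypothetical fraction of threads assumed to execute an IF body.

    A wider dispatch makes it more likely that at least one channel enters
    the branch, so the SIMD32 variant is charged for the whole body.
    """
    return 1.0 if dispatch_width == 32 else 0.5


# Hypothetical: the same value for every dispatch width (no SIMD32 penalty).
DISCARD_WEIGHT = 0.5


def estimate_cycles(instrs: list[Instr], dispatch_width: int) -> float:
    """Sum each instruction's cycles scaled by the weight in effect."""
    bw = branch_weight(dispatch_width)
    weight = 1.0
    total = 0.0
    for ins in instrs:
        if ins.op == "endif":
            weight /= bw  # leave the IF body before costing the ENDIF
        total += ins.cycles * weight
        if ins.op == "if":
            weight *= bw  # instructions after the IF are in its body
        elif ins.op == "discard":
            weight *= DISCARD_WEIGHT  # post-discard code assumed partly skipped
    return total
```

    For the shader `[alu 10, if 1, alu 8, endif 1, discard 1, alu 20]` this
    gives 27.0 cycles in SIMD16 and 31.0 in SIMD32: the IF body is charged
    in full for SIMD32, but the 20 cycles after the discard are halved in
    both modes.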
    
    Simply giving discard statements the same treatment as other branching
    instructions seemed to hurt more than it helped on platforms earlier
    than Gen12: it reversed most of the improvement obtained from SIMD32
    fragment shaders in Manhattan, with no measurable benefit in other
    workloads.  (Manhattan has a handful of shaders with statically
    non-uniform discard statements that actually perform better in SIMD32
    mode due to their approximate dynamic uniformity.)  For that reason
    this change applies to Gen12+ platforms only.
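
    The generation gate can be sketched as follows, again in Python and
    with hypothetical names and weights: `discard_weight()` returns the
    pessimistic factor 1.0 only for SIMD32 on Gen12+, and keeps the
    optimistic factor everywhere else, so SIMD32 shaders on earlier
    platforms (such as the Manhattan ones) are costed as before.

```python
def discard_weight(dispatch_width: int, gen: int) -> float:
    """Hypothetical scale factor for code after a non-uniform discard.

    0.5 is the optimistic guess that half of the post-discard work is
    skipped because whole threads get discarded early; 1.0 assumes that
    no work is saved.
    """
    if gen >= 12 and dispatch_width == 32:
        return 1.0  # penalize SIMD32 discard divergence on Gen12+ only
    return 0.5


def estimate_cycles(pre_cycles: int, post_cycles: int,
                    dispatch_width: int, gen: int) -> float:
    """Cycles for a straight-line shader with a single top-level discard."""
    return pre_cycles + post_cycles * discard_weight(dispatch_width, gen)
```

    With 10 cycles before the discard and 20 after it, SIMD32 is estimated
    at 30.0 cycles on Gen12 but 20.0 on Gen11, while SIMD16 stays at 20.0
    on both.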
    
    I've been running a number of tests trying to understand the
    difference in behavior between Gen12 and earlier platforms, and most
    of the evidence I've gathered seems to point to EU fusion as the
    culprit: unlike on previous generations, Gen12 EUs are arranged in
    pairs that execute instructions in lockstep, giving an effective warp
    size of 64 threads in SIMD32 mode.  This seems to significantly
    increase the likelihood of control flow divergence in some of the
    affected shaders.
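
    The effect of a wider warp can be illustrated with a deliberately crude
    model: if each channel independently takes a branch with probability p,
    all n channels agree with probability p^n + (1 - p)^n.  Independence is
    an assumption (neighboring pixels are usually correlated, so it
    overstates divergence), but it shows why going from 32 to the effective
    64 channels described above lowers the chance that a warp stays uniform.

```python
def uniform_probability(p: float, lanes: int) -> float:
    """Probability that all lanes take the same side of a branch.

    Assumes each lane independently takes the branch with probability p,
    which is a simplification rather than a model of real shaders.
    """
    return p ** lanes + (1.0 - p) ** lanes


def divergence_probability(p: float, lanes: int) -> float:
    """Probability that at least two lanes disagree on the branch."""
    return 1.0 - uniform_probability(p, lanes)


if __name__ == "__main__":
    # 16 and 32 lanes: SIMD16/SIMD32; 64 lanes: fused SIMD32 on Gen12.
    for lanes in (16, 32, 64):
        print(lanes, round(divergence_probability(0.99, lanes), 3))
```

    With p = 0.99 the divergence probability grows from about 0.15 at 16
    lanes to about 0.28 at 32 and 0.47 at 64.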
    
    Fixes: 188a3659 ("intel/ir: Import shader performance analysis pass.")
    Reported-by: Caleb Callaway <caleb.callaway@intel.com>
    Reviewed-by: Matt Turner <mattst88@gmail.com>
    Part-of: <!5910>
    (cherry picked from commit 4d73988f)