intel: Some Alan Wake shader fails validation due to bad register usage in EOT message

At least one shader in the Alan Wake fossil fails EU validation.

The specific validation error (with some context) is:

LABEL1:
halt(16)        JIP:  LABEL0          UIP:  LABEL0              { align1 1H };

LABEL0:
mov(16)         g119<1>F        g123<8,8,1>F                    { align1 1H compacted };
mov(16)         g121<1>F        g123<8,8,1>F                    { align1 1H compacted };
(+f1.0) sendc(16) null<1>UW     g119<0,1,0>UD   0x10030000
                            render MsgDesc: RT write SIMD16 Surface = 0 mlen 8 rlen 0 { align1 1H };
mov(16)         g117<1>F        0x0VF           /* [0F, 0F, 0F, 0F]VF */ { align1 1H compacted };
mov(1)          a0.1<1>UD       0x00009225UD                    { align1 WE_all 1N };
(+f1.0) sendsc(16) nullUD       g125UD          g111UD          0x04031001                a0.1<0>UD
                            render MsgDesc: RT write SIMD16 LastRT Surface = 1 mlen 2 rlen 0 { align1 1H EOT };
        ERROR: send with EOT must use g112-g127
   END B0
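
For reference, the rule behind that error can be sketched in a few lines. This is only an illustration, not the Mesa validator code: `eot_payload_ok` is a hypothetical helper. The first payload of the failing `sendsc` (g125, mlen 2) sits inside g112-g127, so the second payload starting at g111 appears to be the operand that trips the check. Its length is not shown in the disassembly, so a length of 1 is assumed here.

```python
# Hypothetical helper illustrating the rule behind
# "ERROR: send with EOT must use g112-g127". Not Mesa code.
EOT_FIRST_GRF = 112
EOT_LAST_GRF = 127


def eot_payload_ok(start_grf, num_regs):
    """Return True if g<start_grf>..g<start_grf + num_regs - 1> lies in g112-g127."""
    if num_regs <= 0:
        return True  # an empty payload occupies no registers
    last_grf = start_grf + num_regs - 1
    return EOT_FIRST_GRF <= start_grf and last_grf <= EOT_LAST_GRF


# Operands of the failing sendsc above.
print(eot_payload_ok(125, 2))  # first payload: g125, mlen 2
print(eot_payload_ok(111, 1))  # second payload: starts at g111 (length assumed)
```

Under these assumptions the first call prints True and the second prints False, since g111 starts one register below the allowed range.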

I bisected this to:

589b03d02f0662553012249cbf097b63e7a03d72 is the first bad commit
commit 589b03d02f0662553012249cbf097b63e7a03d72
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Mon Jun 13 02:21:49 2022 -0700

    intel/fs: Opportunistically split SEND message payloads
    
    While we've taken advantage of split-sends in select situations, there
    are many other cases (such as sampler messages, framebuffer writes, and
    URB writes) that have never received that treatment, and continued to
    use monolithic send payloads.
    
    This commit introduces a new optimization pass which detects SEND
    messages with a single payload, finds an adjacent LOAD_PAYLOAD that
    produces that payload, splits it two, and updates the SEND to use both
    of the new smaller payloads.
    
    In places where we manually used split SENDS, we rely on underlying
    knowledge of the message to determine a natural split point.  For
    example, header and data, or address and value.
    
    In this pass, we instead infer a natural split point by looking at the
    source registers.  Often times, consecutive LOAD_PAYLOAD sources may
    already be grouped together in a contiguous block, such as a texture
    coordinate.  Then, there is another bit of data, such as a LOD, that
    may come from elsewhere.  We look for the point where the source list
    switches VGRFs, and split it there.  (If there is a message header, we
    choose to split there, as it will naturally come from elsewhere.)
    
    This not only reduces the payload sizes, alleviating register pressure,
    but it means that we may be able to eliminate some payload construction
    altogether, if we have a contiguous block already and some extra data
    being tacked on to one side or the other.
    
    shader-db results for Icelake are:
    
       total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
       instructions in affected programs: 6085404 -> 5852146 (-3.83%)
       helped: 23650 / HURT: 15
       helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
       helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
       HURT stats (abs)   min: 1 max: 44 x̄: 7.20 x̃: 2
       HURT stats (rel)   min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
       95% mean confidence interval for instructions value: -10.16 -9.55
       95% mean confidence interval for instructions %-change: -3.84% -3.72%
       Instructions are helped.
    
       total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
       cycles in affected programs: 599931746 -> 593959441 (-1.00%)
       helped: 22114 / HURT: 13053
       helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
       helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
       HURT stats (abs)   min: 1 max: 94022 x̄: 526.67 x̃: 22
       HURT stats (rel)   min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
       95% mean confidence interval for cycles value: -222.87 -116.79
       95% mean confidence interval for cycles %-change: -1.44% -1.20%
       Cycles are helped.
    
       total spills in shared programs: 8387 -> 6569 (-21.68%)
       spills in affected programs: 5110 -> 3292 (-35.58%)
       helped: 359 / HURT: 3
    
       total fills in shared programs: 11833 -> 8218 (-30.55%)
       fills in affected programs: 8635 -> 5020 (-41.86%)
       helped: 358 / HURT: 3
    
       LOST:   1 SIMD16 shader, 659 SIMD32 shaders
       GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders
    
       Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)
    
    Examining these results: the few shaders where spills/fills increased
    were already spilling significantly, and were only slightly hurt.  The
    applications affected were also helped in countless other shaders, and
    other shaders stopped spilling altogether or had 50% reductions.  Many
    SIMD16 shaders were gained, and overall we gain more SIMD32, though many
    close to the register pressure line go back and forth.
    
    Reviewed-by: Francisco Jerez <currojerez@riseup.net>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>

 src/intel/compiler/brw_fs.cpp | 94 +++++++++++++++++++++++++++++++++++++++++++
 src/intel/compiler/brw_fs.h   |  1 +
 2 files changed, 95 insertions(+)
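
The split-point heuristic described in that commit message can be sketched roughly as follows. This is a simplified model for discussion, not the code in brw_fs.cpp: `find_split_point` and its parameters are hypothetical, sources are represented only by their VGRF numbers, and splitting at the first VGRF switch (rather than some later one) is an assumption.

```python
def find_split_point(source_vgrfs, header_regs=0):
    """Return the index at which to split a payload's source list, or None.

    source_vgrfs: VGRF number of each LOAD_PAYLOAD source, in order
                  (hypothetical representation).
    header_regs:  number of leading sources that form the message header.
    """
    n = len(source_vgrfs)

    # If there is a header, split right after it, as the commit describes.
    if header_regs > 0:
        return header_regs if header_regs < n else None

    # Otherwise split where the source list first switches VGRFs.
    for i in range(1, n):
        if source_vgrfs[i] != source_vgrfs[i - 1]:
            return i

    # All sources come from one VGRF: no natural split point.
    return None


# Three texture-coordinate components from VGRF 10, then a LOD from VGRF 20.
print(find_split_point([10, 10, 10, 20]))
# Same data behind a one-register header from VGRF 5.
print(find_split_point([5, 10, 10, 20], header_regs=1))
```

In this model the first call splits at index 3 (coordinates versus LOD) and the second at index 1 (header versus data). The bug reported here is about where the resulting second payload ends up in the register file for an EOT message, which this sketch does not model.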
