intel: An Alan Wake shader fails EU validation due to bad register usage in EOT message
At least one shader in the Alan Wake fossil fails EU validation.
The specific validation error (with some context) is:
LABEL1:
halt(16) JIP: LABEL0 UIP: LABEL0 { align1 1H };
LABEL0:
mov(16) g119<1>F g123<8,8,1>F { align1 1H compacted };
mov(16) g121<1>F g123<8,8,1>F { align1 1H compacted };
(+f1.0) sendc(16) null<1>UW g119<0,1,0>UD 0x10030000
render MsgDesc: RT write SIMD16 Surface = 0 mlen 8 rlen 0 { align1 1H };
mov(16) g117<1>F 0x0VF /* [0F, 0F, 0F, 0F]VF */ { align1 1H compacted };
mov(1) a0.1<1>UD 0x00009225UD { align1 WE_all 1N };
(+f1.0) sendsc(16) nullUD g125UD g111UD 0x04031001 a0.1<0>UD
render MsgDesc: RT write SIMD16 LastRT Surface = 1 mlen 2 rlen 0 { align1 1H EOT };
ERROR: send with EOT must use g112-g127
END B0
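For context, here is a rough model of the rule behind that error. It is a simplified sketch, not the code from Mesa's EU validator: the function names are made up for illustration, and the only facts taken from the output above are the g112-g127 range and the register numbers. The failing sendsc has two payloads. The first, g125 with mlen 2, is in range, so the complaint is presumably about the second payload, which starts at g111.

```cpp
#include <cassert>

// Hypothetical, simplified model of the rule named in the error message.
// This is NOT Mesa's validator; names and structure are illustrative only.
constexpr int kEotFirstReg = 112; // lowest GRF allowed, from the error text
constexpr int kEotLastReg = 127;  // highest GRF allowed, from the error text

// Returns true if a payload of `len` consecutive GRFs starting at g<start>
// lies entirely inside g112-g127. An empty payload passes trivially.
bool payload_ok_for_eot(int start, int len)
{
    assert(len >= 0);
    if (len == 0)
        return true;
    return start >= kEotFirstReg && start + len - 1 <= kEotLastReg;
}

// A split send carries two payloads; with EOT set, both must be in range.
bool split_send_ok_for_eot(int src0_start, int src0_len,
                           int src1_start, int src1_len)
{
    return payload_ok_for_eot(src0_start, src0_len) &&
           payload_ok_for_eot(src1_start, src1_len);
}
```

With the registers from the disassembly, `split_send_ok_for_eot(125, 2, 111, 1)` returns false whatever length is assumed for the second payload, because g111 is already below g112.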
I bisected this to:
589b03d02f0662553012249cbf097b63e7a03d72 is the first bad commit
commit 589b03d02f0662553012249cbf097b63e7a03d72
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Mon Jun 13 02:21:49 2022 -0700
intel/fs: Opportunistically split SEND message payloads
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.
This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it in two, and updates the SEND to use both
of the new smaller payloads.
In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point. For
example, header and data, or address and value.
In this pass, we instead infer a natural split point by looking at the
source registers. Often, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate. Then, there is another bit of data, such as a LOD, that
may come from elsewhere. We look for the point where the source list
switches VGRFs, and split it there. (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)
This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.
shader-db results for Icelake are:
total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
instructions in affected programs: 6085404 -> 5852146 (-3.83%)
helped: 23650 / HURT: 15
helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2
HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
95% mean confidence interval for instructions value: -10.16 -9.55
95% mean confidence interval for instructions %-change: -3.84% -3.72%
Instructions are helped.
total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
cycles in affected programs: 599931746 -> 593959441 (-1.00%)
helped: 22114 / HURT: 13053
helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22
HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
95% mean confidence interval for cycles value: -222.87 -116.79
95% mean confidence interval for cycles %-change: -1.44% -1.20%
Cycles are helped.
total spills in shared programs: 8387 -> 6569 (-21.68%)
spills in affected programs: 5110 -> 3292 (-35.58%)
helped: 359 / HURT: 3
total fills in shared programs: 11833 -> 8218 (-30.55%)
fills in affected programs: 8635 -> 5020 (-41.86%)
helped: 358 / HURT: 3
LOST: 1 SIMD16 shader, 659 SIMD32 shaders
GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders
Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)
Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt. The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions. Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
src/intel/compiler/brw_fs.cpp | 94 +++++++++++++++++++++++++++++++++++++++++++
src/intel/compiler/brw_fs.h | 1 +
2 files changed, 95 insertions(+)
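To make the split-point heuristic from the commit message concrete, here is a minimal sketch of how I read it: split after the header if there is one, otherwise split where the LOAD_PAYLOAD source list first switches VGRFs. `PayloadSrc` and `find_split_point` are hypothetical names, and the real pass in brw_fs.cpp works on the compiler's own IR and may handle cases this ignores. Note that nothing in this heuristic looks at whether the SEND is an EOT message, which seems relevant to the validation failure above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one LOAD_PAYLOAD source. The real pass operates
// on the compiler's IR registers, which carry more state than this.
struct PayloadSrc {
    int vgrf; // virtual register number the source comes from
};

// Returns the index of the first source of the second payload, or 0 when no
// split point is found (the payload is then left monolithic).
std::size_t find_split_point(const std::vector<PayloadSrc> &srcs,
                             std::size_t header_size)
{
    // With a message header, split right after it, provided data follows.
    if (header_size > 0)
        return header_size < srcs.size() ? header_size : 0;

    // Otherwise split where the source list first switches VGRFs.
    for (std::size_t i = 1; i < srcs.size(); i++) {
        if (srcs[i].vgrf != srcs[0].vgrf)
            return i;
    }
    return 0;
}
```

For example, three texture-coordinate sources from one VGRF followed by a LOD from another VGRF give a split point of 3, so the coordinates form the first payload and the LOD the second.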