intel/fs: Opportunistically split SEND message payloads (!17018)

Kenneth Graunke requested to merge kwg/mesa:opt-split-sends into main Jun 13, 2022
While we've taken advantage of split SENDs in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment and continue to
use monolithic SEND payloads.

This commit introduces a new optimization pass that detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it in two, and updates the SEND to use
both of the new, smaller payloads.
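To make the transformation concrete, here is a minimal C++ sketch of the rewrite itself, assuming a split point has already been chosen. The types `Src`, `LoadPayload`, and `Send` and the function `split_send` are hypothetical simplifications for illustration, not Mesa's `fs_inst`/`fs_reg` backend code, and the model assumes every LOAD_PAYLOAD source occupies exactly one register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified IR types for illustration only; these are not
// Mesa's fs_inst/fs_reg. Each payload source is assumed to be one register.
struct Src {
  int vgrf;    // virtual GRF the source register lives in
  int offset;  // register offset within that VGRF
};

struct LoadPayload {
  int dst_vgrf;           // VGRF the payload is assembled into
  std::vector<Src> srcs;  // one entry per payload register
};

struct Send {
  int payload0;  // VGRF holding the first payload
  int payload1;  // VGRF holding the second payload, or -1 if unused
  int mlen;      // length of payload0, in registers
  int ex_mlen;   // length of payload1, in registers (0 = not split)
};

// Split the LOAD_PAYLOAD feeding a single-payload SEND at source index
// `mid`: build two smaller LOAD_PAYLOADs (`lo`, `hi`) in fresh VGRFs and
// point the SEND at both. Returns false if the SEND is not a candidate.
bool split_send(Send &send, const LoadPayload &lp, std::size_t mid,
                int &next_vgrf, LoadPayload &lo, LoadPayload &hi) {
  if (send.ex_mlen != 0 || send.payload0 != lp.dst_vgrf)
    return false;  // already split, or payload not produced by `lp`
  if (mid == 0 || mid >= lp.srcs.size())
    return false;  // both halves must be non-empty
  // Under the one-register-per-source assumption, mlen == source count.
  assert(static_cast<std::size_t>(send.mlen) == lp.srcs.size());

  lo.dst_vgrf = next_vgrf++;
  lo.srcs.assign(lp.srcs.begin(), lp.srcs.begin() + mid);
  hi.dst_vgrf = next_vgrf++;
  hi.srcs.assign(lp.srcs.begin() + mid, lp.srcs.end());

  send.payload0 = lo.dst_vgrf;
  send.mlen = static_cast<int>(lo.srcs.size());
  send.payload1 = hi.dst_vgrf;
  send.ex_mlen = static_cast<int>(hi.srcs.size());
  return true;
}
```

In this model the original LOAD_PAYLOAD is left untouched; once the SEND no longer reads its destination, an ordinary dead-code pass could remove it.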

In places where we manually use split SENDs, we rely on underlying
knowledge of the message to determine a natural split point: for
example, between the header and the data, or between the address and
the value.

In this pass, we instead infer a natural split point by looking at the
source registers.  Often, consecutive LOAD_PAYLOAD sources may already
be grouped together in a contiguous block, such as a texture
coordinate, while another piece of data, such as a LOD, comes from
elsewhere.  We look for the point where the source list switches VGRFs
and split it there.  (If there is a message header, we instead split
right after the header, as it naturally comes from elsewhere.)
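The split-point heuristic can be sketched as follows. This is a hypothetical C++ model (`PayloadSrc` and `choose_split_point` are illustrative names, not Mesa's), assuming each source is one register identified by a VGRF number and offset, and that the caller knows how many registers the header occupies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified model of one LOAD_PAYLOAD source register.
struct PayloadSrc {
  int vgrf;    // which virtual GRF the data comes from
  int offset;  // register offset within that VGRF
};

// Pick where to split a payload into two SEND sources.
// - With a header, split right after it: the header is built separately.
// - Otherwise split at the first source whose VGRF differs from the first
//   source's VGRF, keeping a contiguous block (e.g. texture coordinates)
//   together and moving trailing data (e.g. a LOD) to the second payload.
// Returns 0 when no useful split exists.
std::size_t choose_split_point(const std::vector<PayloadSrc> &srcs,
                               std::size_t header_size) {
  assert(header_size <= srcs.size());
  if (header_size > 0)
    return header_size < srcs.size() ? header_size : 0;

  for (std::size_t i = 1; i < srcs.size(); i++) {
    if (srcs[i].vgrf != srcs[0].vgrf)
      return i;  // first switch to a different VGRF
  }
  return 0;  // every source comes from the same VGRF; nothing to split
}
```

For a texture coordinate in registers 0 to 2 of one VGRF followed by a LOD from another VGRF, this picks index 3, so the coordinate and the LOD become the two SEND payloads.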

This not only reduces payload sizes, alleviating register pressure,
but also means we may be able to eliminate some payload construction
altogether when a contiguous block already exists and some extra data
is tacked on to one side or the other.
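One way to see why payload construction can disappear: if a half of the split payload is already a run of consecutive registers in a single VGRF, the SEND could read that VGRF directly and the LOAD_PAYLOAD copy for that half becomes unnecessary. The C++ sketch below checks for that case. `PayloadSrc`, `is_contiguous_block`, and `payload_copies_needed` are hypothetical helpers under a one-register-per-source assumption, not Mesa code, and a real compiler would still need a later step to rewrite the SEND's sources before the copy could actually be dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified model of one LOAD_PAYLOAD source register.
struct PayloadSrc {
  int vgrf;    // which virtual GRF the data comes from
  int offset;  // register offset within that VGRF
};

// True if srcs[begin, end) are consecutive registers of one VGRF, i.e. the
// block already exists in memory order and needs no copy to assemble.
// A single register is trivially contiguous.
bool is_contiguous_block(const std::vector<PayloadSrc> &srcs,
                         std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= srcs.size());
  if (begin == end)
    return false;  // empty range: nothing for the SEND to read
  for (std::size_t i = begin + 1; i < end; i++) {
    if (srcs[i].vgrf != srcs[begin].vgrf ||
        srcs[i].offset != srcs[i - 1].offset + 1)
      return false;
  }
  return true;
}

// How many of the two halves still need a LOAD_PAYLOAD after splitting
// at `mid` (0, 1, or 2).
int payload_copies_needed(const std::vector<PayloadSrc> &srcs,
                          std::size_t mid) {
  int copies = 0;
  if (!is_contiguous_block(srcs, 0, mid))
    copies++;
  if (!is_contiguous_block(srcs, mid, srcs.size()))
    copies++;
  return copies;
}
```

With a three-register coordinate followed by a single LOD register, both halves qualify, so in this model no payload copy is needed at all, whereas the monolithic payload had to be assembled with a four-register LOAD_PAYLOAD.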

shader-db results for Tigerlake are:

   total instructions in shared programs: 20632140 -> 20542987 (-0.43%)
   instructions in affected programs: 9172527 -> 9083374 (-0.97%)
   helped: 19431 / HURT: 12958
   helped stats (abs) min: 1 max: 1184 x̄: 6.74 x̃: 3
   helped stats (rel) min: 0.03% max: 23.48% x̄: 2.21% x̃: 1.47%
   HURT stats (abs)   min: 1 max: 209 x̄: 3.22 x̃: 2
   HURT stats (rel)   min: 0.04% max: 33.33% x̄: 1.69% x̃: 1.23%
   95% mean confidence interval for instructions value: -2.91 -2.59
   95% mean confidence interval for instructions %-change: -0.68% -0.62%
   Instructions are helped.

   total cycles in shared programs: 779667768 -> 776951591 (-0.35%)
   cycles in affected programs: 601386669 -> 598670492 (-0.45%)
   helped: 21107 / HURT: 15217
   helped stats (abs) min: 1 max: 487366 x̄: 800.39 x̃: 20
   helped stats (rel) min: <.01% max: 78.66% x̄: 4.70% x̃: 0.76%
   HURT stats (abs)   min: 1 max: 97759 x̄: 931.70 x̃: 17
   HURT stats (rel)   min: <.01% max: 188.39% x̄: 4.97% x̃: 0.59%
   95% mean confidence interval for cycles value: -144.57 -4.99
   95% mean confidence interval for cycles %-change: -0.77% -0.53%
   Cycles are helped.

   total spills in shared programs: 5744 -> 4066 (-29.21%)
   spills in affected programs: 3589 -> 1911 (-46.75%)
   helped: 347 / HURT: 2

   total fills in shared programs: 6096 -> 2832 (-53.54%)
   fills in affected programs: 5795 -> 2531 (-56.32%)
   helped: 347 / HURT: 2

   LOST: 555 SIMD32 shaders
   GAINED: 66 SIMD16 shaders, 795 SIMD32 shaders

Examining these results: the two shaders where spilling increased were
already spilling heavily and got only slightly worse, while many of the
helped shaders stopped spilling altogether.  The application containing
those two hurt shaders was also helped in many other shaders.  We gain
66 SIMD16 shaders, and overall we gain more SIMD32 shaders (795) than we
lose (555), even though some shaders close to the register pressure
limit go back and forth.
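The relative changes in the shader-db summary are computed against the "before" totals. For example, the instruction count changes by 20542987 - 20632140 = -89153, and -89153 / 20632140 is about -0.43%. A hypothetical helper like the one below (not part of shader-db) reproduces the reported percentages from the raw totals.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent, measured against the "before" value:
// (after - before) / before * 100.
double percent_change(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}

// Round to two decimal places, matching the precision of the report.
double round2(double x) { return std::round(x * 100.0) / 100.0; }
```

For instance, `round2(percent_change(5744, 4066))` gives the -29.21% reported for total spills.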