brw: Fix vectorizer hole_size condition after signedness change
(merging reviewed regression fix from !32888 (merged))
What does this MR do and why?
Marek recently changed hole_size to be signed, rather than unsigned.
A negative hole_size means that the two loads overlap - and thus are
prime candidates to be combined.
My original hole_size handling was:
if hole_size > 4 * (8 - low->num_components) then don't vectorize
For non-overlapping loads, this worked: NIR's largest vector is vec16,
and if low was already a vec16, combining it with anything would exceed
that, so it'd never be considered. That meant low would always be a
vec8 or less, so (8 - low->num_components) was never negative.
Now that we see overlapping loads, we can see a vec16 low, vec4 high,
and also a negative hole size, giving us fun comparisons like:
-16 > 4 * (8 - 16) => -16 > -32 => true, don't vectorize
Which is absolutely the wrong thing to do, because the high load's data
is entirely contained within the low load's data.
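The arithmetic above can be reproduced with a small standalone sketch. The
function name old_check_rejects is hypothetical and is not the driver's
callback; it only mirrors the comparison quoted earlier, with hole_size in
bytes and 4 bytes per 32-bit component.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the original condition. Returns true when
 * the old check would refuse to combine the two loads.
 *
 * hole_size:          signed gap in bytes between the loads
 *                     (negative means they overlap)
 * low_num_components: number of components in the low load
 */
bool old_check_rejects(int hole_size, int low_num_components)
{
   return hole_size > 4 * (8 - low_num_components);
}
```

With a vec16 low and a 16-byte overlap, old_check_rejects(-16, 16) compares
-16 against -32 and returns true, so the overlapping pair is rejected.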
The idea behind the original check was to make sure the second load
could pack at least one component into the first's V8 result. But even
that isn't ideal: if the two loads are simply adjacent, doing one V16
load is more efficient than requesting two back-to-back V8 loads.
So, we just simplify down to a static check: if there's an entire V8 of
hole, don't vectorize. This can't currently happen because the core pass
has max_hole set to 28 bytes (7 32-bit components), but that could
change based on the needs of other drivers, so let's be defensive.
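A minimal sketch of what such a static check could look like follows. The
function name new_check_rejects and the exact threshold (32 bytes, i.e. eight
32-bit components, compared with >=) are assumptions drawn from the
description above, not a copy of the actual patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the simplified condition: reject only when
 * the hole is at least a whole V8 of 32-bit components (assumed to mean
 * 8 * 4 = 32 bytes). The size of the low load no longer matters.
 */
bool new_check_rejects(int hole_size)
{
   return hole_size >= 8 * 4;
}
```

Under this reading, an overlapping pair (hole_size of -16) and the largest
hole the core pass currently allows (28 bytes) are both accepted, and only a
hole of 32 bytes or more would be rejected.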
fossil-db results on Alchemist:
Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05%
Subgroup size: 8092544 -> 8092568 (+0.00%)
Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05%
Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35%
Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29%
Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80%
Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05%
Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01%
Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26%
Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03%
Fixes: c21bc65ba75 ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>