nir: Merge consecutive store_scratch
I have observed some shaders in shader-db that have code like:
void main ()
{
    movieTaps[0] = vec3(0.2165, 0.125, 1.0);
    // repeat with different constant vec3 for [1..58].
    movieTaps[59] = vec3(0.8668, 0.2513, 0.907);
And NIR merrily generates the obvious, horrifying code. The worst part is that, since these are vec3, each component is written individually.
vec1 32 ssa_0 = load_const (0x00000000 = 0.000000)
vec1 32 ssa_1 = load_const (0x3f800000 = 1.000000)
...
vec1 32 ssa_10 = load_const (0x3e5db22d = 0.216500)
intrinsic store_scratch (ssa_10, ssa_0) (align_mul=256, align_offset=0, wrmask=x /*1*/)
vec1 32 ssa_11 = load_const (0x3e000000 = 0.125000)
vec1 32 ssa_12 = load_const (0x00000004 = 0.000000)
intrinsic store_scratch (ssa_11, ssa_12) (align_mul=256, align_offset=0, wrmask=x /*1*/)
vec1 32 ssa_13 = load_const (0x00000008 = 0.000000)
intrinsic store_scratch (ssa_1, ssa_13) (align_mul=256, align_offset=0, wrmask=x /*1*/)
The Intel compiler, in turn, generates the obvious, horrifying code from this NIR. As a side note, I noticed this because those writes get misidentified as 180 spills (60 vec3 elements × 3 components each).
A pass already exists that merges UBO and SSBO reads and writes on consecutive locations. This pass should be extended to operate on scratch reads and writes as well.
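To illustrate the kind of merging being asked for, here is a minimal sketch in Python. It is a toy model, not the existing NIR pass or its API: `ScratchStore` and `merge_consecutive_stores` are hypothetical names, stores are assumed to be 32-bit components in a single straight-line block with nothing else touching scratch in between, and alignment requirements are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScratchStore:
    """Hypothetical model of one store_scratch: 32-bit values at a byte offset."""
    offset: int
    values: Tuple[float, ...]


def merge_consecutive_stores(stores: List[ScratchStore],
                             max_components: int = 4) -> List[ScratchStore]:
    """Greedily fold each store into the previous one when it starts exactly
    where the previous one ends and the result fits in max_components.

    Assumption: no loads, barriers, or control flow between the stores.
    """
    merged: List[ScratchStore] = []
    for store in stores:
        if merged:
            prev = merged[-1]
            prev_end = prev.offset + 4 * len(prev.values)  # 4 bytes per component
            fits = len(prev.values) + len(store.values) <= max_components
            if store.offset == prev_end and fits:
                merged[-1] = ScratchStore(prev.offset, prev.values + store.values)
                continue
        merged.append(store)
    return merged


# The three scalar stores from the NIR dump above (offsets 0, 4, 8).
scalar_stores = [
    ScratchStore(0, (0.2165,)),
    ScratchStore(4, (0.125,)),
    ScratchStore(8, (1.0,)),
]
merged = merge_consecutive_stores(scalar_stores)

# 180 consecutive scalar stores, as in the 60-element vec3 array.
many = [ScratchStore(4 * i, (float(i),)) for i in range(180)]
merged_many = merge_consecutive_stores(many)
```

In this model the three scalar writes collapse into a single three-component write, and the 180 scalar writes collapse into 45 four-component writes. A real pass would also have to prove that no intervening access aliases the merged range and respect the `align_mul`/`align_offset` information on each intrinsic.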