NIR: interpolation from the same input variable with different interpolators may get optimized away
With arb_gpu_shader5@interpolateAtOffset, lower_io_to_temporaries doesn't handle interpolation of the same variable with different interpolators correctly.
Before the lowering pass the shader looks like this:
shader: MESA_SHADER_FRAGMENT
name: GLSL3
inputs: 1
outputs: 1
uniforms: 1
shared: 0
...
decl_var shader_in INTERP_MODE_NONE vec3 ref@1 (VARYING_SLOT_VAR0.xyz, 0, 0)
...
decl_function main (0 params)
impl main {
block block_0:
/* preds: */
...
vec2 32 ssa_3 = load_const (0xbf000000 /* -0.500000 */, 0x00000000 /* 0.000000 */)
vec1 32 ssa_12 = deref_var &ref@1 (shader_in vec3)
vec3 32 ssa_13 = intrinsic interp_deref_at_offset (ssa_12, ssa_3) ()
vec3 32 ssa_27 = intrinsic load_deref (ssa_12) (0) /* access=0 */
vec1 32 ssa_32 = fneg ssa_27.x
vec1 32 ssa_33 = fneg ssa_27.z
vec1 32 ssa_34 = fadd ssa_13.y, ssa_32
vec1 32 ssa_35 = fadd ssa_13.z, ssa_33
...
}
After lower_io_to_temporaries it looks like this:
shader: MESA_SHADER_FRAGMENT
name: GLSL3
inputs: 1
outputs: 1
uniforms: 1
shared: 0
decl_var uniform INTERP_MODE_NONE vec4 gl_FbWposYTransform (0, 0, 0)
...
decl_var INTERP_MODE_NONE vec3 in@ref-temp@1
...
decl_var shader_in INTERP_MODE_NONE vec3 ref@3 (VARYING_SLOT_VAR0.xyz, 0, 0)
...
decl_function main (0 params)
impl main {
block block_0:
/* preds: */
...
vec2 32 ssa_3 = load_const (0xbf000000 /* -0.500000 */, 0x00000000 /* 0.000000 */)
vec1 32 ssa_12 = deref_var &in@ref-temp@1 (shader_temp vec3)
vec1 32 ssa_54 = deref_var &ref@3 (shader_in vec3)
vec3 32 ssa_55 = intrinsic interp_deref_at_offset (ssa_54, ssa_3) ()
intrinsic store_deref (ssa_12, ssa_55) (7, 0) /* wrmask=xyz */ /* access=0 */
vec3 32 ssa_56 = intrinsic load_deref (ssa_12) (0) /* access=0 */
vec3 32 ssa_27 = intrinsic load_deref (ssa_12) (0) /* access=0 */
vec1 32 ssa_32 = fneg ssa_27.x
vec1 32 ssa_33 = fneg ssa_27.z
vec1 32 ssa_34 = fadd ssa_56.y, ssa_32
vec1 32 ssa_35 = fadd ssa_56.z, ssa_33
...
Namely, the interpolated value gets stored into a local variable, and both loads then read from that variable, even though the second load (ssa_27, originally a plain load_deref of the input) requested a different interpolation than the interp_deref_at_offset that produced the stored value. !10887 alleviates the problem in this specific case, because originally the loads are from different variables and are kept separate by that MR, but in other cases this might not work.
It is notable that radeonsi does lower_io_to_temporaries before nir_lower_io_to_temporaries for exactly that reason.
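To make the effect on the computed values concrete, here is a minimal toy model in plain C. It is not Mesa code: the varying is assumed to change linearly across the pixel, and the names interp_at_offset, shade_reference and shade_lowered are hypothetical helpers made up for this illustration. It only mirrors the data flow of the two dumps above (one interpolation at an offset, one plain load) for a single component.

```c
/* Toy model of the data flow in the two NIR dumps above.
 * Assumption: the input varying is linear across the pixel, so its value
 * at an offset is center + ddx * off.x + ddy * off.y.
 * All names here are hypothetical, not Mesa API. */

typedef struct { float x, y; } vec2;

/* Stands in for interp_deref_at_offset on one component. */
static float interp_at_offset(float center, float ddx, float ddy, vec2 off)
{
    return center + ddx * off.x + ddy * off.y;
}

/* Before lowering: the two reads use different interpolators. */
float shade_reference(float center, float ddx, float ddy, vec2 off)
{
    float at_offset = interp_at_offset(center, ddx, ddy, off); /* ssa_13 */
    float plain     = center;             /* ssa_27: load_deref of shader_in */
    return at_offset + (-plain);          /* fadd(ssa_13.y, fneg(ssa_27.x)) */
}

/* After lowering as shown in the second dump: the offset-interpolated value
 * is stored into the temporary, and both reads come from that temporary. */
float shade_lowered(float center, float ddx, float ddy, vec2 off)
{
    float temp = interp_at_offset(center, ddx, ddy, off); /* store_deref */
    float at_offset = temp;               /* ssa_56 */
    float plain     = temp;               /* ssa_27 now reads the temp too */
    return at_offset + (-plain);
}
```

With a center value of 1.0, a horizontal slope of 2.0 and the offset (-0.5, 0.0) from the dump, shade_reference yields -1.0 while shade_lowered yields 0.0, because the plain load no longer sees the un-offset value.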