nir/lower_vec_to_movs: don't vectorize unsupported ops
If coalescing would turn an instruction into a vector op that the target cannot vectorize, skip coalescing it. Reuse the callbacks from alu_to_scalar to describe which ops should not be vectorized.
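The check described above can be sketched roughly as follows. This is a standalone toy model, not the code in the patch: every name here (toy_op, toy_alu, toy_scalar_filter_cb, toy_lima_keep_scalar, toy_can_coalesce) is hypothetical, and the callback shape is only assumed to resemble the filter passed to alu_to_scalar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not Mesa's real types. */
enum toy_op { TOY_OP_FADD, TOY_OP_FLOG2 };

struct toy_alu {
   enum toy_op op;
};

/* Assumed callback shape: returns true when the target wants this op to
 * stay scalar, i.e. the same question alu_to_scalar asks the driver. */
typedef bool (*toy_scalar_filter_cb)(const struct toy_alu *alu,
                                     const void *data);

/* Example filter for a target that cannot vectorize flog2. */
static bool
toy_lima_keep_scalar(const struct toy_alu *alu, const void *data)
{
   (void)data;
   return alu->op == TOY_OP_FLOG2;
}

/* Decide whether an ALU op may be coalesced into a vec destination.
 * write_components is the number of destination components the op would
 * write after coalescing. */
static bool
toy_can_coalesce(const struct toy_alu *alu, unsigned write_components,
                 toy_scalar_filter_cb cb, const void *data)
{
   assert(write_components >= 1);

   /* A single-component write stays scalar, so it is always fine. */
   if (write_components == 1)
      return true;

   /* The op would become a vector op: refuse if the target says this op
    * must remain scalar. The caller then emits a mov instead. */
   if (cb != NULL && cb(alu, data))
      return false;

   return true;
}
```

In this model, a flog2 written to one component is still coalesced, while a flog2 that would be written to three components is left alone and copied with a mov, matching the lima sequence described in this message.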
In lima, this fixes a bug where a nir_op_flog2 ends up vectorized due to the late lower_vec_to_movs.
lima does handle nir_op_flog2 in nir_lower_alu_to_scalar, but in the case of
dEQP-GLES2.functional.shaders.random.exponential.fragment.11 there is the
following sequence:
    vec1 32 ssa_4 = flog2 ssa_2.x
    vec4 32 ssa_5 = vec4 ssa_3, ssa_4, ssa_4, ssa_4
This passes through nir_lower_alu_to_scalar, but then, almost at the end of
the NIR optimization passes, lower_vec_to_movs coalesces the flog2 into the
vec4 op and it becomes:
    r0.x = flog2 ssa_2.y
    r0.yzw = flog2 ssa_2.xxx
which the mali400 pp cannot implement, and this causes the bug.
With this patch, it becomes:
    r0.x = flog2 ssa_13.y
    vec1 32 ssa_4 = flog2 ssa_14.x
    r0.yzw = mov ssa_4.xxx
which is a reasonable way to implement this for lima.
Fixes:
dEQP-GLES2.functional.shaders.random.trigonometric.fragment.65
dEQP-GLES2.functional.shaders.random.exponential.fragment.11
dEQP-GLES2.functional.shaders.random.exponential.fragment.12
dEQP-GLES2.functional.shaders.random.exponential.fragment.37
dEQP-GLES2.functional.shaders.random.exponential.fragment.74
dEQP-GLES2.functional.shaders.random.all_features.fragment.37