[r600, RV710] regression of some deqp-gles3@functional@fbo@msaa@... and deqp-gles3@functional@texture@vertex@cube@... tests on 23.1.0 vs. 23.0.0
I ran the deqp-gles3 conformance test suite with R600_DUMP_SHADERS=1 R600_NIR_DEBUG=instr on my Radeon HD4550 and found the following regressions in 23.1.0 compared to 23.0.0:
deqp-gles3@functional@fbo@msaa@8_samples@depth32f_stencil8
deqp-gles3@functional@fbo@msaa@8_samples@r8
deqp-gles3@functional@fbo@msaa@8_samples@rg16f
deqp-gles3@functional@fbo@msaa@8_samples@rgba4
deqp-gles3@functional@texture@vertex@cube@filtering@linear_mipmap_linear_linear_repeat
deqp-gles3@functional@texture@vertex@cube@filtering@linear_mipmap_linear_nearest_repeat
deqp-gles3@functional@texture@vertex@cube@wrap@repeat_clamp
deqp-gles3@functional@texture@vertex@cube@wrap@repeat_mirror
deqp-gles3@functional@texture@vertex@cube@wrap@repeat_repeat
For the first four tests (the fbo@msaa ones) I noticed a difference in the END part of the shader dump. With 23.1.0, where the tests fail, it reads:
[...]
FROM:vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture)
emit 'vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture)' (emit_lowered_tex)
TEX LD S10.xyzw : S8.xy_w RID:18 SID:0 NNNN
FROM:intrinsic store_output (ssa_11, ssa_2) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=FRAG_RESULT_DATA0 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S10.xyzw
whereas the 23.0.0 output, where the tests pass, reads:
[...]
FROM:vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture), 0 (sampler)
emit 'vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture), 0 (sampler)' (emit_lowered_tex)
TEX LD S10.xyzw : S8.xy_w RID:18 SID:0 NNNN
FROM:intrinsic store_output (ssa_11, ssa_2) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S10.xyzw
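To make it easier to see which lines of the two dumps above actually differ, here is a minimal Python sketch that separates the echoed NIR lines (those starting with `FROM:` or `emit '`) from the emitted instructions and compares each group. The helper names `split_dump` and `changed_lines` are made up for this illustration, and treating every other line as an emitted instruction is an assumption about the dump format, not something the driver documents.

```python
import difflib

# Lines copied verbatim from the 23.0.0 (passing) dump above.
OLD = """\
FROM:vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture), 0 (sampler)
emit 'vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture), 0 (sampler)' (emit_lowered_tex)
TEX LD S10.xyzw : S8.xy_w RID:18 SID:0 NNNN
FROM:intrinsic store_output (ssa_11, ssa_2) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S10.xyzw
""".splitlines()

# Lines copied verbatim from the 23.1.0 (failing) dump above.
NEW = """\
FROM:vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture)
emit 'vec4 32 ssa_11 = (float32)txf ssa_9 (backend1), ssa_10 (backend2), 0 (texture)' (emit_lowered_tex)
TEX LD S10.xyzw : S8.xy_w RID:18 SID:0 NNNN
FROM:intrinsic store_output (ssa_11, ssa_2) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=FRAG_RESULT_DATA0 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S10.xyzw
""".splitlines()


def split_dump(lines):
    """Split a dump into (echoed NIR lines, everything else).

    Assumption: lines starting with "FROM:" or "emit '" echo the NIR input,
    and all remaining lines are the instructions the backend emitted.
    """
    nir_prefixes = ("FROM:", "emit '")
    nir = [line for line in lines if line.startswith(nir_prefixes)]
    emitted = [line for line in lines if not line.startswith(nir_prefixes)]
    return nir, emitted


def changed_lines(old, new):
    """Return ("-", line) / ("+", line) pairs for lines that are not common to both."""
    result = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            result += [("-", line) for line in old[i1:i2]]
            result += [("+", line) for line in new[j1:j2]]
    return result


nir_old, emitted_old = split_dump(OLD)
nir_new, emitted_new = split_dump(NEW)

print("emitted instructions identical:", emitted_old == emitted_new)
for sign, line in changed_lines(nir_old, nir_new):
    print(sign, line)
```

For this excerpt the sketch reports the `TEX LD` and `EXPORT PIXEL` lines as identical; only the echoed NIR lines differ (the `0 (sampler)` source on the `txf` and the way `io location` is printed). That may mean the relevant difference lies in a part of the dump that is not quoted here.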
The failing deqp-gles3@functional@texture@vertex@cube@... tests also show differences in the dump. 23.1.0:
-- END --------------------------------------------------------
FROM:vec2 32 ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=0)
FROM:vec1 32 ssa_1 = load_const (0x00000000 = 0.000000)
ALU MOV S1.x@free{s} : I[0] {W}
FROM:vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=VARYING_SLOT_VAR0 slots=1 mediump /*8388768*/)
FROM:intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=FRAG_RESULT_DATA0 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S0.xyzw
--------------------------------------------------------------
vs. 23.0.0:
-- END --------------------------------------------------------
FROM:vec2 32 ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=0)
FROM:vec1 32 ssa_1 = load_const (0x00000000 = 0.000000)
ALU MOV S1.x@free{s} : I[0] {W}
FROM:vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=32 slots=1 mediump /*8388768*/)
FROM:intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S0.xyzw
--------------------------------------------------------------
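The two END dumps above seem to differ only in how `io location` is printed (numeric in 23.0.0, symbolic in 23.1.0). The sketch below normalizes the 23.0.0 lines and checks whether anything else differs. The mapping of 32 to VARYING_SLOT_VAR0 and 4 to FRAG_RESULT_DATA0 is an assumption, inferred from the packed values in the trailing comments (`/*8388768*/`, `/*8388740*/`) being identical in both dumps; `LOCATION_NAMES` and `normalize` are names invented for this example.

```python
import re

# Lines copied verbatim from the 23.0.0 dump above.
OLD = """\
FROM:vec2 32 ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=0)
FROM:vec1 32 ssa_1 = load_const (0x00000000 = 0.000000)
ALU MOV S1.x@free{s} : I[0] {W}
FROM:vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=32 slots=1 mediump /*8388768*/)
FROM:intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S0.xyzw
""".splitlines()

# Lines copied verbatim from the 23.1.0 dump above.
NEW = """\
FROM:vec2 32 ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=0)
FROM:vec1 32 ssa_1 = load_const (0x00000000 = 0.000000)
ALU MOV S1.x@free{s} : I[0] {W}
FROM:vec4 32 ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=VARYING_SLOT_VAR0 slots=1 mediump /*8388768*/)
FROM:intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=FRAG_RESULT_DATA0 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)
EXPORT PIXEL 0 S0.xyzw
""".splitlines()

# Assumed numeric-to-symbolic mapping, keyed by (intrinsic name, printed number).
# Only the two pairs that occur in this excerpt are listed.
LOCATION_NAMES = {
    ("load_interpolated_input", "32"): "VARYING_SLOT_VAR0",
    ("store_output", "4"): "FRAG_RESULT_DATA0",
}


def normalize(line):
    """Rewrite a numeric 'io location=N' into the symbolic name, if one is known."""
    intrinsic = re.search(r"intrinsic (\w+) ", line)
    if intrinsic is None:
        return line  # not an intrinsic line, nothing to rewrite

    def replace(match):
        name = LOCATION_NAMES.get((intrinsic.group(1), match.group(1)))
        return "io location=" + (name if name else match.group(1))

    return re.sub(r"io location=(\d+)", replace, line)


normalized_old = [normalize(line) for line in OLD]
print("identical after normalization:", normalized_old == NEW)
```

If the assumed mapping is right, the two excerpts are identical after normalization, so the cause of the cube test failures is probably not visible in this END part alone.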
Apart from that, 23.1.0 is a big improvement for my Radeon HD4550, as it passes the deqp-gles3 tests without any crashes!
Only example output is attached; the full archive would be 95 MiB.
23.1.0_deqp-gles3_functional_texture_vertex_cube_filtering_linear_mipmap_linear_linear_repeat.html
23.0.0_deqp-gles3_functional_texture_vertex_cube_filtering_linear_mipmap_linear_linear_repeat.html
23.0.0_deqp-gles3_functional_fbo_msaa_8_samples_depth32f_stencil8.html
23.1.0_deqp-gles3_functional_fbo_msaa_8_samples_depth32f_stencil8.html