i965/fs generates slow code for vector comparisons
@mattst88
Submitted by Matt Turner
Assigned to Ian Romanick
Description
Created attachment 97371 t.shader_test
The fragment shader runs in scalar mode, so for a vec4 comparison we generate one compare per component and join the results using AND or OR instructions, depending on the comparison.
INTEL_DEBUG=fs,no16 bin/shader_runner t.shader_test -auto
generates:
cmp.e.f0(8) g3<1>D g2.3<0,1,0>F g2.7<0,1,0>F
cmp.e.f0(8) g4<1>D g2.2<0,1,0>F g2.6<0,1,0>F
cmp.e.f0(8) g5<1>D g2.1<0,1,0>F g2.5<0,1,0>F
cmp.e.f0(8) g6<1>D g2<0,1,0>F g2.4<0,1,0>F
and(8) g7<1>D g5<8,8,1>D g6<8,8,1>D
and(8) g8<1>D g4<8,8,1>D g7<8,8,1>D
and(8) g9<1>D g3<8,8,1>D g8<8,8,1>D
and.ne.f0(8) null g9<8,8,1>D 1D
...
(+f0) sel ...
We could have just predicated all but the first cmp instruction and skipped the and instructions completely:
cmp.e.f0(8) g3<1>D g2.3<0,1,0>F g2.7<0,1,0>F
(+f0) cmp.e.f0(8) g4<1>D g2.2<0,1,0>F g2.6<0,1,0>F
(+f0) cmp.e.f0(8) g5<1>D g2.1<0,1,0>F g2.5<0,1,0>F
(+f0) cmp.e.f0(8) g6<1>D g2<0,1,0>F g2.4<0,1,0>F
...
(+f0) sel ...
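To illustrate why the predicated sequence should give the same flag value as the and chain, here is a rough Python model of the per-channel flag behaviour. It is only a sketch: the function names are made up for this example, and it assumes that a predicated cmp updates the flag bit only for channels whose predicate is set, which is what the proposal above relies on.

```python
def cmp_e(flag, a, b, predicated=False):
    """Model cmp.e.f0 over SIMD channels (hypothetical helper).

    With predicated=True (the "(+f0)" prefix), a channel executes, and so
    updates its flag bit, only if its flag bit is already set.
    """
    return [
        (x == y) if (f or not predicated) else f
        for f, x, y in zip(flag, a, b)
    ]


def equal_with_and(u, v):
    """Current code: one cmp per component, results joined with and."""
    per_component = [[x == y for x, y in zip(uc, vc)] for uc, vc in zip(u, v)]
    return [all(bits) for bits in zip(*per_component)]


def equal_with_predication(u, v):
    """Proposed code: first cmp unpredicated, the rest predicated on f0."""
    flag = cmp_e([False] * len(u[0]), u[0], v[0])
    for uc, vc in zip(u[1:], v[1:]):
        flag = cmp_e(flag, uc, vc, predicated=True)
    return flag


# Each vector is a list of four components; each component holds one
# value per channel (three channels here rather than eight, for brevity).
u = [[1.0, 9.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 0.0]]
v = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]

print(equal_with_and(u, v))
print(equal_with_predication(u, v))
```

The second channel is the interesting case: its first component differs, so its flag bit is cleared by the first cmp and the later predicated compares never get a chance to set it again.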
I think a similar thing can be done for !=, where the join operation is or.
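A possible way to model the != case is sketched below in Python. Note the assumption, which is not stated in the report: to get an or join, the later compares would need an inverted predicate, (-f0), so that only channels whose flag is still clear execute. The helper names are hypothetical.

```python
def cmp_ne(flag, a, b, predicated=False):
    """Model cmp.ne.f0 over SIMD channels (hypothetical helper).

    predicated=True stands for an assumed inverted "(-f0)" predicate:
    a channel executes, and updates its flag bit, only if the bit is clear.
    """
    return [
        (x != y) if (not f or not predicated) else f
        for f, x, y in zip(flag, a, b)
    ]


def not_equal_with_or(u, v):
    """Current code: one cmp per component, results joined with or."""
    per_component = [[x != y for x, y in zip(uc, vc)] for uc, vc in zip(u, v)]
    return [any(bits) for bits in zip(*per_component)]


def not_equal_with_predication(u, v):
    """Sketch: first cmp unpredicated, the rest predicated on -f0."""
    flag = cmp_ne([False] * len(u[0]), u[0], v[0])
    for uc, vc in zip(u[1:], v[1:]):
        flag = cmp_ne(flag, uc, vc, predicated=True)
    return flag


# Four components per vector, one value per channel (three channels here).
u = [[1.0, 9.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 0.0]]
v = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]

print(not_equal_with_or(u, v))
print(not_equal_with_predication(u, v))
```

Once a channel's flag is set by any component mismatch, the remaining compares skip that channel, so the flag ends up as the or of the per-component results.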
Attachment 97371: t.shader_test