intel/compiler: fix cmod propagation optimisations

Given the following:

- `CMP` writes to the flag register the result of applying cmod to `src0 - src1`, and then stores the same value to dst. Other instructions first store their result to dst, and then store cmod(dst) to the flag register.
- `inst` is either `CMP` or `MOV`
- `inst->dst` is null
- `inst->src[0]` overlaps with `scan_inst->dst`
- `inst->src[1]` is zero
- `scan_inst` wrote to a flag register
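
The difference in flag-write order between `CMP` and every other instruction can be modelled in a few lines. This is a simplified scalar sketch on floats, not the hardware behaviour in full and not Mesa code: `Cmod`, `eval_cmod`, `exec_cmp` and `exec_add` are hypothetical names, and `ADD` is only a stand-in for "any other instruction".

```cpp
#include <cstdint>

// Hypothetical, simplified model; the names are illustrative, not Mesa's.
enum class Cmod { Z, NZ, G, GE, L, LE };

// Apply a conditional modifier to a value, i.e. compare it against zero.
bool eval_cmod(Cmod c, float v) {
    switch (c) {
    case Cmod::Z:  return v == 0.0f;
    case Cmod::NZ: return v != 0.0f;
    case Cmod::G:  return v >  0.0f;
    case Cmod::GE: return v >= 0.0f;
    case Cmod::L:  return v <  0.0f;
    case Cmod::LE: return v <= 0.0f;
    }
    return false;
}

struct CmpResult { bool flag; std::uint32_t dst; };  // dst is a boolean mask
struct AluResult { bool flag; float dst; };          // dst is an arithmetic result

// CMP: flag = cmod(src0 - src1); dst then receives the same boolean,
// encoded as 0x0 (false) or 0xffffffff (true).
CmpResult exec_cmp(Cmod c, float src0, float src1) {
    const bool f = eval_cmod(c, src0 - src1);
    return { f, f ? 0xffffffffu : 0x0u };
}

// Any other instruction (ADD as a stand-in): dst = result first,
// then flag = cmod(dst).
AluResult exec_add(Cmod c, float src0, float src1) {
    const float d = src0 + src1;
    return { eval_cmod(c, d), d };
}
```

In this model, `exec_cmp` always leaves either `0x0` or `0xffffffff` in dst, which is what the `CMP` path below relies on.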

There can be three possible paths:

- `scan_inst` is `CMP`:

  Considering that `src0` is either `0x0` (false) or `0xffffffff` (true), and `src1` is `0x0`:

  - If `inst`'s cmod is `NZ` (or its alias `NEQ`), we can always remove `inst`: `NZ` is invariant for false and true. This holds even if `src0` is NaN: `.nz` is the only cmod that returns true for NaN.
  - `.g` is invariant if `src0` has a `UD` type
  - `.l` is invariant if `src0` has a `D` type

- `scan_inst` and `inst` have the same cmod:

  If `scan_inst` is anything other than `CMP`, it already wrote the appropriate value to the flag register.

- else:

  We can change the cmod of `scan_inst` to that of `inst`, and remove `inst`. This is valid as long as no instruction uses the flag register between `scan_inst` and `inst`.
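
The three paths can be sketched as a small decision function. This is an illustrative model rather than the code in `brw_fs_cmod_propagation`: the `Inst` struct, the enums, `is_invariant` and `decide` are hypothetical names. It assumes that a `CMP` `scan_inst` with a non-invariant cmod is simply left alone, and it treats the redundant instruction (`inst`) as the one that gets removed. The invariance rules are not hard-coded. They are derived by evaluating each cmod on the two possible masks, `0x0` and `0xffffffff`, reinterpreted as the type of `src0`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; a simplified model, not Mesa's IR.
enum class Opcode { CMP, MOV, OTHER };
enum class Cmod { Z, NZ, G, GE, L, LE };
enum class Type { F, D, UD };
enum class Action { RemoveInst, RetargetScanInstAndRemoveInst, NoChange };

struct Inst {
    Opcode op;
    Cmod cmod;
    Type src0_type;  // only consulted for `inst`
};

// Compare a typed value against zero under the given cmod.
template <typename T>
bool compare_to_zero(Cmod c, T v) {
    switch (c) {
    case Cmod::Z:  return v == T(0);
    case Cmod::NZ: return v != T(0);
    case Cmod::G:  return v >  T(0);
    case Cmod::GE: return v >= T(0);
    case Cmod::L:  return v <  T(0);
    case Cmod::LE: return v <= T(0);
    }
    return false;
}

// Reinterpret 32 raw bits as T (0xffffffff is a NaN when T is float).
template <typename T>
bool cmod_on_bits(Cmod c, std::uint32_t bits) {
    static_assert(sizeof(T) == sizeof(bits), "32-bit types only");
    T v;
    std::memcpy(&v, &bits, sizeof(v));
    return compare_to_zero(c, v);
}

bool cmod_on_mask(Cmod c, Type t, std::uint32_t bits) {
    switch (t) {
    case Type::F:  return cmod_on_bits<float>(c, bits);
    case Type::D:  return cmod_on_bits<std::int32_t>(c, bits);
    case Type::UD: return cmod_on_bits<std::uint32_t>(c, bits);
    }
    return false;
}

// "Invariant": the cmod maps the false mask to false and the true mask to true.
bool is_invariant(Cmod c, Type t) {
    return !cmod_on_mask(c, t, 0x0u) && cmod_on_mask(c, t, 0xffffffffu);
}

Action decide(const Inst& scan_inst, const Inst& inst, bool flag_used_between) {
    // Path 1: scan_inst is CMP, so inst only re-tests a boolean mask.
    if (scan_inst.op == Opcode::CMP)
        return is_invariant(inst.cmod, inst.src0_type) ? Action::RemoveInst
                                                       : Action::NoChange;
    // Path 2: same cmod, the flag already holds the right value.
    if (scan_inst.cmod == inst.cmod)
        return Action::RemoveInst;
    // Path 3: move inst's cmod onto scan_inst, if nothing used the flag.
    if (!flag_used_between)
        return Action::RetargetScanInstAndRemoveInst;
    return Action::NoChange;
}
```

Under this model, `is_invariant` holds exactly for `NZ` with any of the three types, for `G` with `UD`, and for `L` with `D`, which matches the list above.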

Nine new cmod_propagation unit tests:

- `cmp_cmpnz`
- `cmp_cmpg`
- `plnnz_cmpnz`
- `plnnz_cmpz` (*)
- `plnnz_sel_cmpz`
- `cmp_cmpg_D`
- `cmp_cmpg_UD` (*)
- `cmp_cmpl_D` (*)
- `cmp_cmpl_UD`

(*) this would fail without changes to `brw_fs_cmod_propagation`.

This fixes an optimisation that used to be illegal (see issue #2154):
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
Now it is optimised as follows (note the change of cmod in line 0):
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
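
Why the old result was illegal can be checked with a tiny simulation of the final value of `f0.0`. This is a sketch under simplifying assumptions: `v` stands for the scalar value that `linterp` writes to `vgrf0`, and `eval_cmod`, `flag_original`, `flag_old_opt` and `flag_new_opt` are hypothetical helper names, not part of Mesa.

```cpp
// Hypothetical model of the final f0.0 value for the three programs above.
enum class Cmod { Z, NZ };

// Apply a conditional modifier: compare the value against zero.
bool eval_cmod(Cmod c, float v) {
    return c == Cmod::Z ? (v == 0.0f) : (v != 0.0f);
}

// "Before": linterp.z writes f0.0, then cmp.nz overwrites it.
bool flag_original(float v) {
    bool f0 = eval_cmod(Cmod::Z, v);   // 0: linterp.z.f0.0  vgrf0, ...
    f0 = eval_cmod(Cmod::NZ, v - 0.0f); // 1: cmp.nz.f0.0     null, vgrf0, 0f
    return f0;
}

// Old "After": cmp removed, linterp keeps .z, so the flag is inverted.
bool flag_old_opt(float v) {
    return eval_cmod(Cmod::Z, v);      // 0: linterp.z.f0.0  vgrf0, ...
}

// New "After": cmp removed and its cmod moved onto linterp.
bool flag_new_opt(float v) {
    return eval_cmod(Cmod::NZ, v);     // 0: linterp.nz.f0.0 vgrf0, ...
}
```

For any non-NaN `v`, `flag_old_opt(v)` is the negation of `flag_original(v)` in this model, while `flag_new_opt(v)` agrees with it.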
No shaderdb changes
Closes: #2154
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>