nir: Make algebraic process the program both ways.

The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to
transform:

result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);

->

temp = (a == b);
temp = temp && (c == d);
...
temp = temp && (z == w);
result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.

Back in 2016 in 7be8d077 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first.  This helped his cases, but I
believe introduced this failure mode.  Retain the bottom-to-top mode
first, but if we made any changes through algebraic, then go back and
do a top-to-bottom pass so we avoid the O(n^2) behavior.
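
As a rough illustration of the failure mode and the fix, here is a toy
Python model (not the NIR implementation; every name in it is
hypothetical). It builds the fmul(b2f(), b2f()) chain above, applies the
single rewrite rule in one direction per sweep, and counts how many
sweeps are needed before nothing changes:

```python
def build_chain(n):
    """Defs, in program order, for b2f(c0) * b2f(c1) * ... * b2f(c{n-1})."""
    defs = {"t0": ("b2f", ("c0",))}
    prev = "t0"
    for i in range(1, n):
        defs[f"t{i}"] = ("b2f", (f"c{i}",))
        defs[f"m{i}"] = ("fmul", (prev, f"t{i}"))
        prev = f"m{i}"
    return defs


def is_b2f(defs, name):
    return name in defs and defs[name][0] == "b2f"


def algebraic_sweep(defs, bottom_to_top):
    """Apply fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) once over the program."""
    names = list(defs)  # snapshot, so new defs are not visited in this sweep
    if bottom_to_top:
        names.reverse()
    progress = False
    for name in names:
        op, srcs = defs[name]
        if op == "fmul" and all(is_b2f(defs, s) for s in srcs):
            x = defs[srcs[0]][1][0]
            y = defs[srcs[1]][1][0]
            defs[f"and_{name}"] = ("iand", (x, y))
            defs[name] = ("b2f", (f"and_{name}",))
            progress = True
    return progress


def run_bottom_to_top_only(defs):
    """Old behavior: the outer loop reruns the bottom-to-top sweep."""
    sweeps = 0
    while True:
        sweeps += 1
        if not algebraic_sweep(defs, bottom_to_top=True):
            return sweeps


def run_both_ways(defs):
    """New behavior: after a sweep that made progress, sweep top-to-bottom."""
    sweeps = 0
    while True:
        sweeps += 1
        if not algebraic_sweep(defs, bottom_to_top=True):
            return sweeps
        sweeps += 1
        algebraic_sweep(defs, bottom_to_top=False)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, run_bottom_to_top_only(build_chain(n)),
              run_both_ways(build_chain(n)))
```

In this model the bottom-to-top-only driver needs a number of sweeps
that grows with the chain length (one rewrite per sweep), while the
both-ways driver finishes the whole chain in its first top-to-bottom
sweep. Real sweeps also cost time proportional to program size, which
is where the quadratic total comes from.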

Reduces runtime (over this whole series, including automaton removal)
of dEQP-GLES2.functional.uniform_api.random.3 from 4s to 1s and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 from 13.4s
to 4.06s on cheza.  An x86_64 shader-db on freedreno had no
statistically significant performance difference (n=3) between the
redundant-movs change and this commit.

This has a surprisingly large effect on the generated code, though:

total instructions in shared programs: 7821406 -> 7895951 (0.95%)
total dwords in shared programs: 10893344 -> 10937696 (0.41%)
total full in shared programs: 334605 -> 334538 (-0.02%)
total constlen in shared programs: 2329266 -> 2329172 (<.01%)
total (ss) in shared programs: 153631 -> 153584 (-0.03%)
total (sy) in shared programs: 94667 -> 94668 (<.01%)
total max_sun in shared programs: 1166142 -> 1168855 (0.23%)