nir: Make algebraic process the program both ways.

The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to
transform:

result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);

->

temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.

Back in 2016 in 7be8d077 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first.  This helped his cases, but I
believe introduced this failure mode.  Retain the bottom-to-top mode
first, but if we made any changes through algebraic, then go back and
do a top-to-bottom pass so we avoid the O(n^2) behavior.
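
As a rough illustration of the pass counts involved, here is a toy
model in Python.  It is not NIR code: the tuple encoding and every
function name below are made up for this sketch, and it only models
the single fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) rule on a straight
chain of multiplies.

```python
def build_chain(n):
    # Hypothetical encoding: instrs[0] is b2f(c0); instrs[k] for k >= 1
    # is ("fmul", k - 1, "ck"), meaning fmul(instrs[k - 1], b2f(ck)).
    instrs = [("b2f", "c0")]
    for k in range(1, n + 1):
        instrs.append(("fmul", k - 1, f"c{k}"))
    return instrs


def try_rewrite(instrs, k):
    # fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)); this only matches when
    # the fmul's first source is itself currently a b2f.
    instr = instrs[k]
    if instr[0] != "fmul":
        return False
    _, src, cond = instr
    if instrs[src][0] != "b2f":
        return False
    instrs[k] = ("b2f", ("iand", instrs[src][1], cond))
    return True


def run_pass(instrs, order):
    # Visit instructions in the given order, report whether any changed.
    progress = False
    for k in order:
        if try_rewrite(instrs, k):
            progress = True
    return progress


def bottom_up_only(instrs):
    # Models the old behavior: a single bottom-to-top walk.
    return run_pass(instrs, range(len(instrs) - 1, -1, -1))


def both_ways(instrs):
    # Models the new behavior: bottom-to-top first, then a
    # top-to-bottom walk only if the first walk changed something.
    progress = bottom_up_only(instrs)
    if progress:
        run_pass(instrs, range(len(instrs)))
    return progress


def passes_until_fixed_point(algebraic, n):
    # Stand-in for the optimization loop: rerun while progress is made.
    instrs = build_chain(n)
    passes = 0
    while algebraic(instrs):
        passes += 1
    return passes, instrs[-1]
```

In this model a chain of n multiplies needs n progress-making
invocations of the bottom-up-only pass (each one an O(n) walk), but a
single invocation of the both-ways pass, and both end at the same
final expression.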

Reduces runtime (over this whole series, including automaton removal)
of dEQP-GLES2.functional.uniform_api.random.3 from 4s to 1s and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 from 13.4s
to 4.06s on cheza.  An x86_64 shader-db on freedreno had no
statistically significant performance difference (n=3) between the
redundant-movs change and this commit.

This has a surprisingly large effect on the compiled shaders, though:

total instructions in shared programs: 7821406 -> 7895951 (0.95%)
total dwords in shared programs: 10893344 -> 10937696 (0.41%)
total full in shared programs: 334605 -> 334538 (-0.02%)
total constlen in shared programs: 2329266 -> 2329172 (<.01%)
total (ss) in shared programs: 153631 -> 153584 (-0.03%)
total (sy) in shared programs: 94667 -> 94668 (<.01%)
total max_sun in shared programs: 1166142 -> 1168855 (0.23%)