intel/compiler: Update block IPs once per pass
I tested both release and debugoptimized builds with dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13. The change to dead_code_eliminate was the bigger win. On a desktop Kabylake system at n=30:
release build (w/Fedora build flags): -7.79% ± 0.25%
Meson -Dbuildtype=debugoptimized: -5.10% ± 0.40%
The change in register_coalesce helped only a tiny bit at n=30:
release build (w/Fedora build flags): -0.82% ± 0.23%
Meson -Dbuildtype=debugoptimized: -0.74% ± 0.27%
The other two commits (marked "FYI") don't help on that test. They are just the only other two places that call remove() a lot.
The debugoptimized build improves less because the calls to inst_is_in_block(block, this) still happen on each call to remove(). I tried a couple of things to reduce that overhead, but I was unsuccessful.
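For readers unfamiliar with the idea in the title, here is a rough, simplified sketch of updating block IPs once per pass instead of on every removal. It is not the actual compiler code: the Block struct and the functions remove_eager, remove_deferred, and adjust_block_ips below are hypothetical names chosen for illustration, and the real data structures are more involved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a CFG block: it covers the instruction
// range [start_ip, end_ip]. pending_removed must start at zero.
struct Block {
    int start_ip;
    int end_ip;
    int pending_removed; // removals recorded but not yet applied to the IPs
};

// Eager variant: every removal renumbers all later blocks,
// so each call costs O(number of blocks).
void remove_eager(std::vector<Block> &blocks, std::size_t b)
{
    blocks[b].end_ip--;
    for (std::size_t i = b + 1; i < blocks.size(); i++) {
        blocks[i].start_ip--;
        blocks[i].end_ip--;
    }
}

// Deferred variant: only record that one instruction left block b.
// Each call is O(1); the IPs are stale until adjust_block_ips() runs.
void remove_deferred(std::vector<Block> &blocks, std::size_t b)
{
    blocks[b].pending_removed++;
}

// Run once at the end of the pass: a single sweep applies the
// accumulated removals to every block.
void adjust_block_ips(std::vector<Block> &blocks)
{
    int delta = 0; // instructions removed from all earlier blocks
    for (Block &blk : blocks) {
        blk.start_ip -= delta;
        delta += blk.pending_removed;
        blk.end_ip -= delta;
        blk.pending_removed = 0;
    }
}
```

In this model a pass that removes many instructions pays for one sweep over the blocks rather than one sweep per removal, and both variants end with the same block ranges. The trade-off is that nothing may read the block IPs between the first deferred removal and the final adjust_block_ips() call.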
See also #4641 (closed).