nir/search: Add automaton-based pre-searching
See the first commit for the motivation and detailed numbers. I measured a 7.2% +/- 0.2% compile-time reduction on Intel Skylake shader-db, at the cost of a 270kB increase in binary size. The second commit is entirely optional, but it was very useful for tracking down differences in the generated code. There shouldn't be any such differences left: the same shader-db runs I used for compile-time testing show no difference in results with this series applied.