ir3: Rewrite nop insertion

Don't try to chase across blocks to find a matching destination for a given source. This is prone to exponential blowup when there is a complicated series of if-ladders and we have to crawl through every possible path. With scalar ALU, this was causing timeouts on one test once we stopped counting scalar ALU.

Rather than adding yet more band-aids, switch to the approach most other backends use: keep a scoreboard of outstanding registers and track the cycle at which each register becomes "ready". This integrates nicely into the pre-existing ir3 legalize infrastructure for (ss) and (sy), although it does require duplicating the logic in ir3_delayslots() in a different form.
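
For illustration only, here is a minimal sketch of the scoreboard idea in C. It is not the code in this change: the struct layout, the NUM_REGS size, the latency field, and the names scoreboard_init(), scoreboard_issue() and scoreboard_merge() are all hypothetical, and the real delay rules live in ir3_delayslots(). The merge step is an assumption about how a block could inherit state from a predecessor without walking paths.

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS 8    /* hypothetical register file size */
#define NO_REG   (-1) /* marks an unused operand slot */

/* Hypothetical instruction: one destination, up to two sources. */
struct instr {
   int dst;
   int src[2];
   unsigned latency; /* cycles after issue before dst may be read */
};

struct scoreboard {
   unsigned cycle;           /* cycle at which the next instruction issues */
   unsigned ready[NUM_REGS]; /* cycle at which each register is "ready" */
};

static void
scoreboard_init(struct scoreboard *sb)
{
   memset(sb, 0, sizeof(*sb));
}

/* Return how many nops must precede instr, then record its effects. */
static unsigned
scoreboard_issue(struct scoreboard *sb, const struct instr *instr)
{
   unsigned issue = sb->cycle;

   /* The instruction cannot issue before all of its sources are ready. */
   for (int i = 0; i < 2; i++) {
      int src = instr->src[i];
      if (src == NO_REG)
         continue;
      assert(src >= 0 && src < NUM_REGS);
      if (sb->ready[src] > issue)
         issue = sb->ready[src];
   }

   unsigned nops = issue - sb->cycle;

   if (instr->dst != NO_REG) {
      assert(instr->dst >= 0 && instr->dst < NUM_REGS);
      sb->ready[instr->dst] = issue + instr->latency;
   }

   sb->cycle = issue + 1;
   return nops;
}

/* Fold a predecessor's end-of-block state into a successor's start state.
 * Only the cycles still outstanding at the end of the predecessor matter,
 * so each predecessor is visited once instead of once per path.
 */
static void
scoreboard_merge(struct scoreboard *succ, const struct scoreboard *pred)
{
   for (int r = 0; r < NUM_REGS; r++) {
      unsigned remaining =
         pred->ready[r] > pred->cycle ? pred->ready[r] - pred->cycle : 0;
      if (succ->cycle + remaining > succ->ready[r])
         succ->ready[r] = succ->cycle + remaining;
   }
}
```

In this sketch each instruction costs a constant amount of work, and a block with several predecessors takes the maximum outstanding delay per register, so the cost is linear in the number of CFG edges rather than in the number of paths.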