Make CSE more global
The current CSE pass is hash-based, like NIR's, so it's efficient. However, it currently has several limitations:
1. It's local-only. Nothing can be CSE'd across blocks. This is especially bad for constants, where we really don't want a separate instance of 1.0f in every block.
2. It bails the moment it sees a non-WLR register.
Solving 1 isn't too hard. All we need to do is write a dominance pass (#42 (closed)) and do the same thing that NIR does with cloning sets.
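To make the dominance-based idea concrete, here is a minimal sketch in Python rather than the project's own code. It assumes the dominator tree has already been computed and is stored as `dom_children`, and it ignores phis entirely. `Instr`, `Block`, and `cse_dominated` are hypothetical names invented for this illustration, and the per-child set copy is only one plausible reading of "cloning sets".

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    dst: str      # SSA name defined by this instruction
    op: str
    srcs: tuple   # SSA names or immediate values

@dataclass
class Block:
    name: str
    instrs: list
    dom_children: list = field(default_factory=list)  # assumed precomputed

def cse_dominated(block, available=None, renames=None):
    """Walk the dominator tree; return {removed_dst: surviving_dst}."""
    available = {} if available is None else available
    renames = {} if renames is None else renames
    kept = []
    for instr in block.instrs:
        # Rewrite sources that pointed at already-removed instructions.
        srcs = tuple(renames.get(s, s) for s in instr.srcs)
        key = (instr.op, srcs)
        if key in available:
            renames[instr.dst] = available[key]
        else:
            available[key] = instr.dst
            kept.append(Instr(instr.dst, instr.op, srcs))
    block.instrs = kept
    for child in block.dom_children:
        # Clone the set so siblings never see each other's expressions.
        cse_dominated(child, dict(available), renames)
    return renames

# entry dominates then, else, and merge; then/else do not dominate merge.
entry = Block("entry", [Instr("x", "mov_in0", ()), Instr("a", "const", (1.0,))])
then_blk = Block("then", [Instr("b", "const", (1.0,)), Instr("c", "fadd", ("x", "b"))])
else_blk = Block("else", [Instr("d", "const", (1.0,)), Instr("e", "fadd", ("x", "d"))])
merge = Block("merge", [Instr("f", "const", (1.0,))])
entry.dom_children = [then_blk, else_blk, merge]
renames = cse_dominated(entry)
```

In this toy CFG the three redundant copies of the 1.0 constant collapse into the one in `entry`, while the two `fadd` instructions in the sibling blocks are both kept, because neither block dominates the other.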
Solving 2 is a bit more difficult. My current idea is to stop treating registers themselves as values and instead treat "register written at IP" as the value. Then, at any convergence point in the IR, if the register has been written somewhere that doesn't dominate the next block, we treat the register as being written at the IP of the convergence point. The annoying part is that the current hash set/table APIs make it kind of difficult to plumb through this bit of extra info.
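A rough sketch of the "register written at IP" idea, again in Python and under several assumptions: the CFG is acyclic, blocks are non-empty and listed in reverse post-order, and instructions are plain `(ip, dst, op, srcs)` tuples. In that restricted setting, "all predecessors agree on the last write" stands in for the dominance check. `value_keys` and the tuple layout are hypothetical, chosen only for illustration.

```python
def value_keys(blocks, preds):
    """Return {ip: (op, versioned_srcs)} where each source register is
    replaced by a (register, ip_of_last_write) pair."""
    exit_versions = {}
    keys = {}
    for name, instrs in blocks.items():   # assumed reverse post-order
        start_ip = instrs[0][0]           # assumes non-empty blocks
        incoming = [exit_versions[p] for p in preds.get(name, [])]
        versions = {}
        for reg in {r for m in incoming for r in m}:
            seen = {m.get(reg) for m in incoming}
            # If the predecessors disagree, the register counts as written
            # at the convergence point itself.
            versions[reg] = seen.pop() if len(seen) == 1 else (reg, start_ip)
        for ip, dst, op, srcs in instrs:
            # (s, None) marks a register with no write seen so far.
            keys[ip] = (op, tuple(versions.get(s, (s, None)) for s in srcs))
            versions[dst] = (dst, ip)
        exit_versions[name] = versions
    return keys

blocks = {
    "entry": [(0, "r0", "mov_in0", ()), (1, "r1", "mov_in1", ()),
              (2, "r2", "fadd", ("r0", "r1"))],
    "then":  [(3, "r0", "mov_in2", ()), (4, "r3", "fadd", ("r0", "r1"))],
    "else":  [(5, "r3", "fadd", ("r0", "r1"))],
    "merge": [(6, "r4", "fadd", ("r0", "r1"))],
}
preds = {"then": ["entry"], "else": ["entry"], "merge": ["then", "else"]}
keys = value_keys(blocks, preds)
```

Here the `fadd` at IP 5 gets the same key as the one at IP 2, so it is a CSE candidate. The one at IP 4 does not, because `r0` was rewritten at IP 3, and the one at IP 6 sees `r0` as written at the merge point. Folding the version into the key itself is one way to carry the extra information through an ordinary hash table, though whether that fits the existing APIs is an open question.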