Optimize ALU sources that are right-shifts of 16 or 24
Over the years I have noticed several shaders that compute things like
(x >> 16) + y. In those cases we dutifully emit the shift instruction,
but clever register regioning could let the ALU instruction read the
upper word (or, for a shift of 24, the upper byte) of x directly and
elide the shift. We already use similar tricks in our integer
multiplication lowering.
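A minimal C sketch of why the shift is redundant, assuming a little-endian layout. The helper names are hypothetical, and this models the data layout only, not the backend pass. On a 32-bit value, x >> 16 is the 16-bit word at byte offset 2 and x >> 24 is the byte at offset 3. That piece is zero-extended for a logical shift and sign-extended for an arithmetic one. A source region that points at that word or byte with the matching unsigned or signed type should therefore yield the same operand as the shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: each reads a sub-dword piece of x from memory, the way
 * a narrower-typed source region with a byte offset would. Assumes the host is
 * little-endian, so byte offset 2 holds the high word and offset 3 the high byte. */

/* Word at byte offset 2, zero-extended: equals x >> 16 for unsigned x. */
static uint32_t read_high_uw(uint32_t x)
{
    uint16_t w;
    memcpy(&w, (const unsigned char *)&x + 2, sizeof w);
    return w;
}

/* Byte at byte offset 3, zero-extended: equals x >> 24 for unsigned x. */
static uint32_t read_high_ub(uint32_t x)
{
    uint8_t b;
    memcpy(&b, (const unsigned char *)&x + 3, sizeof b);
    return b;
}

/* Word at byte offset 2, sign-extended: equals an arithmetic x >> 16. */
static int32_t read_high_w(int32_t x)
{
    int16_t w;
    memcpy(&w, (const unsigned char *)&x + 2, sizeof w);
    return w;
}

/* Byte at byte offset 3, sign-extended: equals an arithmetic x >> 24. */
static int32_t read_high_b(int32_t x)
{
    int8_t b;
    memcpy(&b, (const unsigned char *)&x + 3, sizeof b);
    return b;
}

/* Check (x >> 16) + y and (x >> 24) + y against the shift-free forms.
 * The signed checks assume two's-complement conversion and an arithmetic >>
 * on negative values; C leaves both implementation-defined, but GCC and
 * Clang behave this way. */
static void check_equivalence(uint32_t x, uint32_t y)
{
    assert((x >> 16) + y == read_high_uw(x) + y);
    assert((x >> 24) + y == read_high_ub(x) + y);

    int32_t sx = (int32_t)x;
    assert((sx >> 16) == read_high_w(sx));
    assert((sx >> 24) == read_high_b(sx));
}
```

For example, check_equivalence(0xDEADBEEFu, 5u) confirms that adding 5 to the high word read in place matches adding 5 to the shifted value. On the GPU the analogue would presumably be a source with a byte offset, a horizontal stride of 2 (words) or 4 (bytes), and a UW/W or UB/B type; which instructions actually accept such regions is an assumption to check against the hardware's region restrictions.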