ACO: Make all booleans per-lane
This MR changes all booleans in ACO to be per-lane, so that, to some extent, they behave as if they were divergent. This simplifies a large amount of code in instruction selection. Follow-up optimizations then bring the shaders back to their previous code size and SGPR usage.
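To illustrate what "per-lane" means, here is a minimal sketch under stated assumptions. It assumes a wave64 where each boolean is a 64-bit lane mask with one bit per lane, and where the exec mask marks the active lanes. A uniform boolean is then just the special case where every active lane holds the same value, so one set of operations serves both uniform and divergent values. The type and function names (`LaneMask`, `uniform_to_lane_mask`, `lane_and`, and so on) are hypothetical and are not ACO's actual IR or helpers.

```cpp
#include <cstdint>

// Hypothetical model of per-lane booleans in a wave64 (assumption: 64 lanes,
// one bit per lane). Names are illustrative, not ACO's actual API.
using LaneMask = std::uint64_t;

// A uniform boolean expressed per-lane: every active lane holds the same value.
constexpr LaneMask uniform_to_lane_mask(bool value, LaneMask exec) {
    return value ? exec : LaneMask{0};
}

// Logical operations on per-lane booleans are bitwise operations on the masks,
// restricted to active lanes so that inactive lanes stay zero.
constexpr LaneMask lane_and(LaneMask a, LaneMask b, LaneMask exec) { return a & b & exec; }
constexpr LaneMask lane_or(LaneMask a, LaneMask b, LaneMask exec) { return (a | b) & exec; }
constexpr LaneMask lane_not(LaneMask a, LaneMask exec) { return ~a & exec; }

// Reducing a per-lane boolean to a single scalar: true if any active lane is set.
constexpr bool any_lane(LaneMask m, LaneMask exec) { return (m & exec) != 0; }

// Usage: lower 32 lanes active; combine a uniform `true` with a divergent value.
constexpr LaneMask kExec = 0x00000000FFFFFFFFull;
constexpr LaneMask kDivergent = 0x5555555555555555ull & kExec; // even lanes only
static_assert(lane_and(uniform_to_lane_mask(true, kExec), kDivergent, kExec) == kDivergent,
              "AND with a uniform true leaves the divergent value unchanged");
static_assert(!any_lane(lane_not(uniform_to_lane_mask(true, kExec), kExec), kExec),
              "NOT of a uniform true is false in every active lane");
```

In this model, a uniform boolean occupies a full lane mask (an SGPR pair in wave64) rather than a single scalar value. That is one plausible reason why follow-up optimizations are needed to recover the previous code size and SGPR usage.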
Edited by Timur Kristóf