amd/llvm: Fix divergent descriptor indexing. (v2)
Multiple LLVM passes hoist the intrinsic that uses the descriptor out of the loop, defeating the entire point of creating the loop.
Defeat the optimizer by splitting the break into a separate if-statement and putting an optimization barrier on the bool in between.
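For illustration only, here is a scalar C simulation of the loop shape described above. It is a sketch, not the LLVM IR the backend emits: WAVE_SIZE, opt_barrier_bool() and waterfall_load() are hypothetical names, and the empty inline asm (GCC/Clang syntax) merely stands in for the real optimization barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAVE_SIZE 8 /* hypothetical wave width for this simulation */

/* Hypothetical stand-in for the optimization barrier: an empty asm statement
 * (GCC/Clang syntax) that makes the value opaque to the compiler. */
static inline bool opt_barrier_bool(bool v)
{
   int tmp = v;
   __asm__ volatile("" : "+r"(tmp));
   return tmp != 0;
}

/* Simulates one wave in which every lane has its own descriptor index.
 * Fills out[lane] with table[index[lane]] and returns the number of loop
 * iterations, i.e. the number of distinct indices in the wave. */
static int waterfall_load(const uint32_t *table,
                          const uint32_t index[WAVE_SIZE],
                          uint32_t out[WAVE_SIZE])
{
   assert(table && index && out);

   bool active[WAVE_SIZE];
   int iterations = 0;

   for (int lane = 0; lane < WAVE_SIZE; lane++)
      active[lane] = true;

   for (;;) {
      /* Equivalent of readfirstlane: take the index of the first active lane. */
      int first = -1;
      for (int lane = 0; lane < WAVE_SIZE; lane++) {
         if (active[lane]) {
            first = lane;
            break;
         }
      }
      if (first < 0)
         break; /* every lane has left the loop */

      uint32_t uniform_index = index[first];
      iterations++;

      for (int lane = 0; lane < WAVE_SIZE; lane++) {
         if (!active[lane])
            continue;

         bool done = false;
         if (index[lane] == uniform_index) {
            /* The descriptor use, now with a wave-uniform index. */
            out[lane] = table[uniform_index];
            done = true;
         }

         /* Barrier on the bool between the body and the break. */
         done = opt_barrier_bool(done);

         /* The break, split into its own if-statement. */
         if (done)
            active[lane] = false;
      }
   }

   return iterations;
}
```

With indices {2, 0, 2, 1, 0, 2, 1, 1} the simulated loop runs once per distinct index, and each lane only does its load in the iteration where its index matches the uniform one.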
v2: Move from a callback-based system to begin/end loop. This does not make it significantly less intrusive, but it is a bit nicer without all the extra struct and callback stubs.