pan/bi: Support atomic exchanges
There are two "types" of atomics in Bifrost:
- Computational atomics (adds, mins, ...), which require a complex series of instructions using ATOM_CX as a helper
- "Freestanding" atomics, particularly exchanges
The implementations are totally disjoint. This MR supports the latter.
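For reviewers who want the semantic distinction spelled out, here is a rough analogy in portable C11. This is not Bifrost or compiler code: the `demo_*` names are made up for illustration, and it only shows what each kind of atomic returns and stores, not how the hardware implements it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative only: hypothetical helper names, plain C11 atomics. */

/* "Computational" atomic: the stored value is computed from the old one. */
static uint32_t demo_atomic_add(_Atomic uint32_t *p, uint32_t v)
{
    /* Returns the previous value; memory now holds old + v. */
    return atomic_fetch_add(p, v);
}

/* "Freestanding" atomic: an exchange stores v unconditionally. */
static uint32_t demo_atomic_xchg(_Atomic uint32_t *p, uint32_t v)
{
    /* Returns the previous value; memory now holds v. */
    return atomic_exchange(p, v);
}
```

The exchange needs no arithmetic on the old value, which is roughly why it can stand alone while the computational forms need the ATOM_CX helper sequence.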
Support for the former will take quite a bit more time to land as it depends on the scheduler, which hasn't cleared CI in a while. The fixes here should be immediately useful, however.
[N.b.: I am unhappy about the sequence of redundant moves. This works around issues with tied operands in our current register allocation scheme. It's not ideal and I would like to fix this the right way down the line, but it's a low priority item right now, and this suffices (at a small performance cost). We have the same issue with TEXC as it is.]
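To make the shape of the workaround concrete, here is a toy sketch of lowering an exchange with a tied operand through explicit moves. Everything in it is hypothetical (the `demo_*` IR, the opcode names, the three-instruction sequence) and it assumes the tie is between the value going in and the result coming out; the actual sequence emitted by this MR may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR for illustration; this is not the Panfrost IR. */
enum demo_op { DEMO_MOV, DEMO_AXCHG };

struct demo_instr {
    enum demo_op op;
    unsigned dest; /* virtual register written */
    unsigned src;  /* virtual register read */
};

/*
 * Lower "dest = exchange(value)" assuming the instruction requires its
 * source and destination to be the same register (a tied operand).
 * Rather than teaching the register allocator about the tie, route
 * everything through a fresh temporary:
 *
 *   mov   tmp, value   ; copy in, so value is not clobbered
 *   axchg tmp, tmp     ; tied: reads and writes the same register
 *   mov   dest, tmp    ; copy out to the real destination
 *
 * Both moves are redundant whenever the allocator could have
 * coalesced the registers itself, hence the small performance cost.
 */
static size_t demo_lower_tied_xchg(struct demo_instr *out, size_t cap,
                                   unsigned dest, unsigned value,
                                   unsigned tmp)
{
    assert(cap >= 3);

    out[0] = (struct demo_instr){ DEMO_MOV, tmp, value };
    out[1] = (struct demo_instr){ DEMO_AXCHG, tmp, tmp };
    out[2] = (struct demo_instr){ DEMO_MOV, dest, tmp };

    return 3; /* number of instructions emitted */
}
```

A tie-aware allocator would make both moves unnecessary, which is the "right way" fix deferred above.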