tests: optimize atomicCompSwap tests
These tests create lots of threads that contend for access to a single global variable. Once we hit a certain level of parallelism, it's easy for these tests to take too much time, triggering GPU hang checks in the kernel, even though most invocations don't contribute to the overall test result.
This MR implements 2 optimizations:
- exit early when we are sure that "value" will not be recorded in the output "mask" array
- don't read "value" on each atomicCompSwap loop iteration - use the value returned by atomicCompSwap (this is what some other tests already do)
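
As a rough illustration of the two changes, here is a minimal CPU-side sketch in C11. It is not the actual test shader: `atomic_comp_swap`, `claim_slot` and the "increment a counter and record the claimed slot" workload are hypothetical stand-ins, and C11 atomics are used only to emulate the "returns the previous value" behaviour of GLSL `atomicCompSwap`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Emulates GLSL atomicCompSwap(): stores `data` if *mem == compare and
 * always returns the value *mem held before the call. */
static uint32_t atomic_comp_swap(_Atomic uint32_t *mem, uint32_t compare,
                                 uint32_t data)
{
    uint32_t expected = compare;
    /* On failure, C11 writes the observed value back into `expected`;
     * on success, `expected` already equals the old value. */
    atomic_compare_exchange_strong(mem, &expected, data);
    return expected;
}

/* Hypothetical per-invocation body: try to bump the shared counter by one
 * and record the claimed slot in `mask`. Returns the number of
 * compare-and-swap attempts this invocation made. */
static unsigned claim_slot(_Atomic uint32_t *value, uint32_t *mask,
                           uint32_t mask_len)
{
    assert(mask_len > 0);

    unsigned attempts = 0;
    uint32_t seen = atomic_load(value); /* the only explicit read */

    for (;;) {
        /* Optimization 1: once the counter is past the end of `mask`,
         * nothing this invocation does can be recorded, so exit early. */
        if (seen >= mask_len)
            return attempts;

        attempts++;
        uint32_t old = atomic_comp_swap(value, seen, seen + 1);
        if (old == seen) {
            mask[seen] = 1; /* we won the race for this slot */
            return attempts;
        }

        /* Optimization 2: reuse the value the failed swap returned
         * instead of re-reading `value` from memory. */
        seen = old;
    }
}
```

In this sketch, an invocation that arrives after the counter has reached `mask_len` returns without issuing any atomic operation, which is the case that dominates at high levels of parallelism.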
On my ICL laptop, this reduces execution time from >1s to 0.1s.
Note: the first patch triggers a bug in mesa, causing an infinite loop in intel_shader_atomic_float_minmax.execution.ssbo-atomicCompSwap-float, which is fixed by mesa!7538 (merged).
The second patch (unintentionally) works around the bug fixed by mesa!7538 (merged).
Closes: mesa#3753 (closed)
CC: @idr