gallivm: optimize first-active-invocation reading (fixing subgroupbroadcast timeout flakes)
Try to avoid emitting a loop for every lookup of the first active invocation, so that LLVM doesn't spend ages processing all those extra blocks.
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i16vec4 goes from 50 to 30 seconds on my system.
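
For illustration only, here is a minimal C sketch of the general idea of replacing a per-lookup loop with straight-line code. It is not the gallivm code from this change: the function names and the packed 32-bit execution mask are assumptions made for the example, and `__builtin_ctz` is a GCC/Clang builtin rather than anything gallivm-specific.

```c
#include <stdint.h>

/* Hypothetical helpers, not gallivm API. Bit i of exec_mask is set when
 * invocation i is active. Assumes at most 32 invocations. */

/* Loop form: scan the lanes until an active one is found. Emitting this
 * at every lookup site means a new loop (and its blocks) per lookup. */
static unsigned first_active_loop(uint32_t exec_mask, unsigned num_lanes)
{
   for (unsigned i = 0; i < num_lanes; i++) {
      if (exec_mask & (UINT32_C(1) << i))
         return i;
   }
   return 0; /* no lane active; 0 is an arbitrary choice for this sketch */
}

/* Straight-line form: count trailing zeros of the mask. No loop, so no
 * extra control flow for the compiler to process per lookup. */
static unsigned first_active_ctz(uint32_t exec_mask)
{
   /* __builtin_ctz(0) is undefined, so guard the empty mask. */
   return exec_mask ? (unsigned)__builtin_ctz(exec_mask) : 0;
}
```

Both helpers return the same lane index for any mask whose set bits are all below `num_lanes`; the second just gets there without any per-lookup control flow.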
I've got a followup change in my lp-active-invoc-uniform branch to help with UBO/SSBO accesses, but store_ssbo regresses with it.
Edited by Emma Anholt