gallivm: Use first_active_invocation for uniform index/offset cases too.
This means emitting the first_active_invocation code when we don't know that invocation 0 is active, but that's still cheaper than the per-invocation loop and the extra per-invocation bounds checking we'd have to do otherwise.
dEQP-VK.ubo.random.16bit.scalar.92 goes from 16.5 to 13.8 seconds.