gallivm/ssbo: optimize memory access
Replace the run-time loop with a compile-time loop in emit_load_mem, emit_store_mem, and emit_atomic_mem.
Also remove the exec_mask check in emit_load_mem and mask the offset instead. Masking also prevents the out-of-bounds read that an invalid offset would otherwise cause when the exec_mask is 0.
It's related to #8244
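
The offset masking described above can be illustrated with a scalar C sketch. This is not the gallivm code: load_masked, LANES, and the parameter layout are hypothetical names chosen for illustration, and the exec_mask is assumed to hold either 0 or all ones per lane. The loop has a fixed trip count, so a compiler can unroll it, and an inactive lane reads offset 0 instead of whatever stale offset it holds.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical vector width, fixed at compile time */

/* Illustrative only: per-lane load without branching on the exec mask.
 * exec_mask[i] is assumed to be 0 (lane inactive) or ~0u (lane active). */
static void load_masked(const uint32_t *buf, size_t buf_len,
                        const uint32_t offset[LANES],
                        const uint32_t exec_mask[LANES],
                        uint32_t result[LANES])
{
   for (int i = 0; i < LANES; i++) {
      /* An inactive lane may carry a garbage offset; masking forces it to 0. */
      uint32_t off = offset[i] & exec_mask[i];

      /* The bounds check is kept: out-of-range lanes yield 0. */
      result[i] = (off < buf_len) ? buf[off] : 0;
   }
}
```

In this sketch an inactive lane still performs a load (from element 0 when the buffer is non-empty), but its result is expected to be ignored by the caller.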
There are also similar functions, such as emit_vote, emit_reduce, emit_atomic_global, emit_shuffle, emit_ballot, and emit_elect, but I have never used them and I don't have any software to test them with.
Vulkan Geekbench 6.2 benchmark score (real CPU clock: 4.5 GHz):
- before: 2325
- after: 3770
- after, with bounds checking in emit_load_mem additionally removed: 4977