radeonsi has this optimization but radv doesn't, which makes the 16-byte alignment assumption invalid in
This is not a problem with the LLVM backend because
align_offset are not used. With ACO, however, this generates
ds_read_b128 for unaligned data. So make radv use this optimization too, to unify the NIR code.
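
For illustration only, here is a minimal sketch of the kind of alignment check that goes wrong when the recorded alignment info does not match the real address. It assumes, as the message implies, that a 128-bit LDS read needs a provably 16-byte-aligned address. The helper names combined_align and can_use_b128 are hypothetical and are not Mesa code. The align_mul parameter is also an assumption here, since the message only names align_offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: largest power-of-two alignment
 * guaranteed for an address of the form (k * align_mul + align_offset). */
static uint32_t combined_align(uint32_t align_mul, uint32_t align_offset)
{
    /* Assumed preconditions: align_mul is a non-zero power of two and
     * align_offset is smaller than align_mul. */
    assert(align_mul != 0 && (align_mul & (align_mul - 1)) == 0);
    assert(align_offset < align_mul);

    if (align_offset == 0)
        return align_mul;

    /* The lowest set bit of the offset bounds the guaranteed alignment. */
    return align_offset & (~align_offset + 1u);
}

/* Assumption: a 128-bit read is only chosen when 16-byte alignment is proven. */
static bool can_use_b128(uint32_t align_mul, uint32_t align_offset)
{
    return combined_align(align_mul, align_offset) >= 16;
}
```

If the alignment info claims (16, 0) while the real address is only 8-byte aligned, a check like this still picks the 128-bit read, which is the failure described above.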