v3d: optimize unifa/ldunifa sequences

This allows us to:
- Remove unused trailing ldunifa in a unifa/ldunifa sequence.
- Remove unused leading ldunifa in a unifa/ldunifa sequence by updating the unifa address.
- Skip a unifa write for a follow-up UBO load if it reads from the address right after the last ldunifa.
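
The three cases above can be sketched on a toy instruction stream. This is only an illustration and not the code in this change: the toy_inst struct, the toy_opt_unifa() helper, and the assumption that each ldunifa reads 4 bytes and advances the unifa address by 4 are all hypothetical simplifications. The toy version also drops a unifa write whose loads are all unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not the real v3d compiler IR. */
enum toy_op { TOY_UNIFA, TOY_LDUNIFA };

struct toy_inst {
    enum toy_op op;
    uint32_t addr; /* TOY_UNIFA: address written to the unifa register */
    bool used;     /* TOY_LDUNIFA: whether the loaded value is ever read */
};

/* Compacts insts in place and returns the new instruction count.
 * Assumes each ldunifa advances the unifa address by 4 bytes. */
size_t toy_opt_unifa(struct toy_inst *insts, size_t count)
{
    assert(insts != NULL || count == 0);

    size_t out = 0;
    bool have_addr = false; /* is next_addr known for the output stream? */
    uint32_t next_addr = 0; /* address the next emitted ldunifa would read */
    size_t i = 0;

    while (i < count) {
        if (insts[i].op != TOY_UNIFA) {
            /* ldunifa with no preceding unifa: keep it untouched. */
            insts[out++] = insts[i++];
            continue;
        }

        uint32_t base = insts[i].addr;
        size_t start = i + 1, end = start;
        while (end < count && insts[end].op == TOY_LDUNIFA)
            end++;

        /* Trim unused loads from both ends of the sequence. Unused loads
         * in the middle stay, since they still advance the address. */
        size_t first = start, last = end;
        while (first < end && !insts[first].used)
            first++;
        while (last > first && !insts[last - 1].used)
            last--;

        i = end;
        if (first == last)
            continue; /* nothing in this sequence is used */

        /* Skipping leading loads means starting at a later address. */
        uint32_t addr = base + 4u * (uint32_t)(first - start);

        /* Only write unifa if the address is not already in place. */
        if (!have_addr || next_addr != addr) {
            insts[out].op = TOY_UNIFA;
            insts[out].addr = addr;
            insts[out].used = false;
            out++;
        }

        /* out <= first always holds, so this forward copy is safe. */
        for (size_t j = first; j < last; j++)
            insts[out++] = insts[j];

        next_addr = addr + 4u * (uint32_t)(last - first);
        have_addr = true;
    }

    return out;
}
```

For example, under these assumptions a sequence of unifa 0x100 followed by four ldunifa where only the middle two are used, then unifa 0x10c with one used ldunifa, collapses to a single unifa 0x104 followed by three ldunifa.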

This gives another big performance boost to the UE4 Shooter demo, particularly when paired with disabling robust buffer access.