aco: improvements to VMEMtoScalarWriteHazard mitigation
Apparently, s_waitcnt_depctr is potentially faster than v_nop: https://reviews.llvm.org/D83872