intel: Only stall after sending all memory fence messages
On Gen11+, when emitting a memory fence for both L3 and SLM, instead of
SEND, MOV (for stall), SEND, MOV (for stall)
produce
SEND, SEND, MOV (for stall), MOV (for stall)
so that the stalls happen only after both fence messages have been sent.
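
As a rough illustration of the new ordering, here is a minimal standalone sketch in C. It is not the Mesa code: `struct inst_list`, `emit()`, `emit_memory_fences()`, `format_insts()` and the two opcode tags are hypothetical names made up for this example, and the model only records the order of instructions, not their operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcode tags for illustration; not the real backend IR. */
enum op { OP_SEND_FENCE, OP_MOV_STALL };

#define MAX_INSTS 8

struct inst_list {
    enum op ops[MAX_INSTS];
    size_t count;
};

/* Append one instruction to the list. */
static void emit(struct inst_list *list, enum op op)
{
    assert(list->count < MAX_INSTS);
    list->ops[list->count++] = op;
}

/* Emit the requested fences: every SEND first, then one stalling MOV
 * per fence that was sent. */
static void emit_memory_fences(struct inst_list *list, bool l3, bool slm)
{
    const bool wanted[2] = { l3, slm };
    unsigned num_fences = 0;

    for (int i = 0; i < 2; i++) {
        if (wanted[i]) {
            emit(list, OP_SEND_FENCE);
            num_fences++;
        }
    }

    /* Stall only after all fence messages are in flight. */
    for (unsigned i = 0; i < num_fences; i++)
        emit(list, OP_MOV_STALL);
}

/* Render the list as a comma-separated string such as "SEND, MOV". */
static void format_insts(const struct inst_list *list, char *buf, size_t size)
{
    size_t used = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < list->count && used < size; i++) {
        int n = snprintf(buf + used, size - used, "%s%s",
                         i ? ", " : "",
                         list->ops[i] == OP_SEND_FENCE ? "SEND" : "MOV");
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

With both `l3` and `slm` set, the sketch yields SEND, SEND, MOV, MOV; with only one of them set it yields SEND, MOV, the same as before the change.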
This MR is on top of !3226 (merged).
Edited by Caio Oliveira