turnip: Fix missing WFM before indirect draws
We have to add WFM to the pending bits when flushing into the CP, so
that indirect draws know when they should apply the WFM workaround.
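
As a rough illustration of the idea only, not the actual driver code: the sketch below assumes a small state struct with pending and flush bitmasks. All names in it (cache_state, flush_for_access, emit_indirect_draw, the FLAG_* bits) are hypothetical and chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; illustrative names, not the driver's own. */
enum {
    FLAG_CACHE_FLUSH = 1u << 0,
    FLAG_WAIT_FOR_ME = 1u << 1,
};

struct cache_state {
    uint32_t pending_bits; /* work deferred until something needs it */
    uint32_t flush_bits;   /* work to emit at the next flush point */
};

/* Called when an earlier write must become visible to a later reader.
 * If that reader is the CP, also leave WFM pending so that a later
 * indirect draw can see that it has to wait first. */
static void flush_for_access(struct cache_state *c, bool dst_is_cp_read)
{
    c->flush_bits |= FLAG_CACHE_FLUSH;
    if (dst_is_cp_read)
        c->pending_bits |= FLAG_WAIT_FOR_ME;
}

/* Returns true if the indirect draw had to apply the WFM workaround.
 * The pending bit is consumed so the wait is not repeated. */
static bool emit_indirect_draw(struct cache_state *c)
{
    bool need_wfm = (c->pending_bits & FLAG_WAIT_FOR_ME) != 0;
    c->pending_bits &= ~(uint32_t)FLAG_WAIT_FOR_ME;
    return need_wfm;
}
```

In this sketch, if the flush never records the pending WFM bit, the indirect draw has no way to tell that a wait is needed, which is the kind of omission this change addresses.
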
Fixes CTS tests:
dEQP-VK.draw.renderpass.indirect_draw.*_data_from_compute.indirect_draw_count*
Fixes: abf0ae01 ("tu: Properly handle waiting on an earlier pipeline stage")