v3d: try to fill nops after a thrsw
Until now, if we had any delay slots after injecting a thrsw into the stream, we would emit NOPs to ensure that whatever we scheduled next came after the thrsw executed. Sometimes we can do a bit better: pick up instructions scheduled after the thrsw that are independent of it and execute them in the delay slots, avoiding these NOPs.
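
As a rough illustration of the idea (not the actual scheduler code), the sketch below counts how many delay slots could be filled by instructions that already follow the thrsw. The struct toy_inst type, the depends_on_thrsw flag and count_fillable_slots() are hypothetical names made up for this example. It is deliberately conservative: it only takes the leading run of independent instructions, so nothing is reordered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction, not the driver's real IR type. */
struct toy_inst {
        const char *name;
        /* Assumption: set when the instruction must run after the
         * thread switch has completed. */
        bool depends_on_thrsw;
};

/*
 * Returns how many of the delay_slots slots after a thrsw can be filled
 * with the instructions in after[] (which are already scheduled after the
 * thrsw, in order). Only the leading run of independent instructions is
 * used, so the relative order of instructions never changes. The caller
 * would still emit (delay_slots - result) NOPs.
 */
static size_t
count_fillable_slots(const struct toy_inst *after, size_t n_after,
                     size_t delay_slots)
{
        assert(after != NULL || n_after == 0);

        size_t filled = 0;
        while (filled < delay_slots && filled < n_after &&
               !after[filled].depends_on_thrsw) {
                filled++;
        }
        return filled;
}
```

For instance, with two delay slots and a following sequence of one independent instruction and then a dependent one, this would fill one slot and leave one NOP. The real pass may well be able to do better than this sketch by also checking dependencies between the candidate instructions themselves.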