Fence optimization not triggering for timeline semaphores
In AMDGPU, if the signal and the wait of a syncobj happen on the same queue, the cmdbuffer with the wait can generally be scheduled before the signalling cmdbuffer has finished. This avoids a GPU->CPU->GPU roundtrip with an idle GPU in between.
However, this seems to fail with timeline syncobjs.
My running assumption is that we try to convert the fence to a drm scheduler fence:
but this will likely fail, as we're dealing with a dma fence chain fence rather than a scheduler fence.
So if we want to keep this optimization for timeline syncobjs, we will need to do something like unpacking the last fence in the chain before attempting the conversion.
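
To illustrate the idea, here is a minimal user-space sketch in C. It is not kernel code and does not use the real AMDGPU, drm scheduler or dma-fence APIs: `struct fence`, `to_sched_fence`, `unwrap_chain` and `can_skip_cpu_wait` are all hypothetical names made up for this model. It only shows why a direct conversion fails on a chain fence, and how unwrapping first would let the same-queue check succeed again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two fence kinds discussed above. */
enum fence_kind { FENCE_SCHED, FENCE_CHAIN };

struct fence {
    enum fence_kind kind;
    int queue_id;                  /* meaningful for FENCE_SCHED only */
    const struct fence *contained; /* FENCE_CHAIN only: the wrapped fence */
};

/* Models the conversion that fails: anything that is not a
 * scheduler fence (including a chain node) yields NULL. */
static const struct fence *to_sched_fence(const struct fence *f)
{
    return (f != NULL && f->kind == FENCE_SCHED) ? f : NULL;
}

/* Proposed step: peel off chain nodes until the wrapped fence is reached. */
static const struct fence *unwrap_chain(const struct fence *f)
{
    while (f != NULL && f->kind == FENCE_CHAIN)
        f = f->contained;
    return f;
}

/* Returns 1 if the wait can be left to the queue itself
 * (signal and wait are on the same queue), 0 otherwise. */
static int can_skip_cpu_wait(const struct fence *f, int queue_id)
{
    const struct fence *s;

    assert(queue_id >= 0);
    s = to_sched_fence(unwrap_chain(f));
    return s != NULL && s->queue_id == queue_id;
}
```

In this model, calling `to_sched_fence` directly on a chain node returns NULL, which corresponds to the optimization not triggering, while `can_skip_cpu_wait` sees through the chain node to the scheduler fence behind it. Whether the real fix should look only at the last fence of the chain or at every fence in it is left open here.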