intel/compiler: make latency/cycles and throughput estimates consistent
Currently, the cycle estimate does not account for the time needed for all EU sub-units to go idle, whereas the throughput estimate does.
The cycle estimate is not used for anything inside Mesa, but it has two external users:
- "Cycle Count" stat returned by vkGetPipelineExecutableStatisticsKHR
- shader-db
This change makes the reported cycle estimate larger (sometimes much larger), so cycle counts from before and after this commit should not be compared.
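
For illustration only, the difference between the two ways of counting cycles can be sketched as below. This is a minimal sketch, not the code touched by this commit: the struct, the unit names, and both function names are hypothetical and only assume that the model tracks the cycle at which each EU sub-unit becomes idle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of EU sub-units; the names are illustrative only. */
enum { UNIT_FE, UNIT_FPU, UNIT_EM, UNIT_SEND, NUM_UNITS };

/* Hypothetical end-of-program state of a performance model. */
struct sched_state {
   unsigned issue_done;            /* cycle at which the last instruction was issued */
   unsigned unit_ready[NUM_UNITS]; /* cycle at which each sub-unit becomes idle */
};

/* Estimate that stops counting when the last instruction has been issued. */
static unsigned
cycles_at_last_issue(const struct sched_state *st)
{
   assert(st != NULL);
   return st->issue_done;
}

/* Estimate that also waits for every sub-unit to drain its pending work. */
static unsigned
cycles_until_idle(const struct sched_state *st)
{
   assert(st != NULL);
   unsigned cycles = st->issue_done;
   for (int i = 0; i < NUM_UNITS; i++) {
      if (st->unit_ready[i] > cycles)
         cycles = st->unit_ready[i];
   }
   return cycles;
}
```

With a long-running operation still in flight on one sub-unit when the last instruction issues, the second estimate can be far larger than the first, which is why numbers from before and after are not comparable.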
CC @currojerez, @idr, @jekstrand