v3d: Avoid scheduling an instruction that stalls waiting for SFU retval
If we detect that a scheduling candidate would stall because one of its register sources was written by the SFU in the previous instruction, we reduce its priority so that any non-stalling operation is chosen instead.
The latency of SFU operations is defined as 2, so they are scheduled earlier when other candidates have the same priority.
Finally, we won't merge an instruction that stalls into a previously chosen one, since the result of the previously chosen one would then have to wait for an extra cycle.
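As an illustration only, the priority reduction and the merge restriction described above can be sketched as follows. This is not the actual v3d scheduler code: `struct toy_inst`, `TOY_STALL_PENALTY` and the `toy_*` functions are hypothetical names invented for this sketch, and the instruction model (one destination, two sources) is a deliberate simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical penalty, large enough to outweigh any base priority here. */
#define TOY_STALL_PENALTY 100

/* Hypothetical, simplified instruction: one destination, two sources. */
struct toy_inst {
        int dst;       /* register written, or -1 for none */
        int src[2];    /* registers read, or -1 for unused */
        bool is_sfu;   /* true if the result is produced by the SFU */
        int priority;  /* base scheduling priority, higher is better */
};

/* A candidate stalls if it reads the register that the previous
 * instruction wrote through the SFU.
 */
bool
toy_reads_prev_sfu_result(const struct toy_inst *prev,
                          const struct toy_inst *cand)
{
        if (!prev || !prev->is_sfu || prev->dst < 0)
                return false;

        return cand->src[0] == prev->dst || cand->src[1] == prev->dst;
}

/* Return the index of the best candidate, lowering the priority of
 * any candidate that would stall on the previous SFU result.
 */
size_t
toy_choose(const struct toy_inst *prev,
           const struct toy_inst *cands, size_t n)
{
        assert(n > 0);

        size_t best = 0;
        int best_prio = 0;

        for (size_t i = 0; i < n; i++) {
                int prio = cands[i].priority;

                if (toy_reads_prev_sfu_result(prev, &cands[i]))
                        prio -= TOY_STALL_PENALTY;

                if (i == 0 || prio > best_prio) {
                        best = i;
                        best_prio = prio;
                }
        }

        return best;
}

/* Never merge a stalling candidate into the already chosen instruction. */
bool
toy_can_merge(const struct toy_inst *prev, const struct toy_inst *cand)
{
        return !toy_reads_prev_sfu_result(prev, cand);
}
```

In this sketch, if the previous instruction is an SFU operation writing register 3, a candidate that reads register 3 loses to a lower-priority candidate that does not, and `toy_can_merge()` rejects it as a merge partner.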
Although shader-db results show that the instruction count is hurt, with an increase of 0.35%, the sum of instructions + stalls is reduced by 0.52%, and the total of sfu-stalls is reduced by 63.51%. The change also implies a small increase in the max-temps metric, because SFU operations are scheduled earlier.
The first patch adds two extra shader-db stats: the number of sfu-stalls, and the sum of instructions and stalls, which shows how many cycles are affected by sfu-stalls relative to the number of instructions.