intel: Relax scheduling dependencies for ARF access.
We were treating them as scheduling barriers, but since they weren't
included in is_scheduling_barrier(), it meant that each access would add
deps from the beginning to the end of the block, for some awful O(n^2)
performance.
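To illustrate the scaling problem, here is a toy model rather than the
actual scheduler code. ToyInst, barrier_walk_edges(), old_edge_count() and
relaxed_edge_count() are all hypothetical names, and the "relaxed" scheme
(one edge to the previous ARF access) is only an assumed stand-in for the
finer-grained tracking. The sketch counts dependency edges when every ARF
access walks outward until it meets an instruction flagged as a barrier,
which never happens if ARF accesses are not flagged themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction, not a Mesa type.
struct ToyInst {
    bool is_barrier;    // what an is_scheduling_barrier()-style check reports
    bool accesses_arf;  // whether the instruction touches an ARF register
};

// Walk backwards and forwards from instruction i, adding one edge per
// neighbor, and stop in each direction at the first flagged barrier.
std::size_t barrier_walk_edges(const std::vector<ToyInst>& block, std::size_t i) {
    assert(i < block.size());
    std::size_t edges = 0;
    for (std::size_t p = i; p-- > 0;) {
        ++edges;
        if (block[p].is_barrier) break;
    }
    for (std::size_t n = i + 1; n < block.size(); ++n) {
        ++edges;
        if (block[n].is_barrier) break;
    }
    return edges;
}

// Old behavior in this model: every ARF access does a barrier walk, but
// is never flagged as a barrier, so no walk is ever cut short by another.
std::size_t old_edge_count(const std::vector<ToyInst>& block) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i].accesses_arf)
            total += barrier_walk_edges(block, i);
    }
    return total;
}

// Assumed relaxed behavior: each ARF access only gets an edge to the
// previous ARF access, so the edge count grows linearly.
std::size_t relaxed_edge_count(const std::vector<ToyInst>& block) {
    std::size_t total = 0;
    bool seen_arf = false;
    for (const ToyInst& inst : block) {
        if (!inst.accesses_arf) continue;
        if (seen_arf) ++total;
        seen_arf = true;
    }
    return total;
}
```

In this model a block of n unflagged ARF accesses produces n * (n - 1)
edges with the barrier walk, versus n - 1 with the assumed relaxed scheme.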
Reduces runtime of
dEQP-VK.binding_model.buffer_device_address.set3.depth3.basessbo.convertchecku64.nostore.single.std140.frag
from 9.9s to 2.0s on my SKL system, and should let us take the test group
off the skip list in Chrome OS due to timeouts.
shader-db:
total instructions in shared programs: 9043717 -> 9043725 (<.01%)
instructions in affected programs: 948 -> 956 (0.84%)
total cycles in shared programs: 403834893 -> 403766806 (-0.02%)
cycles in affected programs: 103629173 -> 103561086 (-0.07%)
total spills in shared programs: 4036 -> 4037 (0.02%)
spills in affected programs: 28 -> 29 (3.57%)
total fills in shared programs: 3221 -> 3224 (0.09%)
fills in affected programs: 89 -> 92 (3.37%)
Closes: #4648