-
Emma Anholt authored
This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture fetch can probably take as much as the rest of the cycles of the program, so it's important to hide our other cycles during it (which is hard to do after register allocation). Also, we can queue up multiple texture requests before collecting the resulting samples, so that we keep the texture unit busy more of the time. High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also about 2-3% on the multiarb demo. 8 piglit tests (ext_framebuffer_multisample accuracy depthstencil) go from failing in rendering to failing in register allocation, but hopefully I can fix that up with some better register pressure handling here. total instructions in shared programs: 87723 -> 88448 (0.83%) instructions in affected programs: 78411 -> 79136 (0.92%) total estimated cycles in shared programs: 276583 -> 246306 (-10.95%) estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
f1fb85e5