turnip: improve autotune for high draw call counts
gfxbench gl_driver2_off has a lot of small draws. turnip chooses gmem because
the estimated sysmem_bandwidth is larger than the estimated gmem_bandwidth:
TU: info: autotune 68d641e3cc0adab6:6402 selecting gmem
TU: info: avg_samples=4566158, draw_bandwidth_per_sample=10.00, total_draw_call_bandwidth=45653734
TU: info: render_area=1920x1080, sysmem_bandwidth_per_pixel=6, gmem_bandwidth_per_pixel=4
TU: info: sysmem_bandwidth=58095334, gmem_bandwidth=13689213
In reality, gmem is much slower than sysmem for this benchmark.
I think the heuristics should also consider the cost of binning and per-tile state changes.
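
To make the suggestion concrete, here is a rough sketch, not the actual turnip code. The sysmem estimate is only inferred from the log above (6 * 1920 * 1080 + 45653734 = 58095334, which matches the logged sysmem_bandwidth). The gmem estimate is taken from the log as-is. Everything else is hypothetical: the struct, the function names, draw_count, tile_count, and the per-draw cost constant are made up for illustration, and the real cost model would need to be measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inputs for one render pass. The first four fields correspond to values
 * printed in the autotune log; draw_count and tile_count are hypothetical
 * extra inputs that the log does not report. */
struct autotune_inputs {
   uint64_t pixel_count;                /* render_area width * height */
   uint64_t sysmem_bandwidth_per_pixel;
   uint64_t total_draw_call_bandwidth;
   uint64_t gmem_bandwidth;             /* as logged, not recomputed here */
   uint64_t draw_count;                 /* hypothetical */
   uint64_t tile_count;                 /* hypothetical */
};

/* Inferred from the logged numbers, not copied from the driver source. */
static uint64_t
estimate_sysmem_cost(const struct autotune_inputs *in)
{
   assert(in);
   return in->sysmem_bandwidth_per_pixel * in->pixel_count +
          in->total_draw_call_bandwidth;
}

/* Hypothetical: charge gmem a fixed cost for every draw once for the
 * binning pass and once per tile for the state replayed in that tile.
 * per_draw_cost is in the same bandwidth-equivalent units as the
 * estimates and would have to be tuned against real measurements. */
static uint64_t
estimate_gmem_cost(const struct autotune_inputs *in, uint64_t per_draw_cost)
{
   assert(in);
   return in->gmem_bandwidth +
          in->draw_count * (in->tile_count + 1) * per_draw_cost;
}

static bool
prefer_sysmem(const struct autotune_inputs *in, uint64_t per_draw_cost)
{
   return estimate_sysmem_cost(in) <= estimate_gmem_cost(in, per_draw_cost);
}
```

With per_draw_cost set to 0 this reduces to the comparison shown in the log and still selects gmem. A non-zero cost lets a pass with many small draws tip over to sysmem, while passes with few draws are barely affected.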