turnip: consider loads/stores/clears/resolves in autotune
The first 3 commits are minor cleanups/improvements to autotune. The 4th commit calculates the costs of a render pass. The last commit changes the autotune decision to be based on the costs of sysmem rendering and gmem rendering.
In pseudocode, this MR does:
    drawcall_cost = drawcall_cost_per_sample * sample_count;
    sysmem_cost = sysmem_render_pass_cost_per_pixel * render_area_pixel_count + drawcall_cost;
    gmem_cost = gmem_render_pass_cost_per_pixel * render_area_pixel_count * 1.1 + drawcall_cost * 0.1;
and picks sysmem when sysmem_cost <= gmem_cost.
The 1.1 and 0.1 factors in the formula are rough guesses. The 1.1 adds 10% overhead to account for state changes between tiles, and the 0.1 assumes GMEM is 10x faster than system RAM. Still, I think this MR makes the autotune heuristics easier to reason about and to tune than the magic numbers 500 and 6000.0 that we used to have.
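The decision above can be sketched in C. This is a minimal sketch, not turnip's actual code: `struct autotune_pass_costs` and `autotune_prefer_sysmem` are hypothetical names, and how each per-pixel and per-sample cost is estimated is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-render-pass inputs; the real turnip autotune data differs. */
struct autotune_pass_costs {
   double sysmem_cost_per_pixel;    /* sysmem cost of the pass, per pixel */
   double gmem_cost_per_pixel;      /* gmem cost of the pass, per pixel */
   double drawcall_cost_per_sample; /* cost of draws, per sample passed */
   uint64_t sample_count;           /* samples passed in the pass */
   uint64_t render_area_pixels;     /* pixels in the render area */
};

/* Returns true when sysmem rendering is estimated to be no more expensive. */
static bool
autotune_prefer_sysmem(const struct autotune_pass_costs *c)
{
   const double pixels = (double)c->render_area_pixels;
   const double drawcall_cost =
      c->drawcall_cost_per_sample * (double)c->sample_count;

   const double sysmem_cost = c->sysmem_cost_per_pixel * pixels + drawcall_cost;

   /* 1.1: guess of ~10% overhead from state changes between tiles.
    * 0.1: guess that GMEM is ~10x faster than system memory for draws. */
   const double gmem_cost =
      c->gmem_cost_per_pixel * pixels * 1.1 + drawcall_cost * 0.1;

   return sysmem_cost <= gmem_cost;
}
```

With these weights, passes with heavy drawing (large `sample_count`) lean toward GMEM, while passes whose per-pixel load/store/clear/resolve costs dominate lean toward whichever mode has the cheaper per-pixel cost.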
This improves gfxbench's vulkan_5_normal_off by 3%. I am happy to look at more workloads if someone can tell me how to get them (the original !12128 (merged) mentioned D3D11 traces).
(TBH, at first I was not looking into performance but into CTS failures on ANGLE. MSAA resolves using event BLIT and CP_BLIT can produce different results (the difference is 1.0/255). These differences fail dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_sample_coverage when autotune switches between sysmem and gmem rendering midway.)
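The description does not say why the two resolve paths disagree. One plausible mechanism, offered only as a hypothetical illustration, is a different rounding step when averaging samples into a UNORM8 value, since one UNORM8 step is exactly 1.0/255. `resolve_trunc` and `resolve_round` are made-up helpers, not hardware or driver functions.

```c
#include <stdint.h>

/* Hypothetical: average n UNORM8 samples, truncating the result. */
static uint8_t
resolve_trunc(const uint8_t *samples, unsigned n)
{
   unsigned sum = 0;
   for (unsigned i = 0; i < n; i++)
      sum += samples[i];
   return (uint8_t)(sum / n);
}

/* Hypothetical: average n UNORM8 samples, rounding to nearest. */
static uint8_t
resolve_round(const uint8_t *samples, unsigned n)
{
   unsigned sum = 0;
   for (unsigned i = 0; i < n; i++)
      sum += samples[i];
   return (uint8_t)((sum + n / 2) / n);
}
```

For 4 samples {0, 0, 255, 255} the exact average is 127.5, so truncation gives 127 and rounding gives 128: the results differ by one UNORM8 step, i.e. 1.0/255.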