
turnip: consider loads/stores/clears/resolves in autotune

Chia-I Wu requested to merge olv/mesa:tu-autotune into main

The first 3 commits are minor cleanups/improvements to autotune. The 4th commit calculates the costs of a render pass. The last commit changes the autotune decision to be based on the costs of sysmem rendering and gmem rendering.

In pseudocode, this MR computes

drawcall_cost = drawcall_cost_per_sample * sample_count;
sysmem_cost = sysmem_render_pass_cost_per_pixel * render_area_pixel_count + drawcall_cost;
gmem_cost = gmem_render_pass_cost_per_pixel * render_area_pixel_count * 1.1 + drawcall_cost * 0.1;

and picks sysmem when sysmem_cost <= gmem_cost. The constants 1.1 and 0.1 in the formula are rough guesses. They mean, respectively, "add 10% overhead that could come from state changes between tiles" and "assume gmem is 10x faster than system RAM". Still, I think this MR makes the autotune heuristics easier to reason about and to tune, compared to the magic numbers 500 and 6000.0 that we used to have.
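For illustration, here is a minimal C sketch of the decision above, assuming the per-pixel costs and the sample count have already been gathered for the render pass (how turnip measures them is not shown). The struct, macros, and functions (`rp_cost_inputs`, `GMEM_TILE_OVERHEAD`, `GMEM_DRAW_FACTOR`, `rp_sysmem_cost`, `rp_gmem_cost`, `should_use_sysmem`) are hypothetical names chosen to mirror the pseudocode; they are not turnip's actual autotune code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-render-pass inputs; field names follow the pseudocode,
 * not turnip's real data structures. */
struct rp_cost_inputs {
   double sysmem_cost_per_pixel;    /* loads/stores/clears/resolves, sysmem path */
   double gmem_cost_per_pixel;      /* loads/stores/clears/resolves, gmem path */
   double drawcall_cost_per_sample;
   uint32_t sample_count;           /* assumed to be measured elsewhere */
   uint64_t render_area_pixel_count;
};

/* The guessed constants from the description above. */
#define GMEM_TILE_OVERHEAD 1.1 /* +10% for state changes between tiles */
#define GMEM_DRAW_FACTOR   0.1 /* gmem assumed 10x faster than system RAM */

static double
rp_drawcall_cost(const struct rp_cost_inputs *in)
{
   return in->drawcall_cost_per_sample * (double)in->sample_count;
}

double
rp_sysmem_cost(const struct rp_cost_inputs *in)
{
   return in->sysmem_cost_per_pixel * (double)in->render_area_pixel_count +
          rp_drawcall_cost(in);
}

double
rp_gmem_cost(const struct rp_cost_inputs *in)
{
   return in->gmem_cost_per_pixel * (double)in->render_area_pixel_count *
             GMEM_TILE_OVERHEAD +
          rp_drawcall_cost(in) * GMEM_DRAW_FACTOR;
}

/* Returns true when sysmem rendering is chosen; ties go to sysmem. */
bool
should_use_sysmem(const struct rp_cost_inputs *in)
{
   assert(in->sysmem_cost_per_pixel >= 0.0 && in->gmem_cost_per_pixel >= 0.0);
   return rp_sysmem_cost(in) <= rp_gmem_cost(in);
}
```

Rearranging the comparison, sysmem wins exactly when sysmem_cost_per_pixel * P + 0.9 * drawcall_cost <= 1.1 * gmem_cost_per_pixel * P, where P is the render area pixel count. A larger drawcall cost therefore pushes the decision toward gmem, while a higher gmem per-pixel cost pushes it toward sysmem. For example, with both per-pixel costs at 1.0 and a 100-pixel render area, a pass with no drawcall cost stays in sysmem (100 vs. about 110), while a drawcall cost of 1000 moves it to gmem (1100 vs. about 210).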

This improves gfxbench's vulkan_5_normal_off by 3%. I am happy to look at more workloads if someone can tell me how to get them (the original MR !12128 mentioned D3D11 traces).

(TBH, at first I was not looking into performance but into CTS failures on ANGLE. MSAA resolves using event BLIT and CP_BLIT can produce different results (the difference is 1.0/255). These differences make dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_sample_coverage fail when autotune switches between sysmem and gmem rendering midway.)
