tu: implement sysmem vs gmem autotuner

The implementation is separate from Freedreno's because it has to support multithreaded command buffer recording.

In Vulkan, an application may record command buffers from many threads and expects no locking to occur. We do introduce the possibility of locking at renderpass end; however, assuming the application doesn't have a huge number of slightly different renderpasses, there should be little to no contention.
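As a rough illustration of where that lock sits, here is a minimal sketch of a shared per-renderpass history table that is only touched at renderpass end. All names (`struct autotune`, `autotune_use_sysmem`, `autotune_record`, `HISTORY_SLOTS`) and the fixed-size table are hypothetical and are not taken from the actual implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_SLOTS 64 /* hypothetical fixed-size table */

struct rp_history {
   uint64_t key;        /* hash of the renderpass parameters */
   bool used;
   bool prefer_sysmem;  /* last decision recorded for this key */
};

struct autotune {
   pthread_mutex_t lock;
   struct rp_history slots[HISTORY_SLOTS];
};

static void autotune_init(struct autotune *at)
{
   assert(at != NULL);
   pthread_mutex_init(&at->lock, NULL);
   for (int i = 0; i < HISTORY_SLOTS; i++)
      at->slots[i].used = false;
}

/* Called once per renderpass, at renderpass end. This is the only point
 * where recording threads can contend with each other. */
static bool autotune_use_sysmem(struct autotune *at, uint64_t key, bool fallback)
{
   bool result = fallback;

   pthread_mutex_lock(&at->lock);
   for (int i = 0; i < HISTORY_SLOTS; i++) {
      const struct rp_history *h = &at->slots[(key + i) % HISTORY_SLOTS];
      if (!h->used)
         break; /* no history for this renderpass yet */
      if (h->key == key) {
         result = h->prefer_sysmem;
         break;
      }
   }
   pthread_mutex_unlock(&at->lock);
   return result;
}

/* Called when results for a submitted renderpass become available. */
static void autotune_record(struct autotune *at, uint64_t key, bool prefer_sysmem)
{
   pthread_mutex_lock(&at->lock);
   for (int i = 0; i < HISTORY_SLOTS; i++) {
      struct rp_history *h = &at->slots[(key + i) % HISTORY_SLOTS];
      if (!h->used || h->key == key) {
         h->key = key;
         h->used = true;
         h->prefer_sysmem = prefer_sysmem;
         break;
      }
   }
   /* If the table is full, the result is silently dropped. */
   pthread_mutex_unlock(&at->lock);
}
```

With few distinct renderpass keys the critical section is a short probe, which is why contention is expected to stay low in this sketch.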

Other assumptions are:

  • The application doesn't create one-time-submit command buffers and then hold them indefinitely without submitting them.
  • The application submits command buffers soon after creating them.

Violating these assumptions may reduce performance or cause the autotuner to turn itself off.

The heuristic is too simplistic at the moment. We should account for loads/stores/clears/resolves, especially
for renderpasses with a low drawcall count and ~fb_size samples passed. In D3D11 games we are seeing many renderpasses like:

  • color attachment load
  • single fullscreen draw
  • color attachment store
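To show why loads and stores matter for the renderpass pattern listed above, here is a purely hypothetical cost model; it is not the heuristic used in this merge request. It assumes gmem pays one full-framebuffer transfer per attachment load or store, while sysmem writes each passed sample straight to memory. Blending reads, depth traffic, clears and resolves are ignored, and every name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-renderpass statistics. */
struct rp_stats {
   uint32_t width, height;     /* framebuffer size in pixels */
   uint32_t bytes_per_sample;  /* e.g. 4 for RGBA8 */
   uint32_t loads;             /* attachment loads into gmem */
   uint32_t stores;            /* attachment stores out of gmem */
   uint64_t samples_passed;    /* measured on earlier submissions */
};

/* Returns true when the estimated memory traffic of sysmem rendering
 * does not exceed the load/store traffic that gmem rendering would add. */
static bool rp_prefers_sysmem(const struct rp_stats *s)
{
   uint64_t fb_bytes = (uint64_t)s->width * s->height * s->bytes_per_sample;
   uint64_t gmem_traffic = (uint64_t)(s->loads + s->stores) * fb_bytes;
   uint64_t sysmem_traffic = s->samples_passed * s->bytes_per_sample;

   return sysmem_traffic <= gmem_traffic;
}
```

Under this model a load, one fullscreen draw and a store costs two framebuffer transfers in gmem but only about one framebuffer of writes in sysmem, so sysmem is chosen. A renderpass with heavy overdraw would still go to gmem.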

To build a good heuristic we would have to run a bunch of traces with and without forced sysmem and gather
statistics on how sysmem vs gmem performance depends on the renderpass parameters we can collect.

This would be my next step.

Edited by Danylo Piliaiev
