
freedreno, turnip, ir3: Early preamble

Connor Abbott requested to merge cwabbott0/mesa:review/ir3-early-preamble into main

In addition to introducing the scalar ALU, a650 added a copy of the scalar ALU and some other units to the HLSQ, which dispatches work to the uSPTPs (shader cores). The HLSQ can now execute the preamble part of a shader "early," i.e. before work is dispatched, rather than as part of the first wave dispatched to each uSPTP. This can help hide the latency of executing the preamble. Traditionally, the HLSQ also prefetched various state via the CP_LOAD_STATE packet, but recently more and more of this functionality has been moving to the preamble, with the implicit expectation that the preamble is executed early:

  • Since a730, shared consts (Vulkan push constants) are set up in the preamble.
  • Since a730, descriptors are prefetched in the preamble.
  • Since a750, using CP_LOAD_STATE to set up constants is deprecated and severely limited, so most driver params come from UBOs that are pushed to the constant file in a preamble.

As more and more things are being executed in the preamble, hiding the latency becomes more important.

We can't always execute a preamble early. Early preambles cannot use "normal" (i.e. non-shared) registers or predicate registers, so they cannot contain control flow. If the preamble uses any of these, we have to fall back to executing it as a normal "late" preamble.
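The eligibility rule above amounts to a single pass over the preamble's instructions. The following is a minimal sketch assuming a hypothetical instruction representation: `enum reg_class`, `struct sketch_instr`, `enum preamble_mode` and `choose_preamble_mode` are illustrative names only, not ir3's actual IR or the code in this MR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register classes; NOT ir3's real data structures. */
enum reg_class {
   REG_SHARED,    /* shared register: allowed in an early preamble */
   REG_CONST,     /* constant-file operand: allowed in an early preamble */
   REG_NORMAL,    /* normal (non-shared) register: forbidden early */
   REG_PREDICATE, /* predicate register: forbidden early */
};

#define SKETCH_MAX_REGS 4

/* Hypothetical instruction: every register it reads or writes. */
struct sketch_instr {
   enum reg_class regs[SKETCH_MAX_REGS];
   size_t num_regs;
   bool is_branch; /* any control-flow instruction */
};

enum preamble_mode { PREAMBLE_EARLY, PREAMBLE_LATE };

/* Returns PREAMBLE_EARLY only if no instruction touches a normal or
 * predicate register and none is a branch; otherwise the preamble has
 * to run as a late preamble. */
static enum preamble_mode
choose_preamble_mode(const struct sketch_instr *instrs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      const struct sketch_instr *instr = &instrs[i];

      /* Assumption: treat any branch as disqualifying, since early
       * preambles cannot contain control flow. */
      if (instr->is_branch)
         return PREAMBLE_LATE;

      for (size_t j = 0; j < instr->num_regs; j++) {
         if (instr->regs[j] == REG_NORMAL ||
             instr->regs[j] == REG_PREDICATE)
            return PREAMBLE_LATE;
      }
   }
   return PREAMBLE_EARLY;
}
```

In this model, a preamble whose ALU results have to land in normal registers always falls back to `PREAMBLE_LATE`. That is presumably why early preambles are rarely usable without the scalar ALU, which lets such results stay in shared registers.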

This MR implements early preamble, based on !22075, which implements the scalar ALU. While this doesn't actually depend on that series, without the scalar ALU the cases where we can use early preamble are severely limited.
