WIP: Auto-list compositing

Roman Gilg requested to merge romangg/xserver:autoListPresent into master

This is a work-in-progress implementation of the accelerated compositing concept via auto-list as described in @keithp's blog post.


A compositor can send a list of Windows to the XServer, asking it to auto-composite them into subsequent compositor frames, or into internally generated ones, until told otherwise. The goal is to give a relevant subset of Windows, for example a game window, minimal latency to screen while keeping the compositor asleep as long as there is little compositing to do besides simple clipping.

Implementation Overview

In this WIP branch the essential plumbing is done to receive the auto-list and start as well as tear down auto-compositing at arbitrary points in time. In particular:

  • Compositors may specify any Window, together with its child-windows, to be auto-composited. XServer respects this for any PresentPixmap of one of these child-windows. A typical use case is a media player with additional controls around a child-window presenting the video content: the compositor does not need to name that specific child-window.
  • XServer creates double-buffered Pixmaps for internal use, so that auto-composited client updates can be presented without waking up the compositor.
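
The child-window rule in the first bullet can be pictured as a walk up the window tree. The following is only a minimal sketch and not code from the branch: `struct sketch_window`, its `in_auto_list` flag and `sketch_auto_list_root()` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the server's window record; only the
 * fields needed for this sketch are modelled. */
struct sketch_window {
    struct sketch_window *parent;  /* NULL for the root window */
    bool in_auto_list;             /* set when the compositor listed this window */
};

/* Return the closest window (the window itself or an ancestor) that the
 * compositor put on the auto-list, or NULL if there is none. */
static struct sketch_window *
sketch_auto_list_root(struct sketch_window *win)
{
    for (; win != NULL; win = win->parent) {
        if (win->in_auto_list)
            return win;
    }
    return NULL;
}
```

With such a lookup, a PresentPixmap on the video child-window of a listed media player resolves to the player's top-level Window, while a Window outside any listed subtree resolves to NULL and is presented as usual.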

For tracking and compositing auto-composited Pixmaps the following strategy is proposed:

  1. The target/compositor needs to do synced flips.
  2. We distinguish between async presenting clients like uncapped games and vsynced ones like media players.
    1. In the first case we just reference the latest Pixmap internally and send the completed/idle Present events immediately. When the next compositor flip is executed (its p-vblank; see the legend in Figure 1 below, in Present this is normally just called vblank), we copy the currently referenced Pixmap into it. If there is no upcoming compositor flip yet, a surrogate is created on the fly, which can be replaced by a "real" compositor flip before it goes into the pending state.
    2. In the second case we add the representing struct (p-vblank) to the corresponding compositor p-vblank or, if the compositor sleeps, create a surrogate on the fly. If the surrogate gets replaced by a real compositor p-vblank, the client p-vblanks are transferred over to the replacement. The auto-composited p-vblanks are only referenced in this compositor p-vblank and are not attached to the execute and flip lists. The completed/idle Present events are then sent when the compositor p-vblank completes or idles.
  3. In case a compositor p-vblank is executed but there is no new content from one of the auto-composited clients we instead copy from the client's Window directly.
  4. When the compositor p-vblank becomes pending no more changes are allowed. Subsequent p-vblanks from compositor or client are only relevant for subsequent frames.
  5. In the optimal case we use a timer to give the compositor some time in each frame to add its own content to auto-composited clients and at the same time minimize the time-to-display latency of async clients.
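
To make the bookkeeping of the strategy above more concrete, here is a small self-contained sketch. It is not taken from the branch: `struct sketch_frame` and all `sketch_*` functions are hypothetical, Pixmaps and client p-vblanks are reduced to integer ids, and only a single async client is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_MAX_VSYNCED = 8 };

/* Hypothetical model of one compositor p-vblank, real or surrogate. */
struct sketch_frame {
    bool   surrogate;                    /* server-made stand-in while the compositor sleeps */
    bool   pending;                      /* step 4: no changes once pending */
    int    latest_async;                 /* newest Pixmap id of the async client, 0 = none */
    int    vsynced[SKETCH_MAX_VSYNCED];  /* client p-vblank ids tied to this frame */
    size_t n_vsynced;
};

/* Step 2.1: an async client only replaces the referenced Pixmap. */
static bool
sketch_present_async(struct sketch_frame *f, int pixmap)
{
    if (f->pending)
        return false;           /* only relevant for a later frame */
    f->latest_async = pixmap;
    return true;
}

/* Step 2.2: a vsynced client p-vblank is attached to the frame. */
static bool
sketch_present_vsynced(struct sketch_frame *f, int client_vblank)
{
    if (f->pending || f->n_vsynced == SKETCH_MAX_VSYNCED)
        return false;
    f->vsynced[f->n_vsynced++] = client_vblank;
    return true;
}

/* A real compositor flip replaces a surrogate that is not yet pending
 * and takes over everything the surrogate collected. */
static bool
sketch_replace_surrogate(const struct sketch_frame *s, struct sketch_frame *real)
{
    if (!s->surrogate || s->pending)
        return false;
    real->surrogate = false;
    real->pending = false;
    real->latest_async = s->latest_async;
    real->n_vsynced = s->n_vsynced;
    for (size_t i = 0; i < s->n_vsynced; i++)
        real->vsynced[i] = s->vsynced[i];
    return true;
}

/* Step 3: without new client content, copy from the Window instead. */
static int
sketch_copy_source(const struct sketch_frame *f, int window_pixmap)
{
    return f->latest_async != 0 ? f->latest_async : window_pixmap;
}
```

In this model a rejected present (return value `false`) stands for a p-vblank that has to be queued for a later frame; the timer from step 5 would simply decide when `pending` gets set.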

Figure 1 illustrates the optimal case. Note that the current branch does not feature the timer yet, since it was broken in my tests, either because of my preliminary timer code or, as I currently believe, because of an issue in KWin's compositing pipeline. Therefore the final executes and the first present after each vblank coincide and the pending area is maximized. Subsequent p-vblanks in the same frame are therefore only relevant to later frames.

Figure 1. Flow diagram (190604_flow-diagram).


Another approach, which I followed first, was to keep all client p-vblanks around, attach them to the respective event lists and on execute try to merge them into the compositor p-vblank. The problem is that we then need to reorder these lists to first composite the client p-vblanks and then flip the compositor one. We might also time the p-vblanks incorrectly, depending on whether a flip is possible or not and whether this changes in between.

Open Questions

Besides the question of whether this overall design looks sensible and has no hidden pitfalls, there are some specific issues I need input on:

  1. We want to allow simple clipping of other windows when auto-compositing. But if the compositor is manually redirecting Windows, the clipList no longer provides relevant information about clipping.
  2. Currently this design only supports the auto-compositing of 'presented' Windows. What's the right approach for auto-compositing other Windows? Do I need to hook into Damage events?
  3. A compositor is supposed to ignore damage events from auto-composited windows. On the other hand, there is no way to tell the compositor on a frame-by-frame basis whether auto-compositing was successful (only whether the initial auto-list had been set correctly). Auto-compositing can currently fail if the Window is on a different crtc or if the driver call to create surrogate Pixmaps fails. One idea would be to tag damage, such that compositors can either do nothing on such a damage event when auto-compositing was successful or, when it failed, do a normal compositing run on the next frame that includes the otherwise auto-composited window. Does this look feasible, or would the damage-tag-tracking most likely become difficult quickly?
  4. Auto-composited clients are copied into the compositor p-vblank and into the client Window on execute. We might be able to instead set the Pixmap as the new Window Pixmap to save us a copy. Is this something worth pursuing? I couldn't see a large performance impact from the extra copy, since it is only done once per frame.
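
To illustrate the damage-tagging idea from question 3, the compositor-side decision could be as small as the sketch below. Nothing like this exists in the protocol today: the enum, its values and `sketch_needs_composite()` are purely hypothetical names used to show the intended behaviour.

```c
#include <stdbool.h>

/* Hypothetical tag the server could attach to a damage event. */
enum sketch_damage_tag {
    SKETCH_DAMAGE_UNTAGGED,     /* window is not on the auto-list */
    SKETCH_DAMAGE_AUTO_OK,      /* server already composited this update */
    SKETCH_DAMAGE_AUTO_FAILED   /* e.g. wrong crtc or surrogate Pixmap creation failed */
};

/* Returns true if the compositor has to schedule a normal compositing
 * run that includes the damaged window on the next frame. */
static bool
sketch_needs_composite(enum sketch_damage_tag tag)
{
    return tag != SKETCH_DAMAGE_AUTO_OK;
}
```

The open part of the question is on the server side: keeping such a tag correct for every damage event is where the tracking could become difficult.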

Further Work

This patch series is accompanied by other patches to xorgproto and XCB adding the necessary changes to the Present extension for submitting the auto-list.

Edited by Roman Gilg
