winsys: Improve Wayland PRIME performance, plus cleanups

Alexander Orzechowski requested to merge Nefsen402/mesa:winsys-wl-multigpu into main

This addresses performance issues with PRIME in the Wayland winsys. Currently, in PRIME setups we allocate a linear buffer on the primary device and a regular buffer (with modifiers and everything) on the PRIME device, then blit the rendered contents into the linear display buffer to keep it up to date. We need to do this because we can't guarantee that the modifiers chosen for rendering on the PRIME device are importable on the primary device. Instead, this MR allocates a linear render buffer and gets rid of the separate display buffer completely. This works because Wayland compositors can import the buffer on whichever GPU they choose. Better still, some Wayland compositors never import buffers across GPUs at all when the display connector is physically attached to the GPU doing the rendering.
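The allocation policy described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code in this MR: the function name `choose_render_modifiers`, its parameters, and the fallback when linear is not advertised are all hypothetical. Only the value of `DRM_FORMAT_MOD_LINEAR` (0) is taken from libdrm's `drm_fourcc.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_LINEAR in libdrm's drm_fourcc.h (0). */
#define MOD_LINEAR ((uint64_t)0)

/*
 * Hypothetical helper (not Mesa's actual code): pick the modifiers the
 * winsys lets the driver use for the render buffer.
 *
 *   same_gpu  true if the app renders on the compositor's main device
 *   mods      modifiers the compositor advertised for the chosen format
 *   count     number of entries in mods
 *   out       receives the allowed modifiers; needs room for count entries
 *
 * Returns how many modifiers were written to out. A return of 0 means no
 * usable modifier was found and the caller must fall back to another
 * path (e.g. the old render buffer + linear display buffer + blit).
 */
static size_t choose_render_modifiers(bool same_gpu, const uint64_t *mods,
                                      size_t count, uint64_t *out)
{
    assert(count == 0 || (mods != NULL && out != NULL));

    if (same_gpu) {
        /* No PRIME: any advertised modifier may be used as-is. */
        for (size_t i = 0; i < count; i++)
            out[i] = mods[i];
        return count;
    }

    /* PRIME: a tiled layout picked by the rendering GPU may not be
     * importable elsewhere, so render directly into a linear buffer that
     * the compositor can import on whichever GPU it composites with. */
    for (size_t i = 0; i < count; i++) {
        if (mods[i] == MOD_LINEAR) {
            out[0] = MOD_LINEAR;
            return 1;
        }
    }
    return 0;
}
```

For example, called with `same_gpu = false` and an advertised list containing linear plus a tiled modifier, it returns 1 with `out[0] == MOD_LINEAR`; with `same_gpu = true` it passes the whole list through unchanged.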

I benchmarked this using Minecraft with a GLFW patched for Wayland and DRI_PRIME=1 at 3840x2160. The PRIME device was an AMD RX 580 and the primary GPU an AMD RX 6900 XT, both using the radeonsi driver. I had monitors connected to both GPUs and used a compositor that composites each monitor on the GPU that monitor is connected to.

Description                                  Performance
all modifiers                                148 fps
all modifiers imported on primary GPU        N/A
linear modifiers                             173 fps
linear modifiers imported on primary GPU     7 fps
mesa main                                    48 fps
mesa main on primary GPU                     N/A
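To put the table in perspective, the speedups and the size of the per-frame blit on Mesa main can be worked out directly. The sketch below assumes 4 bytes per pixel (e.g. XRGB8888) and one full-frame copy per presented frame; neither the pixel format nor the exact number of copies is stated above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one width x height frame (assumed 4 bytes/pixel below). */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* How many times faster a result is than a baseline (both in fps). */
static double speedup(double fps, double baseline_fps)
{
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}

/* Bytes per second moved by a blit of bytes_per_frame, once per frame. */
static double blit_bytes_per_second(uint64_t bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps;
}
```

With the table's numbers, `frame_bytes(3840, 2160, 4)` is 33,177,600 bytes (about 33 MB), `speedup(173.0, 48.0)` is roughly 3.6 for linear modifiers over Mesa main, and `blit_bytes_per_second(33177600, 48.0)` is about 1.59 GB/s of copying at Mesa main's 48 fps.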

Description of the tests:

  • all modifiers: Mesa patched to allow unrestricted modifiers, with the application composited and scanned out on the PRIME device. This should roughly match what a system with the RX 580 as its primary device would see without PRIME.
  • all modifiers imported on primary GPU: This scenario didn't work: my primary GPU didn't support the modifier used, so the import failed.
  • linear modifiers: The current state of this MR. The render target buffer uses the linear modifier, with compositing and scanout done on the PRIME device.
  • linear modifiers imported on primary GPU: Same as above, but compositing and scanout are done on the primary GPU.
  • mesa main: Performance on Mesa main with the application composited and scanned out on the PRIME device.
  • mesa main on primary GPU: Performance on Mesa main with the PRIME application composited and scanned out on the primary GPU. The buffer appeared to import successfully, but the output was garbage (random colours everywhere) and the application contents were not discernible. At a lower resolution the application was presented correctly, but performance was very low (< 10 fps).

I tried something similar with Xwayland, but the Xwayland server does not appear to support this.

Edited by Alexander Orzechowski
