Proper multigpu

Alexander Orzechowski requested to merge Nefsen402/wlroots:multi-gpu into master

Based on: !3634

Things to test once I'm at a multigpu computer again, so I can remove the draft status:

  • Test CPU copy path
  • Test primary GPU blit path

This MR completely replaces the current multigpu logic in wlroots, where all compositing has to go through a single GPU. Instead it uses a less centralized model where compositing for each output happens on the device that output's connector is attached to. This can reduce the number of multigpu hops.
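As a rough illustration of the difference, here is a toy hop-counting model of the two schemes. It is only a sketch: the function names, the `gpu_a`/`gpu_b` labels, and the idea of counting one "hop" per cross-GPU buffer copy are assumptions made for this example and are not wlroots code. A `client_gpu` of `None` stands for an SHM client whose buffer lives in system memory.

```python
def hops_centralized(client_gpu, output_gpu, primary_gpu):
    """Old model (as assumed here): everything is composited on primary_gpu."""
    hops = 0
    # A GPU-rendered client buffer on another device is copied to the primary GPU.
    if client_gpu is not None and client_gpu != primary_gpu:
        hops += 1
    # The composited frame is copied to the GPU that drives the output.
    if output_gpu != primary_gpu:
        hops += 1
    return hops


def hops_local(client_gpu, output_gpu):
    """New model (as assumed here): compositing happens on the output's own GPU."""
    # An SHM buffer is uploaded straight to the GPU that needs it.
    if client_gpu is None:
        return 0
    return 0 if client_gpu == output_gpu else 1


if __name__ == "__main__":
    # Client and output both on the secondary GPU, primary is the other one.
    print(hops_centralized("gpu_b", "gpu_b", "gpu_a"), hops_local("gpu_b", "gpu_b"))
```

In this model the worst case for the centralized scheme is a client and an output that share a non-primary GPU: the buffer travels to the primary GPU and back, while the local scheme does not move it at all.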

This is marked breaking not because of any API changes, but because of how those APIs are supposed to be used. wlroots no longer does multigpu transparently; the compositor has to do some work now as well. A new helper called wlr_output_manager is an "oughtta be good enough for anybody" implementation of proper multigpu that compositors can use to make all that stuff go away.

Here's my dusty computer, to make it obvious what we're talking about:
[Photo: DSC_0049]
I have two GPUs in my computer: one is a Radeon RX 580 (the Sapphire card below) and the other is a Radeon RX 6900xt (the watercooled one). I also have two monitors: one is connected via DP-1 to the RX 580 and the other via DP-3 to the RX 6900xt.

Currently, with wlroots, if I render something on the RX 580, for example by setting the DRI_PRIME environment variable for OpenGL applications, wlroots blits the contents of the application up to the primary GPU over PCIe, in this case the 6900xt. If I am viewing that content on DP-3, no more hops are needed: it is composited and scanned out like normal. This MR does this as well.

However, here's where it gets interesting: if I am viewing the content on DP-1 and we have already unconditionally moved rendering up to the 6900xt, then we have to make a second hop back down to the RX 580 for scanout. That makes two PCIe hops, and in my case, with the RX 580 going through the chipset and only given 4x at 2.5GT/s, things tend to become a slide show. 3840x2160@60Hz cannot be reached with this model; the desktop is clearly compositing at closer to 30Hz. This MR helps with this worst-case scenario: compositing stays local to the RX 580 if the connector is also connected there.

This MR also helps tremendously with SHM clients (CPU rendering). It can simply take the SHM buffer and upload it to whichever GPUs need it, with no multigpu hops at all. With this, I'm able to get my 60Hz 4K desktop.
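A back-of-the-envelope check of the numbers above. This is a sketch under stated assumptions, not a measurement: it assumes 4 bytes per pixel (for example XRGB8888), a full-frame copy on every refresh, and the theoretical payload rate of a 2.5 GT/s link with 8b/10b encoding and no further protocol overhead. The function names are made up for this example.

```python
def pcie_gen1_bytes_per_second(lanes):
    # 2.5 GT/s per lane with 8b/10b encoding carries 2.0 Gbit/s of payload,
    # i.e. 250,000,000 bytes/s per lane, per direction.
    return lanes * 2_500_000_000 * 8 // 10 // 8


def frame_bytes(width, height, bytes_per_pixel=4):
    # Assumes a tightly packed buffer with no row padding.
    return width * height * bytes_per_pixel


def max_full_frame_copies_per_second(width, height, lanes, bytes_per_pixel=4):
    # Upper bound on how many whole frames fit through the link per second.
    return pcie_gen1_bytes_per_second(lanes) / frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    link = pcie_gen1_bytes_per_second(4)
    needed = frame_bytes(3840, 2160) * 60
    print(f"x4 link payload rate:       {link} bytes/s")
    print(f"needed for 4K at 60 Hz:     {needed} bytes/s")
    print(f"max full-frame copies/sec:  {max_full_frame_copies_per_second(3840, 2160, 4):.1f}")
```

Under these assumptions the x4 link moves at most about 1 GB/s in each direction, while copying a full 4K frame 60 times per second needs roughly 2 GB/s. That caps any single full-frame hop at about 30 copies per second, which is in line with the roughly 30Hz observed above. Real throughput will be lower than the theoretical figure, and copying only damaged regions would change the picture.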

This is a draft because, for whatever reason, the dmabufs I'm creating in wlr_raster::import_buffer are not viewable on the other GPU in a multi-gpu situation, although it works if I transfer between two renderers on the same GPU. Most things do work, though: SHM applications work beautifully (mostly).
dmabuf clients still have problems.

Edited by Alexander Orzechowski