Draft: staging: Add ext-placement protocol for output-relative window placement preferences (!247)

Matthias Klumpp requested to merge mak/wayland-protocols:wip/client-position-requests into main Sep 28, 2023

NOTE: We are currently working on an alternative solution that preserves (most of) the compatibility with this protocol but does not expose a global coordinate system. Please join the discussion at !264

Hi!

This PR introduces an extension to the xdg-shell protocol which:

Allows clients to request the current output and absolute X/Y position for their own top-level windows from the compositor
Allows clients to suggest to the compositor to place a top-level window at a specific output and X/Y position
Ensures the compositor always has the final say about placement, and the client may only provide hints

As this is probably a somewhat controversial feature (there is a very old feature request in #72), here is a detailed justification with examples of why this feature is beneficial for Wayland adoption and for porting apps to it, while having very few downsides (that I see at the moment, at least). But of course, let me know if you disagree!

Why do we want this?

Application compatibility

I work in scientific research, and there are a lot of apps which follow a multi-window approach, some of which run on Linux, which we cannot properly port to Wayland without either severely degrading the user experience or rewriting the UI entirely. Neither is feasible, so these apps are stuck on X11 at the moment.

Here are some examples of Linux apps that follow this pattern:

Lazarus IDE: The IDE consists of multiple windows, which have a specific default arrangement to be user-friendly, and then can be placed anywhere by the user to fill the available space. E.g. have the code on one monitor, and the GUI editor on another one. Adding the placement protocol extension would allow this app to set a sensible initial default window placement.

GIMP in multi-window mode: Some people prefer this mode, especially with multiple larger screens. With the protocol extension, GIMP could suggest a good default window arrangement, rather than having all windows in random positions or stacked on top of each other by the compositor's default rules, which do not know the context that the app has.

Syntalos: This is a scientific application for data acquisition, which uses multiple windows that open and close depending on user requirements. New windows should (and do on X11) pop up at specific positions next to other windows and click positions, and their positions are remembered on a per-experiment basis and restored if that experiment configuration is loaded again. For example, for one experiment you want your camera image magnified on screen 2 and your sensor displays stacked on screen 1, and for a different experiment this might be reversed. Currently on Wayland, users need to rearrange their windows every single time they load a configuration, which is extremely annoying and made us switch back to X11.

MesoSPIM Control: Similarly to Syntalos, this software uses multiple windows for flexibility in instrument control, and greatly benefits from being able to save & restore its window arrangement, as well as from setting a good initial arrangement.

Huygens: This is commercial software with its own GUI toolkit for image analysis and deconvolution on Linux. On Wayland, its windows appear in random places, while on X11 they are deliberately placed based on the context the application knows about, e.g. not occluding the display space.

ImageJ: ImageJ is an open-source scientific image analysis tool that also utilizes a multi-window editing mode. For example, if I enable the contrast adjustment tool, I expect its window to show up directly next to the last-active image window, to visually associate them, rather than having the compositor place the window anywhere on screen.

There are also other applications that have more specialized demands, for example tools which display game overlays, and software like Valve’s Steam, which has a legitimate interest in showing a small information window in the bottom-right corner of a screen upon game launch.

Of course you could argue that all of these applications should be rewritten from the ground up, but especially in science this is not feasible - people will use X11 indefinitely rather than going to Wayland with a much degraded user experience. Developing one of these apps, I can also say that transforming them into a single-window app would be technically challenging, as there are multiple binaries actually drawing the GUI under the hood. Also, for the scenarios these apps are used in, the multi-window approach is actually very desirable and the best UI you can design to make use of multiple screens and large display space.

Application portability

Windows and macOS allow client window placement on their platforms. This means that any Windows app that has been made to work on Linux using Wine on Wayland, and that expects windows to show up at certain positions, will be broken. I've come across quite a lot of these apps in research (mostly for instrument control), but there are most likely also more common applications (e.g. media players) that expect this platform behavior.

Web compatibility

Because multi-screen window placement is super useful for scientific and monitoring apps, and the web wants to be the runtime for everything nowadays, there is of course also a W3C draft (https://w3c.github.io/window-management/) that allows web applications to place windows on specific screens. Currently, Linux on Wayland cannot support this, which will be very annoying for people wanting to use this feature in the future (when it works everywhere else but on Linux).

The Chrome Platform Status page for this W3C draft also mentions more scenarios for why having an API like this is desirable:

Financial app opens a dashboard of windows across multiple monitors.
Medical app opens images (e.g. x-rays) on a high-resolution grayscale display.
Creativity app shows secondary windows (e.g. palette) on a separate screen.
Conference room app shows controls on a touch screen device and video on a TV.
Multi-screen layouts in gaming, signage, artistic, and other types of apps.
Site optimizes content and layout when a window spans multiple screens.

Source: Chrome Status

The proposed implementation

There are many kinds of Wayland compositors out there, which handle window management differently. So creating an API which mandates that compositors follow a client's positioning request will not work and is not desirable, especially if the output is not a traditional desktop but e.g. a phone screen. The compositor must be in control at all times and have the final say on where a window goes; the client can merely make suggestions.

Therefore, before a window appears, the application can make a set_preferred_placement request to ask the compositor to place the window on a specific output and in a specific place. The compositor can then honor the request, ignore the request, or place the window roughly in the requested position, depending on its policy.
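
To make the three possible outcomes concrete, here is a rough compositor-side sketch in Python. It is only an illustration under assumptions: `Policy`, `Output`, `Hint` and `resolve_placement` are hypothetical names that do not appear in the protocol, and "roughly in the requested position" is interpreted here as clamping the window into the requested output, which is just one policy a compositor might choose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Policy(Enum):
    HONOR = "honor"    # use the hint exactly as given
    ADJUST = "adjust"  # keep the window fully inside the requested output
    IGNORE = "ignore"  # e.g. a tiling compositor


@dataclass(frozen=True)
class Output:
    name: str
    width: int
    height: int


@dataclass(frozen=True)
class Hint:
    """What a client passed to set_preferred_placement (hypothetical model)."""
    output: Output
    x: int
    y: int


def resolve_placement(policy: Policy, hint: Optional[Hint],
                      win_w: int, win_h: int,
                      fallback: Tuple[str, int, int]) -> Tuple[str, int, int]:
    """Return (output name, x, y). The compositor always has the final say."""
    if hint is None or policy is Policy.IGNORE:
        return fallback
    if policy is Policy.HONOR:
        return (hint.output.name, hint.x, hint.y)
    # ADJUST: clamp the position so the window stays inside the output.
    max_x = max(0, hint.output.width - win_w)
    max_y = max(0, hint.output.height - win_h)
    x = min(max(hint.x, 0), max_x)
    y = min(max(hint.y, 0), max_y)
    return (hint.output.name, x, y)
```

In this sketch a client that never sends a hint and a client whose hint is ignored are treated identically, which matches the idea that the hint is advisory only.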

A client can also request the current position of its own windows on the screen explicitly by making a get_placement request, which the compositor replies to. Clients are not allowed to request the position of anything but their own windows (so no snooping on other apps).
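
The "own windows only" rule can be sketched as follows. This is a simplified model, not compositor code: `Compositor`, `Toplevel` and the integer client ids are hypothetical. In a real Wayland compositor the check is largely implicit, because a client can only refer to protocol objects on its own connection; the sketch just makes the rule explicit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Toplevel:
    owner: int   # hypothetical id of the client that created the window
    output: str
    x: int
    y: int


class Compositor:
    def __init__(self) -> None:
        self.toplevels: Dict[int, Toplevel] = {}

    def get_placement(self, client: int,
                      toplevel_id: int) -> Optional[Tuple[str, int, int]]:
        """Reply with (output, x, y) only for the caller's own toplevel."""
        tl = self.toplevels.get(toplevel_id)
        if tl is None or tl.owner != client:
            return None  # unknown or foreign window: reveal nothing
        return (tl.output, tl.x, tl.y)
```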

The API only applies to toplevel windows; modal/dialog windows are still always placed by the compositor.

Security

Currently applications are apparently not allowed this functionality in the name of security. So, let’s address a few of the issues I found people have with this feature.

Fooling users by mimicking UI elements

Hypothetically, an application could create a borderless window, and put it somewhere on screen where a password input may appear, masking the original. It could also put its own window imitation over default desktop-environment UI elements, like taskbars and buttons.

However, I do not think this is a realistic scenario: To put the window over a password input, the application would first need to run at all, locally (which is already a huge security breach if a malicious app managed to do that), and then would somehow need to find out where that password input field even is. For that it needs elevated permissions already to screengrab the content of a screen, which Wayland rightfully prevents by default. Even then, the compositor could deny the position request for any application that does not draw a titlebar to easily mitigate this, or ensure a shadow is drawn around a floating borderless window to visibly distinguish it.

Similarly, the compositor knows where desktop-shell elements are and could easily deny any attempt of a window trying to be on top of them. In addition, apps can already request a fullscreen window, so, an app could simply replace the whole desktop in the user's eyes and mimic some important input to trick people into entering their sudo password.

So, allowing positioning does not diminish security more than a fullscreen request does.
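
The two mitigations mentioned above (denying hints for windows without a titlebar, and denying hints that would cover desktop-shell elements) boil down to a simple rectangle test. The following Python sketch shows one possible form; `Rect` and `hint_allowed` are hypothetical names, and a real compositor would have its own notion of decorations and shell surfaces.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other: "Rect") -> bool:
        # Rectangles that merely touch at an edge do not count as overlapping.
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


def hint_allowed(window: Rect, has_titlebar: bool,
                 shell_areas: Iterable[Rect]) -> bool:
    """Decide whether a placement hint may be considered at all."""
    if not has_titlebar:
        return False  # borderless windows fall back to default placement
    # Reject hints that would put the window over panels, docks, etc.
    return not any(window.intersects(area) for area in shell_areas)
```

A rejected hint does not need to be an error; the compositor can simply place the window using its default rules.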

Tracking users via window positioning

Hypothetically, knowing how the user moves a window provides an app with information it could use for fingerprinting. I do not think this is a big risk, given that the application already has access to a plethora of data for fingerprinting, like hardware/screen configuration, OS information and installed fonts. However, if this is a concern, a compositor could simply rate-limit the get_placement request. None of the legitimate use cases I found require the application to know about window position changes all the time, which is why the application needs to explicitly request the position (rather than the placement event being emitted on every position change). This should be an effective way to thwart any data collection attempt via tracking window positions.
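
Rate-limiting get_placement could be as simple as enforcing a minimum interval between replies per client. The sketch below is an assumption about how a compositor might do it, not something the protocol specifies; `PlacementRateLimiter` and the one-second interval in the usage note are made up for illustration. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Dict


class PlacementRateLimiter:
    """Allow at most one get_placement reply per client per interval."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.last_reply: Dict[int, float] = {}

    def allow(self, client: int) -> bool:
        now = self.clock()
        last = self.last_reply.get(client)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: drop or delay the reply
        self.last_reply[client] = now
        return True
```

With `PlacementRateLimiter(1.0)`, a client polling in a tight loop would get at most one position per second, which is plenty for saving a layout but of little use for tracking how the user drags a window.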

Even then, the benefit of knowing window positions would be limited, as the application could only ever know about its own windows, and has no idea about the position of other windows, which would be actually a lot more valuable.

Usability

We want to avoid any situation where the user does not feel in control. So having a window suddenly jump across the screen because its app issued a set_preferred_placement request for it is unacceptable. Therefore, compositors must only honor that request for newly created windows, and ignore it once the window is being displayed. None of my use cases requires an app to retroactively change the window position of its own volition.
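
The "only before the window is displayed" rule is essentially a two-state machine. A minimal Python sketch, with the hypothetical names `ToplevelState` and `map_window` (neither is part of the protocol):

```python
from typing import Optional, Tuple

Placement = Tuple[str, int, int]  # (output name, x, y)


class ToplevelState:
    def __init__(self) -> None:
        self.mapped = False
        self.hint: Optional[Placement] = None

    def set_preferred_placement(self, output: str, x: int, y: int) -> bool:
        """Store the hint; returns False if it arrived too late."""
        if self.mapped:
            return False  # window is already on screen: never move it
        self.hint = (output, x, y)
        return True

    def map_window(self) -> Optional[Placement]:
        """Called when the window is first shown; yields the stored hint."""
        self.mapped = True
        return self.hint
```

Late requests are silently ignored here rather than treated as protocol errors; which of the two is preferable is a design question for the protocol text itself.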

PAQ - Preemptively answered questions

Why not use the xdg-session-management protocol?

The xdg-session-management protocol also saves & restores window positions, but it cannot be applied here because it has no knowledge of where to place a new window on its initial creation, while the application has the necessary context available to place it somewhere that works for its UI design. In addition, if the same window should appear in different positions based on how the user configured an app or which configuration it loaded, the session-management protocol cannot handle that unless there is complicated negotiation between compositor and client to handle multiple profiles for the same set of windows. And at that point, the client might as well just tell the compositor where it would like to have new windows displayed, which would be a lot less error-prone.

When the xdg-session-management protocol is in effect, its positions override any positioning hints that the application might give via this protocol extension, of course.
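
Expressed as code, the precedence described here is a simple fallback chain. The function name `initial_position` and the tuple representation are hypothetical and only serve the illustration:

```python
from typing import Optional, Tuple

Placement = Tuple[str, int, int]  # (output name, x, y)


def initial_position(restored: Optional[Placement],
                     hint: Optional[Placement],
                     compositor_default: Placement) -> Placement:
    """Pick the starting position for a newly created toplevel."""
    if restored is not None:
        return restored  # session management wins over any client hint
    if hint is not None:
        return hint      # client hint, still subject to compositor policy
    return compositor_default
```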

Why absolute positioning? Wouldn’t relative positions be enough?

That was my initial design! It would have allowed the client to make requests like “place this window to the right/left/top/bottom of an existing window of the same app”. However, there were three problems: clients could have gamed this method to still have windows end up exactly where they wanted them; it was incompatible with how apps currently handle window placement and so would have required extra porting work; and we would still have lost compatibility with proprietary and Windows apps.

In that light, it seemed better and also simpler to just go all the way and provide absolute positions for clients.

Shouldn’t this be a D-Bus call that we can permission-control?

The set_preferred_placement method should be called before the window is displayed - with D-Bus, we would have a non-zero chance of the window flying across the screen to its new position once the D-Bus call arrives at the compositor after the window has been created. In addition, with the compositor being in control over the final placement, I also see no reason why this call should be permission-controlled at all.

The same issues do not apply to the get_placement call though, which could be done via D-Bus if there is a reason to actually control the application’s access to this information fully.

I have a tiling WM!

No problem! For window managers like this, it is perfectly legitimate to just ignore the client’s positioning requests. The API specifically mentions that clients should not rely on absolute placement to happen. It is merely advice to the compositor to choose a sensible initial spot to put a window.

Same goes for weird form factor displays or specialized compositors, of course (but those may not even implement xdg-shell at all).

I have not written an amendment to a Wayland protocol before, so I hope this patch looks okay and xdg-shell-unstable-v6 is the right place to go. At the moment it is intended to provide a concrete proposal to implement this feature that we can discuss. 😊

Thanks for considering!

Edited Mar 17, 2024 by Matthias Klumpp
