(Discussion) Supporting the persistent graphical remote access use case
[It was suggested on IRC that I start this discussion here, but feel free to move it if it would be better elsewhere.]
Background
I'm on the Chrome Remote Desktop team at Google, and I'm working on adding support for Wayland sessions to our product. While our initial focus will be on supporting GNOME, we’d love the end result to be a set of standard APIs that work across login managers and compositors.
I’ve already reached out to GNOME and KDE, but I wanted to start a discussion here to gather thoughts and ideas from the wlroots developers on the subject, and to keep the various stakeholders in the loop.
A user-friendly, persistent remote access solution requires a few pieces in addition to basic screen capture and input injection protocols:
- Connection at boot: Like SSH, it should be possible to connect immediately after the system boots. Remote access shouldn't depend on needing to log in locally and launch a graphical session ahead of time.
- Session curtaining: When a user is connected remotely, it shouldn't be possible for a person at the console to observe or interact with their session. Curtaining could be accomplished by running headless, or by rendering a lock screen on local displays and ensuring windows are only rendered to virtual displays and only accept input from the remote user.
- Display configuration: A user will typically want to match the output configuration of the client, so it should be possible for the remote desktop tool to configure virtual monitors, including resolution, layout, and scaling factor.
- No per-session permission grant: While the Portal API model of having a local user grant access for remote control on each launch of the remote access tool makes sense for allowing temporary remote control (such as for remote assistance applications), it doesn't work for persistent remote access, where there is likely to be no local user to approve the connection.
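To make the display configuration requirement a little more concrete, here is a minimal sketch of the data a remote desktop tool might hand to the compositor. The `VirtualMonitor` type and `bounding_box` helper are hypothetical names invented for illustration, not part of any existing protocol, and the sketch assumes positions are expressed in logical (scaled) coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMonitor:
    """One client monitor to mirror as a virtual output (hypothetical type)."""
    x: int           # left edge, in logical coordinates (assumption)
    y: int           # top edge, in logical coordinates (assumption)
    width_px: int    # width in physical pixels
    height_px: int   # height in physical pixels
    scale: float     # scaling factor requested by the client

    @property
    def logical_size(self):
        """Size in logical coordinates: physical pixels divided by the scale."""
        return (round(self.width_px / self.scale),
                round(self.height_px / self.scale))


def bounding_box(monitors):
    """Return (min_x, min_y, max_x, max_y) of the layout in logical coordinates."""
    min_x = min(m.x for m in monitors)
    min_y = min(m.y for m in monitors)
    max_x = max(m.x + m.logical_size[0] for m in monitors)
    max_y = max(m.y + m.logical_size[1] for m in monitors)
    return (min_x, min_y, max_x, max_y)


# A 4K monitor at 2x scale next to a 1080p monitor at 1x scale.
layout = [
    VirtualMonitor(x=0, y=0, width_px=3840, height_px=2160, scale=2.0),
    VirtualMonitor(x=1920, y=0, width_px=1920, height_px=1080, scale=1.0),
]
print(bounding_box(layout))
```

Whatever interface ends up carrying this information would need to express at least these three things per monitor: resolution, position, and scale.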
One possible flow for remote login might look as follows. (This is the approach currently being pursued by GNOME via unstable APIs for their GNOME Remote Desktop tool.)
1. A system instance of the remote desktop tool runs as a dedicated user.
2. When a user connects, it requests a remote display from the login manager.
3. The login manager launches a headless greeter which is remoted by the remote desktop tool.
4. The user selects a session supporting curtained remote access and logs in.
   - If there is an existing graphical session that does not support dynamic curtaining, the greeter prompts the user to kill it before the new session can be launched.
5. A new curtained session is launched or the existing session is curtained.
6. The remote desktop tool hands off the connection to a process running as the user.
7. The new process configures the virtual monitor layout to match the needs of the client.
8. The user happily uses the new remote session.
Steps 2-5 can be accomplished through means other than remoting a greeter, such as Kerberos authentication or a non-graphical PAM-style conversation, but that's outside the scope of what a wlroots compositor needs to care about anyway.
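The decision the login manager has to make when the user logs in can be sketched as a small function. This is only an illustration of the flow described above; the `Action` enum and `choose_action` function are hypothetical names, not an existing login manager API.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes at login (hypothetical names)."""
    LAUNCH_NEW = "launch a new curtained session"
    CURTAIN_EXISTING = "curtain the existing session"
    PROMPT_KILL = "prompt the user to kill the existing session"


def choose_action(existing_session: bool,
                  supports_dynamic_curtaining: bool) -> Action:
    """Pick what to do when a remote user logs in.

    existing_session: the user already has a graphical session running.
    supports_dynamic_curtaining: that session can be curtained in place.
    """
    if not existing_session:
        return Action.LAUNCH_NEW
    if supports_dynamic_curtaining:
        return Action.CURTAIN_EXISTING
    # The existing session must end before a curtained one can start.
    return Action.PROMPT_KILL


for existing, dynamic in [(False, False), (True, True), (True, False)]:
    print(existing, dynamic, "->", choose_action(existing, dynamic).value)
```

Only the `PROMPT_KILL` branch costs the user their running session, which is why dynamic curtaining is worth having even though it is optional.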
From the perspective of a wlroots compositor
Leaving authentication and session launching to the login manager, here are the pieces I believe a wlroots-based compositor would need to support to work with this system, along with a spit-balled idea for how each might be provided.
- Signaling support for remote sessions to the login manager. This could be an entry in the .desktop file indicating support for curtaining and the required remote access APIs.
- Launching in headless/curtained mode. This could be a command-line flag along with an entry in the .desktop providing a separate launch command for headless sessions.
- (Optional) transitioning an existing session to headless/curtained mode. This isn't needed, but it would provide a better experience for users who use their machine both locally and remotely, as it wouldn't require killing and restarting their session each time they switch. This could be a Wayland protocol to query dynamic curtaining capability and trigger the transition.
- Configuring virtual monitor layout/resolutions/scaling factors. I imagine this would be a Wayland protocol in wlroots.
- Display capture / input injection / clipboard management. I believe there are already protocols available for these?
- Managing access control. It should not be necessary to separately approve a remote access tool with each compositor / desktop environment, as we don't want to run into a situation where the user remotely launches a session they then can't interact with. This could involve implicitly trusting unsandboxed user processes, or receiving some kind of shared token from the login manager to authenticate the remote access tool.
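As a rough sketch of the first two items, a login manager could read the session's .desktop file to find out whether curtaining is supported and how to launch headless. The key names `X-RemoteAccess-Curtaining` and `X-RemoteAccess-HeadlessExec`, the `example-compositor` binary, and its `--headless` flag are all invented for illustration; nothing here is an agreed-upon convention. The `X-` prefix is used only because the desktop entry format reserves it for extension keys.

```python
import configparser

# Hypothetical session file; the X-RemoteAccess-* keys and the
# --headless flag are assumptions made up for this example.
SESSION_FILE = """\
[Desktop Entry]
Name=Example Compositor
Exec=example-compositor
Type=Application
X-RemoteAccess-Curtaining=dynamic
X-RemoteAccess-HeadlessExec=example-compositor --headless
"""


def remote_capabilities(text):
    """Extract the (hypothetical) remote-access keys from a session file."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case; desktop entry keys are case-sensitive
    parser.read_string(text)
    entry = parser["Desktop Entry"]
    curtaining = entry.get("X-RemoteAccess-Curtaining", fallback="none")
    return {
        "curtaining": curtaining,
        # Fall back to the normal Exec line if no headless command is given.
        "headless_exec": entry.get("X-RemoteAccess-HeadlessExec",
                                   fallback=entry.get("Exec")),
        "supported": curtaining != "none",
    }


print(remote_capabilities(SESSION_FILE))
```

A session file without these keys would simply report `supported` as false, so a greeter could hide or grey out sessions that cannot be used remotely.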
Note that GNOME and KDE are looking to provide the needed post-session-launch APIs via a standardized D-Bus interface (either an expanded version of the existing Portal APIs, or a separate interface limited to unsandboxed processes), while wlroots prefers using Wayland protocols. I don't think this should be an issue, as it should be possible to provide a service that implements the D-Bus interfaces in terms of whatever Wayland protocols wlroots uses.