Terminal WG Fundamentals
[ I wrote this draft proposal about half a year ago. I publish it now mostly unchanged (except for a few small changes, and this very paragraph) in the hope that it's useful, but – as per an announcement I'm just about to make in VTE's issue tracker – I won't be following the comments or engaging in discussions. ]
In early 2019 we created this public forum called "Terminal WG", in order to provide an open discussion and standardization platform for developers of terminal emulators and terminal-based applications.
However, we didn't lay down operational and technical principles or rules.
During the first few months, I encountered several proposals that I fundamentally disagree with, and I firmly believe that, if adopted widely, they would move the terminal ecosystem in the wrong direction. In fact, some would break things that many of us have worked really hard for, and to my greatest disappointment and fear, sometimes those who push the given idea know this very well and still push for it.
We don't want to exclude anyone, and it's okay to have niche features or crazy ideas – as long as they are targeted at a few terminals that wish to experiment with them, without expecting the majority to follow. However, for proposals that aim to get as widely adopted as possible, plenty of technical aspects need to be taken into account. In this document I'm focusing on how such ambitious cross-terminal efforts can be successful.
For ideas and features whose goal is to become a widely accepted and implemented de facto standard, I firmly believe the design work and the discussions have to follow the technical and moral principles below. (Another document talks about my operational concerns.)
The rest of this document, along with its sometimes firm wording, assumes that a proposal aims to become a widespread one (I won't repeat this in each and every section). For niche features, experimental ideas etc. these principles can be loosened if necessary, but ideally the resulting limitations, broken principles etc. should be spelled out in the proposal along with a justification.
Generic principles
The bar is high
Every participant who wants to design a feature for potentially all terminals has to understand, acknowledge and respect that such work requires a significant amount of general software development and design experience, as well as significant knowledge and experience in the particular field (including its surroundings).
The mere fact that someone has hacked up something for one terminal, which works more or less reliably there, does not necessarily meet the bar for the rest of the terminals and applications to follow. In order to design a feature for all terminals, the bar is extremely high.
This is especially true for areas that have already seen multiple attempts, none of which really succeeded so far. Here at Terminal WG we're not looking for the 15th competing standard (see xkcd: Standards).
The more complex the feature, the wider the gap between someone just implementing it somehow and proposing a design suitable for all terminals to follow. The latter might easily require orders of magnitude more design work. If you're not ready for this, and are aiming for a solution that works in your terminal and might work in some others too, make this clear in your proposal.
A failed attempt is costly
A failed – or even worse, badly designed yet still somewhat successful – protocol not only consumes precious resources in designing and implementing that particular protocol (in emulators as well as in applications), but also makes subsequent attempts more costly.
Can we reach out to terminal and application developers, saying we have a cool new feature, and suggest that they implement it? Yes. Can we reach out a second time, apologizing that we messed it up the first time, showing them the fixed protocol, and expecting them to adjust their code? Maybe. Can we reach out to them a third time, in case the second version was still problematic? Most likely not. We just wouldn't be taken seriously.
Especially for features where we wish to fix a previous attempt, we really need to get it right; we probably won't have any further chances in the foreseeable future.
A failed attempt is not only costly for that particular feature, it is costly in general: it costs credibility for this forum, as well as for terminals and their developers in general.
Of course there's no way to know in advance if a proposal would fail, and fear of failing could result in us just not doing anything, which clearly isn't the goal. It's okay to take risks, do your best, and still fail. It's not okay to be careless, and it's not okay to push something through despite red flags.
Know the ecosystem
Terminals are part of a quite complex ecosystem. There are several vastly different ways of using a terminal.
For example, many utilities "just print" something once, not even caring about the terminal's size, and then quit. Some others "just print" something once, adjusting to the terminal's width, but still don't care about its height. Yet others control the entire canvas and are able to reprint the contents (e.g. after a resize).
Another example: many users prefer to use their native graphical terminal, while many others prefer to always run a screen or tmux inside.
New proposals MUST take into account all these various common ways of using a terminal, not just a subset of them.
Look into the future
It's good to know where terminals are coming from, how they changed over time, what features faded away during the decades, and what new features appeared recently. However, focus on the future. Work on making terminals an even better tool for power users. Don't try to resurrect ancient features just because forty years ago some hardware terminals had them.
Realistic ideas
New standards should have a strong potential of actually getting implemented in plenty of terminal emulators out there.
Terminal emulators have scarce developer resources. It might easily take 5–10 years or even more for a popular, often requested feature to get implemented in one. New proposals should aim to be as simple and clean as reasonable, and not require developer resources that most terminals won't be able to allocate in many years to come.
Silly example: If you wanted to extend the terminal's capabilities to render TeX, feel free to experiment with it in your own terminal, but don't expect others to follow.
A proposal that targets all terminals, but realistically is only going to be implemented by one or two mainstream terminals in the next 5–10 years, is probably not a successful proposal.
Coexist
Various features of the terminal MUST be able to coexist whenever it makes sense. You can't have feature A and later design an independent feature B without designing how features A+B will work together.
Examples: If you can either display pictures or run a multiplexer, but (due to a limitation of the protocol) cannot even theoretically display pictures inside a multiplexer, then it's bad design. If you can display left-to-right as well as right-to-left text, and can do foobar with left-to-right text, but cannot do foobar equally easily with right-to-left text, it's bad design.
Quoting jerch from https://github.com/xtermjs/xterm.js/issues/2570 :
A sequence, that has several "works only if ... and if ..., never when ..." restrictions is a bad one.
Dare to say "no"
The task of a developer is not to try to address each and every feature that users ask for. The task of a developer is to pick features that are compatible with each other as well as with the project's vision, reasonably designable and implementable, and worth the effort.
It's okay, and even desirable, to say "no" to an idea that doesn't fit.
There are great articles and talks about this on the web, e.g. https://www.intercom.com/blog/product-strategy-means-saying-no/ .
Extend, but DO NOT BREAK
Sometimes several features are literally incompatible with each other.
Surely you all know the "joke" where a service provider (e.g. a repair shop) tells you that they can do three types of work: fast, good and cheap. You can pick any two. Well, it's not a joke, it's serious. If they offered all three at the same time, they'd soon go bankrupt.
If you add a feature that is literally incompatible with an earlier one, you necessarily break the earlier one, or at least make it impossible to use both of them at the same time.
Terminals have many constraints. These constraints are often the very source of great power elsewhere.
Examples:
- Fixed cell width (monospace fonts) allows applications to print and align pieces of output without having to know anything about the font. A proportional font surely looks nicer in the typographical sense, but apps could no longer position things without knowing the font, and could no longer have concepts like the terminal size in characters.
- Being able to perform terminal emulation without knowing anything about the font or font size is also a key factor in the existence of great tools like screen and tmux, as well as of ssh with reasonable speed.
- The fixed grid allows terminals to reflow their contents when resized. If characters could be placed at arbitrary positions on the canvas, reflowing wouldn't be possible.
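To illustrate the first point, here is a minimal Python sketch of how an application can align a column using nothing but cell counts. The `cell_width` helper is a simplified assumption (wide and fullwidth East Asian characters take two cells, combining marks take none); real applications typically rely on a complete wcwidth implementation. Note that no font metric appears anywhere.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Approximate number of terminal cells `text` occupies.

    Simplified rules (an assumption, not a full wcwidth table):
    East Asian Wide/Fullwidth -> 2 cells, combining marks -> 0, else 1.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad(text: str, cells: int) -> str:
    """Left-align `text` in a column that is `cells` cells wide."""
    return text + " " * max(0, cells - cell_width(text))


rows = [("name", "size"), ("notes.txt", "12K"), ("日本語.txt", "3K")]
column = max(cell_width(name) for name, _ in rows)
for name, size in rows:
    # The sizes line up in every terminal, whatever font it happens to use.
    print(pad(name, column) + "  " + size)
```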
Feel free to suggest enhancements that don't break any power resulting from the current constraints.
But if you're proposing to remove a constraint for some seemingly great local benefit, and as a side effect this breaks some great power resulting from that constraint, the new idea needs to be reworked or rejected. The existing power of terminals must remain.
Example: As I understand the comment at https://gitlab.gnome.org/GNOME/vte/-/issues/195#note_642816 (an otherwise unrelated thread), one of the lengthy discussions of Terminal WG can be summarized as follows. Some users want a certain behavior so bad [pun intended] that they:
- don't mind if that behavior sometimes falls apart as long as it kinda works, and
- don't mind if it breaks other features.
The response of any experienced developer to this must be:
- "no", and
- "no".
I mean, you can experiment with this in your own terminal, but don't push the rest of the terminals in this direction.
Our job is to design high quality features, without breaking existing fundamentals. If something cannot be done, we can just state it to our users, briefly explain the reasons, and point them to alternative approaches (such as a graphical application).
Respect the mainstream
Unfortunately, we've seen comments where developers of unorthodox terminal emulators expected others to follow their unorthodox approaches. Don't expect this to happen.
Know and respect how the majority of terminals behave and how they are commonly built, and work on further improving them.
If you aim to be an unusual, special one, go for it, but please don't try to make others follow you.
Know and respect the market share
There is probably no way everyone would fully agree on a complex proposal.
If a terminal with a tiny market share votes against a proposal, that's acceptable. If a terminal with a huge market share rejects a proposal, the proposal has failed to become a widely accepted one straight away.
(With great power comes great responsibility. Don't abuse it.)
World outside
It's not a goal to pimp up terminals to feature parity with graphical systems.
There is a world outside terminals: there are amazing graphical desktops with amazing graphical applications. If the graphical desktop or some other system is better suited to a task than the terminal, don't be afraid to use it, and point your users to it.
Let the terminal keep its strengths where they already are. Do not sacrifice them for seeking new strengths in territories where other systems already excel.
Concrete principles
Rectangular character grid
A terminal provides a rectangular grid of cells.
This is hardwired deep inside every system. There's a well-defined concept of the number of rows and the number of columns, backed by kernel structures (struct winsize) and ioctls (TIOCGWINSZ, TIOCSWINSZ), with a means of notification when they change (the SIGWINCH signal), all of this standardized in POSIX, see e.g. http://austingroupbugs.net/view.php?id=1151 . There's also the common pattern of shells setting LINES and COLUMNS (perhaps also specified by POSIX?).
Terminal emulation revolves around character cells as its basic building blocks, organized into a rectangular grid.
Breaking this would break pretty much all existing apps in uncontrollable ways. This fundamental property MUST remain unchanged.
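A minimal Python sketch of how an application on a POSIX-like system (assumption: Linux or macOS, where Python's `termios` module exposes `TIOCGWINSZ`) might read the logical size from the kernel and subscribe to resize notifications. Only rows and columns are used. In practice `os.get_terminal_size()` performs the same kind of query; the helper names here are hypothetical.

```python
import fcntl
import os
import signal
import struct
import termios

# struct winsize { unsigned short ws_row, ws_col, ws_xpixel, ws_ypixel; }
WINSIZE_FMT = "HHHH"
STDOUT_FD = 1


def parse_winsize(raw: bytes) -> tuple:
    """Return (rows, columns) from a raw struct winsize; pixel fields are ignored."""
    rows, cols, _xpixel, _ypixel = struct.unpack(WINSIZE_FMT, raw)
    return rows, cols


def query_winsize(fd: int) -> tuple:
    """Ask the kernel for the tty's logical size via the TIOCGWINSZ ioctl."""
    raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(struct.calcsize(WINSIZE_FMT)))
    return parse_winsize(raw)


def on_resize(signum, frame):
    # The kernel sends SIGWINCH to the foreground process group on a size change.
    rows, cols = query_winsize(STDOUT_FD)
    print(f"resized to {rows}x{cols}")


if __name__ == "__main__" and os.isatty(STDOUT_FD):
    rows, cols = query_winsize(STDOUT_FD)
    print(f"{rows} rows x {cols} columns; LINES={os.environ.get('LINES')}")
    signal.signal(signal.SIGWINCH, on_resize)
    # A long-running program would continue its main loop here.
```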
Headless, multi-headed terminals
Many terminal emulators always have a window on the screen. But many don't. For example:
- There are screen and tmux, which may or may not be attached at any given time, or might even be attached from multiple different graphical terminal emulators at once.
- There is a headless terminal emulator library called libvterm.
- Konsole allows a split view, even with different font sizes.
- VTE also wishes to fully separate the emulation from the graphical representation.
- Terminal emulators might want to unit test their emulation features without depending on, or having to know anything about, the concrete graphical representation, and without mocking a font.
This technical separation of layers (emulation vs. presentation) is good engineering practice, and is something a lot of users are implicitly looking for (e.g. screen, tmux users need it).
Newly designed features must have headless and multi-headed terminals, as well as operations such as detaching and attaching tmux, in mind as first class citizens, and must be implementable by them without compromises.
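A toy Python sketch of this separation, under heavy simplifying assumptions (no escape sequences; every character except `\n` is treated as printable, and `\n` acts as CR+LF). The `Emulator` owns the logical grid and never asks a view for anything; `View` objects, of which there may be zero, one or many, each with its own cell size, are only told to redraw. All class and method names are hypothetical.

```python
class Emulator:
    """Owns the logical grid; knows nothing about fonts, pixels or windows."""

    def __init__(self, rows: int, cols: int):
        self.rows, self.cols = rows, cols
        self.grid = [[" "] * cols for _ in range(rows)]
        self.row = self.col = 0
        self.views = []  # zero, one or many attached views

    def attach(self, view):
        self.views.append(view)

    def detach(self, view):
        self.views.remove(view)

    def _newline(self):
        self.col = 0
        if self.row + 1 < self.rows:
            self.row += 1
        else:  # bottom reached: scroll the grid up by one row
            self.grid.pop(0)
            self.grid.append([" "] * self.cols)

    def feed(self, text: str):
        for ch in text:
            if ch == "\n":
                self._newline()
                continue
            if self.col == self.cols:  # autowrap onto the next row
                self._newline()
            self.grid[self.row][self.col] = ch
            self.col += 1
        for view in self.views:  # presentation is notified, never consulted
            view.redraw(self)

    def lines(self):
        return ["".join(row) for row in self.grid]


class View:
    """Presentation layer: maps cells to pixels using its own font metrics."""

    def __init__(self, cell_width_px: int, cell_height_px: int):
        self.cell_w, self.cell_h = cell_width_px, cell_height_px
        self.frame = []

    def redraw(self, emu: Emulator):
        # A real view would rasterize glyphs; this one just copies the text.
        self.frame = emu.lines()

    def pixel_size(self, emu: Emulator):
        return emu.cols * self.cell_w, emu.rows * self.cell_h


data = "hello\nterminal world"
headless = Emulator(3, 10)  # no view at all
windowed = Emulator(3, 10)
small, large = View(7, 14), View(12, 24)
windowed.attach(small)
windowed.attach(large)  # two views with different font sizes at once
headless.feed(data)
windowed.feed(data)
print(headless.lines() == windowed.lines())  # True
```

Attaching or detaching a view changes only who gets notified, never what ends up in the grid.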
Deterministic emulation
A consequence of the previous: The emulation behavior, as in which character cell contains what character and attributes, must not depend on the view (or views, or lack thereof), including any property of the view such as the font size.
If the same data stream is sent to various terminals of the same logical size (character row × character column count) and same initial state, e.g. to:
- a headless terminal,
- a one-headed graphical terminal with a small font,
- a one-headed graphical terminal with a large font,
- a multi-headed graphical terminal with multiple different font sizes at once,
- an attached tmux,
- a detached tmux,
- etc.,
they MUST all end up with the exact same logical grid (every corresponding cell of the grid containing the same character across these terminals).
(Unfortunately there are legacy features that can only work when there is a window available, because their behavior depends on a property thereof (such as the cell size). An example of this is the Sixel image support, available in Xterm and some other terminals. As one can see in Xterm, the display falls apart after a font size change. But more importantly, headless or detached emulators, or ones with multiple views at the same time, cannot even theoretically implement such a feature: they simply have no idea how many character rows an image would occupy. Even in Xterm this depends on the current font size. Such features should be deprecated, and no new feature suffering from this issue should be designed.)
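A small Python illustration of the problem, under the simplifying assumption that an image occupies as many rows as it takes to cover its pixel height (the exact Sixel cursor-placement rules are more involved). The result depends on the cell height, which a headless emulator lacks and a multi-headed one has several values for. The helper name is hypothetical.

```python
import math
from typing import Optional


def rows_occupied(image_height_px: int, cell_height_px: Optional[int]) -> int:
    """Number of character rows covered by an image of the given pixel height."""
    if cell_height_px is None:
        # Headless or detached emulator: there is no cell height at all.
        raise ValueError("no view attached: row count is undefined")
    return math.ceil(image_height_px / cell_height_px)


# The same 100 px tall image, the same byte stream, three different fonts:
for cell_h in (14, 16, 24):
    print(f"cell height {cell_h}px -> {rows_occupied(100, cell_h)} rows")
```

Since everything printed after the image is positioned relative to where the cursor ends up, the logical grids of these terminals diverge from that point on.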
Multiplexing
Screen and Tmux can use a part of the host terminal emulator's area, and display the result of their own terminal emulation there. Screen can perhaps only shrink the main area by a status bar, whereas Tmux can even do horizontal and vertical splits.
We can't exclude the possibility of a future terminal multiplexer offering free-flowing, overlapping windows, or an application implementing a similar internal UI. (To give you an idea: start "mcedit file1 file2", click on the "[*]" button near the top right corner, and then move/resize the subwindows using the mouse.)
Every newly designed feature must take these into account: it MUST be implementable by screen, tmux, or even a future multiplexer that allows arbitrarily overlapping windows, positioned and cropped along character boundaries.
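As a rough sketch of why cell-based features fit this model: if every cell is self-contained (a character plus its attributes), a multiplexer can place a pane at any cell offset and crop it at the edges with a trivial loop. A feature whose state is not addressable per cell has no equivalent of this loop. The function below is hypothetical, written in Python for illustration.

```python
def composite(host, pane, top, left):
    """Draw `pane` (rows of cells) onto `host` with its top-left cell at (top, left).

    Cells falling outside the host grid are cropped. Offsets may be negative,
    e.g. for a window partially dragged off-screen.
    """
    for r, row in enumerate(pane):
        hr = top + r
        if not 0 <= hr < len(host):
            continue
        for c, cell in enumerate(row):
            hc = left + c
            if 0 <= hc < len(host[hr]):
                host[hr][hc] = cell
    return host


host = [["."] * 6 for _ in range(3)]
pane = [list("abc"), list("def")]
composite(host, pane, top=1, left=4)  # overlaps the right edge: "c" and "f" are cropped
print("\n".join("".join(row) for row in host))
```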
Channels: ssh, sudo
Channels such as ssh or telnet for network communication, su or sudo for switching user, virsh, lxc etc. for connecting to a virtual machine etc. are used by millions of users. Features should not break or degrade when any of these are used.
As a consequence, a protocol MUST NOT refer to a local file which would then be accessed by the terminal. The cost of sending the contents over the tty line is much smaller than the opportunity cost arising from degraded user experience, lost productivity and frustration across ssh and the like.
Accessing local files isn't free of security concerns either. If the user is ssh'd to a compromised site, the attacker could, using timing, or – if the "deterministic emulation" principle isn't respected – by querying the cursor position before and after embedding a local resource, retrieve some information about the local filesystem.
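For contrast with referring to a path, a minimal Python sketch of carrying the payload inline. It uses xterm's existing OSC 52 selection sequence purely as a familiar example of this shape (support varies, and some terminals disable it by default). The content travels through ssh or sudo like any other output, no filesystem path is ever handed to the terminal, and the base64 encoding keeps the payload within ASCII.

```python
import base64


def osc52_copy(data: bytes, selection: str = "c") -> bytes:
    """Build an OSC 52 sequence asking the terminal to put `data` on the clipboard.

    The content itself is embedded (base64-encoded), so it doesn't matter on
    which machine the program runs or which filesystem it can see.
    """
    payload = base64.b64encode(data)  # base64 output is pure ASCII
    return b"\x1b]52;" + selection.encode("ascii") + b";" + payload + b"\x07"


seq = osc52_copy(b"hello")
```

A program would simply write `seq` to its standard output.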
Synchronous
Unless absolutely inevitable, newly designed escape sequences MUST be fully synchronous (that is, the application sends them out and knows fairly well what's going to happen in the terminal).
Asynchronous operation (where an application sends a query, waits for the response from the terminal, and then continues based on this response) suffers from two fundamental unfixable problems:
For simple tools that just quickly perform some task and then quit, it's literally impossible to add reliable support for such sequences. For complex apps it's also often extremely complicated. Waiting for a response without a timeout can easily lock up the application if something goes wrong (e.g. the feature is incorrectly assumed to be supported). Waiting with a timeout raises questions that can't be answered: How long should that timeout be? What if the response doesn't arrive in time? What if the response arrives later (maybe even after the utility has quit): how will it be handled, and how will it not break subsequent steps? Also, a permanently degraded user experience (e.g. something missing from the output of a utility) caused by a transient problem (e.g. network lag or extreme local load) is unacceptable if the underlying system (e.g. the ssh channel) survives.
Asynchronous querying and responding is extremely slow. Let's say you have an ad-hoc shell script that launches a utility 450 times in a loop, and that utility in turn needs to perform such an asynchronous step. You'd expect such a script to complete in a second or two. When executing it over an ssh connection to the other side of the globe, the theoretical minimum time it takes to complete, computed just from the speed of light with no additional delay, is 1 minute. Unlike data transfer speed, which gets faster year by year, this can't get any faster. (And sure, you can rewrite the tools and scripts, but having to rewrite them is also a huge productivity loss, and clear proof of bad design in the terminal protocol.)
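Two small Python sketches of these problems. The first part computes the light-speed lower bound from the paragraph above (assumption: half of Earth's equatorial circumference, about 20,037.5 km, as the distance; real routes are longer and light in fiber is roughly a third slower, so reality is worse). The second part uses a pipe as a stand-in for the tty to show what happens when a reply arrives after the timeout: it stays queued, and whoever reads next receives it as if it were input. `read_reply` is a hypothetical helper.

```python
import os
import select

SPEED_OF_LIGHT_KM_S = 299_792.458
HALF_EARTH_CIRCUMFERENCE_KM = 20_037.5  # "the other side of the globe"


def min_round_trip_s(distance_km: float) -> float:
    """Lower bound for one query/response round trip, ignoring every other delay."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S


def min_script_time_s(round_trips: int, distance_km: float) -> float:
    return round_trips * min_round_trip_s(distance_km)


def read_reply(fd: int, timeout_s: float):
    """Wait up to timeout_s for a reply; None means the application gave up."""
    ready, _, _ = select.select([fd], [], [], timeout_s)
    return os.read(fd, 1024) if ready else None


print(f"{min_script_time_s(450, HALF_EARTH_CIRCUMFERENCE_KM):.1f} s at best")

r, w = os.pipe()  # w plays the terminal, r the application's input
first = read_reply(r, timeout_s=0.05)  # terminal too slow: the query times out
os.write(w, b"\x1b[12;40R")  # ...then the cursor position report arrives late
stale = os.read(r, 1024)  # the next reader gets it as if it were typed input
os.close(r)
os.close(w)
```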
No per-line properties
More and more terminals offer the great feature of rewrapping the lines on resize. Even for those that don't, it's an often requested feature.
This feature is incompatible with per-line properties (except for the bit telling if the line ends in an overflow or an explicit newline), as there is no way the given property could live on after rewrapping.
There is probably no feature idea more important than the productivity boost provided by rewrapping on resize.
Newly introduced properties MUST NOT be line-based. They can be per-character, per-paragraph ("paragraph" here meaning the block between two explicit newlines), per-screen (as in normal vs. alternate screen) or global properties.
(Unfortunately a few existing features are per-line. Luckily though, they are implemented only in very few terminals. If there's a demand for reviving one of them, a new solution along this principle should be designed instead.)
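A minimal Python sketch of rewrapping, assuming each row stores only its cells plus the single "continues on the next row" bit. Paragraphs are reassembled from soft-wrapped rows and split again at the new width; per-character attributes travel with their cells. A per-line property (say, a flag stored on the first row of the example) would have no meaningful row to land on afterwards. The `Cell` and `Line` types are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Cell:
    ch: str
    bold: bool = False  # per-character attribute: survives rewrapping


@dataclass
class Line:
    cells: List[Cell]
    wrapped: bool  # True: soft wrap, the paragraph continues on the next row


def paragraphs(lines: List[Line]) -> Iterator[List[Cell]]:
    """Join soft-wrapped rows into paragraphs (runs ending in an explicit newline)."""
    para: List[Cell] = []
    for line in lines:
        para.extend(line.cells)
        if not line.wrapped:
            yield para
            para = []
    if para:  # the last row was soft-wrapped with nothing after it
        yield para


def rewrap(lines: List[Line], width: int) -> List[Line]:
    out: List[Line] = []
    for para in paragraphs(lines):
        if not para:
            out.append(Line([], wrapped=False))  # an empty paragraph stays one row
            continue
        chunks = [para[i:i + width] for i in range(0, len(para), width)]
        for i, chunk in enumerate(chunks):
            out.append(Line(chunk, wrapped=i < len(chunks) - 1))
    return out


def text(line: Line) -> str:
    return "".join(c.ch for c in line.cells)


# "hello world" laid out at width 6, with a bold "w"; then resized to width 4.
old = [
    Line([Cell(c) for c in "hello "], wrapped=True),
    Line([Cell(c, bold=(c == "w")) for c in "world"], wrapped=False),
]
new = rewrap(old, 4)
print([text(line) for line in new])
```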
Keep it ASCII
Except for text in the current encoding, and the necessary control characters themselves, the data stream (in both directions) MUST be limited to printable ASCII characters (bytes 32..126).
This way it's safe to pass them through charset conversion layers such as luit. As a nice bonus, in a fully UTF-8 environment the entire data stream remains valid UTF-8. See the legacy mouse protocol for a bad example that breaks this above coordinate 95.
Also keep in mind that CR and LF are mangled by the kernel according to the terminal line's output flags (e.g. "stty onlcr").
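A minimal Python sketch contrasting the legacy mouse report with the SGR (mode 1006) encoding, which xterm and many other terminals support as a replacement. The `keeps_it_ascii` check is a hypothetical helper that allows ESC as the only control character, which is all these reports need.

```python
def legacy_mouse(button: int, col: int, row: int) -> bytes:
    """Legacy X10-style mouse report: ESC [ M followed by three raw bytes.

    Each value is sent as a single byte offset by 32 (coordinates are 1-based).
    Coordinate 95 already maps to DEL (127); above 95 the byte is non-ASCII, so
    the stream is generally no longer valid UTF-8; above 223 it doesn't even
    fit in a byte (bytes() raises ValueError).
    """
    return b"\x1b[M" + bytes([32 + button, 32 + col, 32 + row])


def sgr_mouse(button: int, col: int, row: int, press: bool = True) -> bytes:
    """SGR (mode 1006) mouse report: the numbers are spelled out as ASCII digits."""
    final = "M" if press else "m"
    return f"\x1b[<{button};{col};{row}{final}".encode("ascii")


def keeps_it_ascii(data: bytes) -> bool:
    """True if every byte is printable ASCII (32..126) or the ESC control character."""
    return all(b == 0x1B or 32 <= b <= 126 for b in data)


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


click = (0, 100, 1)  # left button, column 100, row 1
for encode in (legacy_mouse, sgr_mouse):
    report = encode(*click)
    print(encode.__name__, report, keeps_it_ascii(report), is_valid_utf8(report))
```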
Evaluate earlier attempts
And finally a non-technical, but at least equally important one:
For proposals that address something which already had previous (failed or moderately successful) attempts:
No matter how good on its own, no new recommendation can reasonably take over from earlier attempts without arguing why the new one is better. If it's seemingly just another new protocol (the 15th, as per xkcd), there'll be hardly any incentive for terminals (especially the ones that already implement another protocol) to adopt it, and there'll be hardly any incentive for apps to support it either.
Without proper evaluation and reasoning about why we want to design a new one, our work would also be downright disrespectful toward the authors of such earlier protocols.
Every new protocol proposal must be based on a thorough study and evaluation of all existing protocols for the same feature, and MUST either come up with firm arguments why none of those protocols is suitable for becoming widely accepted (e.g. they break some of the principles laid down in this document) and why the new one is believed to overcome those obstacles; or, if one of them is great (perhaps with slight modifications), argue for supporting that one instead of coming up with a new one. It's a lot of work, I know.