Feature reporting revamp
Feature Reporting in Terminal Emulators
Many popular utilities or applications require a terminal to run. In order to provide a proper experience to their users, these apps need to know certain details about the terminal, e.g. need to know how to turn on certain modes, or whether a particular feature is supported at all.
More than 50 years after the appearance of hardware terminals, and more than 30 years after the appearance of graphical terminal emulators, feature reporting is apparently still not a solved problem. Existing methods all have noticeable limitations, resulting in an ecosystem where user-facing incorrect behavior due to misdetection is not uncommon at all.
Certain recent changes to Unicode made this more prominent. Version 9.0 changed the width of many codepoints, a backwards incompatible change that caused plenty of breakages until everyone caught up. Version 8.0 added Emoji support including the VS16 emoji variation sequence, and version 9.0 (UAX #11 version 31) states that VS16 converts a preceding narrow character to a wide one. This makes per-codepoint width calculation (e.g. by wcwidth()) impossible. Version 11.0 (UAX #11 version 35) added a clause stating that the East_Asian_Width property is not necessarily an out-of-the-box solution for the width.
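To make the VS16 problem concrete, here is a minimal Python sketch. The helper names (codepoint_width, naive_width, vs16_aware_width) are made up for this illustration, and the width rules are deliberately simplified; this is not how any particular wcwidth() implementation works. It only shows that summing per-codepoint widths cannot account for a variation selector that changes the width of the character before it.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation


def codepoint_width(ch: str) -> int:
    """Width of one code point in isolation (simplified wcwidth-like rule)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters occupy no cell
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text: str) -> int:
    """Sum of per-code-point widths, ignoring any context."""
    return sum(codepoint_width(ch) for ch in text)


def vs16_aware_width(text: str) -> int:
    """Same sum, but a narrow character directly followed by VS16 counts as wide."""
    total = 0
    for i, ch in enumerate(text):
        width = codepoint_width(ch)
        if width == 1 and text[i + 1:i + 2] == VS16:
            width = 2
        total += width
    return total


heart = "\u2764\ufe0f"  # HEAVY BLACK HEART followed by VS16
```

For the heart sequence the two functions disagree by one cell, and an application has no way of knowing which of the two answers its terminal emulator implements.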
Given these recent changes and the direction Unicode seems to be heading towards, even the simplest scenario of just printing formatted, aligned or truncated text to the terminal emulator requires proper feature reporting, in order for the emitting application to know the display width of characters.
In this document I aim to outline the problem with the existing methods for feature reporting and come up with a new draft recommendation.
The current methods
Currently the following methods can be used to check whether the terminal supports certain features:
- Escape sequences that generate an asynchronous response.
- Terminfo database entry, as pointed to by the TERM environment variable.
- Terminal description inlined in the TERMCAP environment variable.
- Certain other env vars corresponding to a particular feature, e.g. COLORTERM or COLORFGBG.
- Environment variables naming the emulator itself or the version thereof, e.g. TERM_PROGRAM or VTE_VERSION.
Let's see their limitations.
Asynchronous escape sequences
Escape sequences that query the terminal emulator are by their very nature asynchronous.
One problem with this is that it's potentially slow. Each time there's something to query, the roundtrip time has to elapse between the application and the emulator, potentially over ssh, across the world, over a laggy network. Doing it once at the start of an interactive app may not be a big deal, but having to do it in a utility that's potentially invoked many times from a shell loop is an ultimate performance killer.
Another extremely complex set of problems is the bunch of potential race conditions. Other input might arrive before the expected response to the query, and this needs to be handled correctly. The response might not arrive at all, either because the terminal emulator doesn't recognize the query (and thus sends no response), or because of some weird, rare circumstance where multiple apps produce output towards the terminal emulator and another one's data cuts our app's query sequence in half. If an app has a timeout for waiting for the response, it also needs to handle the case where the timeout elapses and the response arrives later, when it's no longer expected.
Let's look at the following bash code snippet:
# do some work 1
fg=$(xtermcontrol --get-bg)
# do some work 2
read var
If xtermcontrol could reliably report the background color, as it would be able to with the proposal below, then this code would reliably work as expected.
With the current asynchronous behavior, the code suffers from the following bugs:
- Keys that the user typed ahead during the "work 1" phase will be lost and never make it into "var".
- Certain special keys (especially ESC), if typed ahead during "work 1", might cause xtermcontrol to fail.
- If the response doesn't arrive in time, xtermcontrol fails to report the requested value, causing potential breakage or downgraded behavior in "work 2" and the rest of the script.
- The response might arrive later than expected, after xtermcontrol has given up. In this case it will break the line editing while reading the value for "var", and potentially break the value of "var" too.
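As a small illustration of the first bug, here is a Python sketch of the bookkeeping an application needs just to avoid losing typed-ahead keys. It assumes the xterm-style reply to an OSC 11 background color query; split_reply is a hypothetical helper, not part of xtermcontrol or any library. It covers only the easy case where the complete reply has already arrived; timeouts, partial replies and late replies are not handled.

```python
import re

# xterm-style reply to the "OSC 11 ; ? ST" background color query,
# terminated either by ST (ESC \) or by BEL.
OSC11_REPLY = re.compile(rb"\x1b\]11;(rgb:[0-9a-fA-F/]+)(?:\x1b\\|\x07)")


def split_reply(buf: bytes):
    """Separate the terminal's reply from whatever the user typed ahead.

    Returns (typeahead, color); color is None if no complete reply is
    present in the buffer yet.
    """
    match = OSC11_REPLY.search(buf)
    if match is None:
        return buf, None
    # Everything around the reply is user input that must be preserved
    # and handed to whoever reads from the terminal next.
    typeahead = buf[:match.start()] + buf[match.end():]
    return typeahead, match.group(1).decode("ascii")
```

Even with this in place, the preserved typeahead bytes still have to be handed to the next reader somehow, which a standalone utility invoked from a shell script has no clean way of doing.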
Asynchronous reporting probably works with a pretty good success rate in practice. Maybe it works 99.999% of the time. Still, the more we rely on it, the more fragile the entire ecosystem becomes, with bugs that aren't reproducible and aren't fixable. For robust, reliable behavior, the use of asynchronous querying needs to be cut back as much as possible.
Trying to properly handle (and unit test) all the raceable cases, if that is possible at all in the language, libraries etc. used by the app, would increase the complexity by orders of magnitude and, probably more importantly, would be a giant pain point no developer wants to experience. I leave it as an exercise to the reader to fix the aforementioned four bugs in that two-line shell script: see if you manage to fix them at all, see how complicated it becomes, and ponder how it would go for more complex applications.
A small subset of the properties are not necessarily fixed throughout the emulator's lifetime, yet applications might want to query the current value. A typical example is the color scheme (especially the default foreground or background colors). See the COLORFGBG environment variable for a terrible attempt. Other such properties could be the encoding (including, for example, whether UTF-8 is locked or can still switch to box drawing mode), the current method used for width calculation, the font size (for inline graphics in certain emulators) etc.
The terminal emulator can set up the environment variables for the child (typically a shell) that it launches, but cannot modify them afterwards. Therefore values described using environment variables, either directly or indirectly by pointing to fixed data (e.g. a read-only file), cannot convey such variable pieces of information.
A terminal emulator and its description can easily be out of sync (in either direction), due to multiple reasons.
Vendors and distributions take each terminal emulator from its own upstream repository, and the terminal descriptions (terminfo) from a separate one. These two processes are not synchronized. Either one may be newer than the other.
If one ssh's to a remote host, the emulator and its description might easily be many years apart from each other, again, in either direction. Maybe the terminal emulator supports tons of brand new features which aren't yet mentioned in terminfo, resulting in a degraded experience. Maybe the description is newer than the actual emulator, resulting in broken behavior.
There's no versioning. There are a few terminfo entries which have a year in them, but it's unclear to me whether they correspond to the beginning or the end of the year, and whether they describe the latest development version from VCS, the latest development release, or the latest stable release from that time. Even if these were clear, they couldn't solve the issue with ssh. What we'd need at the very least is automatic switching to the newest available definition. E.g. if the emulator sets TERM=foo-1.0 but the newest definition in the terminfo database is foo-0.98, then this latter one should be used by terminfo. I don't think such versioning exists in terminfo (please correct me if I'm wrong). Linear versioning still couldn't solve cherry-picks to older branches, e.g. when a new feature appears in version 0.98.5 within the 0.98.x series, and in version 1.0.2 within the 1.0.x series.
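The "newest available definition" rule could be sketched as follows. This is purely hypothetical: terminfo does no such thing as far as I know, and the name-version naming scheme and the helpers parse_version and pick_entry are assumptions made up for this example.

```python
def parse_version(name: str):
    """Split a hypothetical 'name-X.Y' entry into ('name', (X, Y))."""
    base, _, version = name.rpartition("-")
    return base, tuple(int(part) for part in version.split("."))


def pick_entry(term: str, available):
    """Pick the newest available entry that is not newer than the emulator."""
    base, wanted = parse_version(term)
    candidates = []
    for name in available:
        try:
            entry_base, entry_version = parse_version(name)
        except ValueError:
            continue  # not a versioned entry, e.g. "xterm-256color"
        if entry_base == base and entry_version <= wanted:
            candidates.append((entry_version, name))
    return max(candidates)[1] if candidates else None
```

With TERM=foo-1.0 and a database containing foo-0.9, foo-0.98 and foo-1.2, this picks foo-0.98. It does nothing for the cherry-pick case, since a single linear ordering cannot express that 0.98.5 has a feature that 1.0.1 lacks.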
There's a huge roundtrip time until a newly implemented feature becomes available. From the moment my preferred terminal emulator implements a certain feature I'm interested in, it might take months or even years (especially across ssh) until I get access to it via terminfo. Why not immediately?
It's unclear whose responsibility it is (or should be) to ship the terminfo description matching a terminal emulator, and how it should be done.
Should it be shipped by the developers of the terminal emulator? If so, how will vendors/distributions fetch and keep updated all the descriptions, even from terminal emulators that use a foreign packaging system? E.g. how would a Linux distribution ship a terminal definition that's contained within a macOS .dmg or a Windows self-installing .exe? There are gazillions of emulators out there. If I create a new one called "foo", how will its terminal description ever reach all the various Unix systems? Or, if vendors decide to package the descriptions along with the actual emulators, how would those get installed on remote hosts where I ssh to but don't have my favorite terminal emulator (or probably any terminal emulator at all) installed (let alone foreign OS emulators that could not be installed even in theory)? What about licensing, shipping the (hopefully free) description of a non-free emulator?
Or should it be centralized (as it is now)? Is it then the task of the terminal emulators' developers to contribute the changes upstream, or is it the terminfo maintainer's task to keep an eye on emulators and keep the descriptions up-to-date? Or do we count on volunteers notifying terminfo occasionally, on an ad-hoc basis? If a terminal emulator designs and implements a brand new feature, how can it denote that if the central repository rejects adding a brand new property? At the very least, the central repository would have to accept any property (maybe limited to a subset, let's say having an "x-" prefix) without questions and without judgement of the feature. (According to my personal experiences, unfortunately this doesn't seem to be the case.)
As a workaround for these latter issues around maintenance of the description, quite a few terminal emulators decided to take the easy path of piggybacking on already existing descriptions, which is technically an incorrect choice and could (and sometimes does) lead to incorrect behavior.
The TERMCAP environment variable seems to be obsoleted by TERM (or at least is rarely used nowadays), although I believe it had a better overall design. It doesn't suffer from the problems around maintenance, responsibility thereof, and availability of the data. (Note that I'm not talking about the single file /etc/termcap which is obviously a nightmare.)
It's pretty ugly to ship potentially kilobytes of data in environment variables, though.
It also cannot be updated at runtime, similarly to TERM.
Other environment variables
Any other ad-hoc environment variable is obviously ugly and is an unmaintainable approach in the long run. I assume this doesn't need further explanation.
A new approach has to satisfy as many as possible (hopefully all) of these criteria:
- Whenever a new feature is implemented, applications should immediately be made aware of this, even across ssh.
- The terminal description should be pushed close enough to the applications that they can read the properties synchronously.
- The terminal description should be alterable at runtime.
- Should be decentralized.
- Emulators should no longer piggyback on each other's more or less matching description.
- Needs to be backwards compatible: fall back to existing methods if the new one is not supported.
- Needs to be reasonably simple.
Let's see my proposal.
The termdesc file
When opening a graphical terminal, the emulator application creates a temporary file (e.g. under /tmp). This file describes as many of the terminal's properties as possible, including hardwired emulation behavior (as currently described in terminfo), plus rarely changing runtime attributes (e.g. colors, encoding).
(It does not contain frequently changing runtime attributes, most prominently the cursor position. For this, asynchronous reporting would remain, although it would be discouraged.)
A new environment variable (e.g. TERMDESC; a better recommendation is welcome! termdata? termdef? termprops?) is set up to point to this file.
When a property changes (e.g. the user changes a setting of the terminal that might be relevant to apps), the temporary file is replaced by the terminal emulator using an atomic rename() operation.
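The atomic replacement could look roughly like this on the emulator's side. This is a minimal Python sketch; publish_description is a name made up for this example. The important detail is that the new contents are written to a separate file in the same directory first, so that the final rename stays within one filesystem and readers see either the old or the new file, never a half-written one.

```python
import os
import tempfile


def publish_description(path: str, data: bytes) -> None:
    """Atomically replace the description file at 'path' with 'data'."""
    directory = os.path.dirname(path) or "."
    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename could not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".termdesc-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # rename() semantics on POSIX systems
    except BaseException:
        # Don't leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Applications that already have the old file open or mapped keep seeing the old contents; they pick up the new ones the next time they open the path.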
We could see if there's a point in allowing the union of multiple files, or whether it just introduces unnecessary complexity. That way a terminal emulator could ship a fixed file defining its capabilities, place the variable ones in the temporary file, and set up the environment variable to contain both filenames. ssh could save some traffic if it could verify with checksumming that the global part with the same contents is available on the remote site too.
The temporary file would be owned by whoever creates it. Whenever the creator exits, it should delete the file. This should cause no trouble for the apps using this file. For easier manual cleanup in case of runaway files (e.g. the emulator crashes), we might encourage the convention of
%s is replaced by the terminal line (/dev prefix removed, slashes replaced by something), e.g.
For the Linux virtual console, it could be *getty responsible for managing this file (if there's no variable data to report, it could make TERMDESC point to a fixed file, e.g.
Notification of change
Some utilities (e.g. the ones mentioned in the next several sections) might want or need to get notified when the contents of this file change. Here I can see two approaches; we'll need to study them more closely and pick one.
One could be that it's the responsibility of each application to keep an eye on this file, should they want to. They could use e.g. inotify.
Another solution could be a signal. Since the signal namespace is pretty dense, I think we could piggyback SIGWINCH. The terminal emulator would send this signal after replacing the file. We'd need to make sure that no party shortcuts the signal delivery if the actual window size didn't change.
Without having taken a close look at these possibilities, my instincts prefer the inotify approach. Changes to the signal's meaning are probably extremely hard to push through various kernels, POSIX and so on.
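Where inotify is not available, an application that keeps an eye on the file could fall back to comparing stat() results. The following Python sketch is a polling stand-in, not an inotify example (Python's standard library has no inotify binding); DescriptionWatcher and file_signature are names made up for this illustration. It relies on the file being replaced via rename(), which gives the new file a different inode.

```python
import os


def file_signature(path: str):
    """Identity of the file currently behind 'path'."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


class DescriptionWatcher:
    """Detects that the description file was replaced since the last check."""

    def __init__(self, path: str):
        self.path = path
        self.signature = file_signature(path)

    def changed(self) -> bool:
        # Raises FileNotFoundError if the emulator exited and removed the file;
        # callers need to decide what that means for them.
        current = file_signature(self.path)
        if current != self.signature:
            self.signature = current
            return True
        return False
```

An interactive app could call changed() from its main loop, e.g. whenever it is about to redraw the screen, and re-read the description only when it returns True.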
When setting up an ssh connection that has a terminal, the ssh client would automatically forward the contents of this file, place it on the remote side and set up the environment variable to point to the corresponding file over there. (I believe this is pretty similar to how XAUTHORITY is handled.)
Depending on our choice for the file format, ssh may need to re-encode the file (e.g. 32 vs. 64 bit, endianness).
The ssh client would be one of those probably few apps that keep an eye on this file and update the remote one whenever the local one changes (plus emit the signal over there, if we pick that approach).
Terminal multiplexers such as screen and tmux are somewhat similar to ssh. They don't need to forward the file over a network, but on the other hand, they need to modify its contents (e.g. change the escape sequences that some keys generate). They'd place the modified file somewhere and adjust the environment variable accordingly. They'd also have to keep an eye on the original file and update their own version upon change.
For su and sudo, probably all that needs to be done is to make sure that the original file is readable by the new user. See below for privacy and security considerations, though.
Alternatively, they could create and maintain a copy.
Asynchronous setting
There will always remain certain channels where automatically forwarding the entire terminal description is not possible. For example when telnet'ing over a serial line, or using a piece of software that connects you to the console of another (possibly virtual) operating system, such as
For conveniently getting access to all the features of a terminal emulator, there should be an easy way of manually transmitting the description.
There would be an escape sequence to which the terminal emulator responds with the entire description (the contents of the file, in some safe encoding). There'd also be a convenient utility that calls this method, waits for the response with a reasonably large timeout, and then places the response in a new temporary file and outputs export TERMDESC=/tmp/blah or so, which can simply be eval'ed on the remote host to get the proper settings. It could also set up a shell exit hook for cleaning up the file. If the environment variable is already set then it would just atomically update the file's contents.
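The file-handling half of such a utility might be sketched like this. Everything specific here is an assumption: the "safe encoding" is taken to be base64 purely for illustration, install_description is a made-up name, and the part that sends the query and waits for the reply is omitted entirely.

```python
import base64
import os
import shlex
import tempfile


def install_description(encoded_reply: bytes, environ=os.environ) -> str:
    """Store the decoded description and return a shell line to eval."""
    data = base64.b64decode(encoded_reply, validate=True)
    target = environ.get("TERMDESC")
    if not target:
        # No description yet in this session: create a fresh file.
        fd, target = tempfile.mkstemp(prefix="termdesc-")
        os.close(fd)
    # Write next to the target, then rename over it, so that readers
    # never observe a partially written file.
    directory = os.path.dirname(target) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, target)
    return "export TERMDESC=" + shlex.quote(target)
```

The returned line is what the utility would print for the shell to eval. When TERMDESC is already set, the same path is returned and only the file's contents change.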
Note that the wire transfer format would need to be architecture independent.
These are just some early ideas on some details that don't concern the big picture.
ssh would be thankful for having an architecture independent file format. Otherwise the client would need to serialize the data into an architecture independent format, and the server would need to deserialize. This way it could just copy the file as an opaque blob. For reporting the features asynchronously (the “Asynchronous setting” section above) we would also need to serialize and deserialize. Probably it's the cleanest to have an architecture independent file format to begin with.
Do we need hierarchy, some kind of levels, groups? The only one I can imagine is to perhaps separate the input and the output sequences, as well as properties that aren't sequences. I don't think terminfo separates them, though. And some sequences, like OSC ... ST can be used in both directions. I think a flat structure is fine.
It's definitely an advantage if the file format is super simple, writable and readable without special library support.
It's probably also an advantage if the escape sequences are stored in the file verbatim. A reading library can just mmap() the file and set up pointers to the values without any copying. Since the file is not modified in place (mmap could also explicitly prohibit this using the proper flags), there's no risk of memory corruption. Even if the file is deleted (or replaced with a rename() call), the previous file's contents remain accessible under the mmap()ped area.
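The claim about replaced files can be tried out with a few lines of Python. This is a sketch for POSIX systems; map_description is a name made up for this example. The mapping is read-only, and it keeps referring to the original file's data even after the path has been renamed over.

```python
import mmap
import os


def map_description(path: str) -> mmap.mmap:
    """Map the description file read-only; no copy of the contents is made."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        # Note that mapping an empty file raises ValueError.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A reader that wants the new contents after a change simply maps the path again and drops the old mapping once nothing points into it anymore.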
Here's a very simple suggestion.
The file begins with a magic code, let's say "TERMDESC1" followed by the NUL (0x00) byte. Then it's key=value pairs, each terminated by a NUL.
Magic is nice to have. Versioning allows for future changes to the format. The NUL following the magic makes it convenient to search for a key using e.g. memmem(), with the needle being NUL + the key + the '=' sign (or NUL + the key + NUL for boolean ones), without requiring special casing for the first key present in the file, and without finding false positives of the desired key prefixed by something.
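The lookup described above takes only a few lines. In this Python sketch bytes.find() plays the role of memmem(); the key names (bold, italic, truecolor) and the helper names are made up for the example and are not a proposal for the actual vocabulary.

```python
MAGIC = b"TERMDESC1\0"


def get_string(data: bytes, key: bytes):
    """Return the value of 'key', or None if the key is absent."""
    if not data.startswith(MAGIC):
        raise ValueError("not a termdesc file")
    # The leading NUL anchors the match to the start of an entry, so
    # looking up "old" cannot accidentally match inside "bold=".
    needle = b"\0" + key + b"="
    start = data.find(needle)
    if start < 0:
        return None
    start += len(needle)
    end = data.find(b"\0", start)
    if end < 0:
        raise ValueError("unterminated value")
    return data[start:end]


def get_flag(data: bytes, key: bytes) -> bool:
    """Return True if the boolean 'key' is present."""
    return data.startswith(MAGIC) and (b"\0" + key + b"\0") in data


sample = MAGIC + b"bold=\x1b[1m\0italic=\x1b[3m\0truecolor\0"
```

Because the magic itself ends in a NUL, the very first entry is found by the same needle as every other one.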
The file is presumably parsed more often than produced. For more efficient parsing, we might require that the keys are in ASCII strcmp() order.
Do any of the values need to have the NUL byte embedded? I hope not; I couldn't find any such occurrence in terminfo.src. Otherwise, how do we solve this? In addition to (or instead of) the trailing NUL, have a single byte preceding the value, denoting the value's length?
Do we want to retain terminfo's typically 2–5 letter long keys, or do we want to come up with brand new ones? Do we want to invent similarly short keys for features that are missing from terminfo (e.g. turning off bold, or plenty of new features recently added by some terminal emulators)? I tend to say if we're designing something new, we should leave the legacy behind and come up with reasonably readable names (and functionality). Also I think it might be a good time to drop anything that's related to ncurses and not the terminal emulator itself (e.g. "pairs"), time to stop advocating ACS line drawing ("acsc"), time to remove initc's arbitrary scaling of 1000, time to re-think how modifier + function keys or modifier + arrow keys are denoted (especially if the modifier substantially changes the escape sequence) etc.
It's time to drop the default cols and lines, at least for resizeable emulators. Should we store the runtime values? We have a robust means of knowing the size (TIOC[GS]WINSZ, SIGWINCH), there's probably no point in duplicating this.
terminfo supports conditionals in the values, used e.g. in setab to produce significantly different output if the number is greater than a particular threshold. Do we need to keep or replicate this feature (I think so), and do we want to? If so, using the exact same syntax or a different one? Can we and do we want to build on top of libterminfo to perform the resolving of these?
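To show what such a conditional encodes, here is a hand-written Python equivalent of the kind of setab value found in 256-color entries, which reads roughly like %?%p1%{8}%<%t4%p1%d%e%p1%{16}%<%t10%p1%{8}%-%d%e48;5;%p1%d%;m after the CSI introducer. The expression is quoted from memory, so treat it as illustrative rather than as the exact contents of any entry; set_background is a name made up for this example.

```python
def set_background(color: int) -> str:
    """Escape sequence selecting background 'color' on a 256-color terminal."""
    if not 0 <= color <= 255:
        raise ValueError("color index out of range")
    if color < 8:
        return f"\x1b[4{color}m"         # classic 8 colors: SGR 40..47
    if color < 16:
        return f"\x1b[10{color - 8}m"    # bright colors: SGR 100..107
    return f"\x1b[48;5;{color}m"         # everything else: 256-color form
```

Whatever syntax termdesc ends up with, it needs to be able to express this three-way branching, or the branching has to move into the applications.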
Should we be worried about the existence of these files under /tmp being visible to others? I don't think so. The list of processes (including terminal emulators) running, the list of terminal lines owned by a user are already visible to others.
Should we be worried about the contents of these files being readable to others? Probably. There can't be any really sensitive information in there, but maybe I wouldn't want others to know what settings I prefer. Making these files readable only by the owner (or placing them inside such a directory) makes the life of su and sudo somewhat more complicated: they'd have to copy the file. Which might not be a bad thing after all, see security.
A system administrator might grant sudo access for a user to run a particular application on someone else's behalf, as well as there could be setuid apps. For proper and fully featured behavior in a terminal emulator, the contents of termdesc would need to be accessible to that application. On the other hand, the contents of the file were possibly maliciously crafted by the user, or possibly will be modified maliciously later on.
One possible approach is to require all developers to keep this in mind and write or use a safe parser. With this approach it's actually harmful to have a simple file format: the more complex the format is, the less likely it is for anyone to come up with their own parser, there'd be just one thoroughly tested parser for everyone to use.
Such a safe parser would also need to keep in mind that the contents of the file might change while it's being parsed. Or if the contents are mmap()ped, the in-memory data can change (can it?). We need to check on various Unix systems whether a file being mmap()ped by a non-owner prohibits writes to that file ("Text file busy") even for the owner.
Another approach is: make su and sudo copy the file (and re-copy if the original one is modified), and in the meantime verify the file's correctness. Reject it if there's a syntax error, reject if the last byte isn't a NUL (this is a security-critical subcase of syntax error), if there's a duplicate key, if the keys are not in order, if there's an overlong key or value or if the file is too large, etc. This way the file wouldn't have to have read permissions for others.
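The checks listed above could be sketched like this, assuming the simple NUL-separated format suggested earlier. The size limits are arbitrary placeholders and validate is a name made up for this example; a real su or sudo would of course do this in its own implementation language.

```python
MAGIC = b"TERMDESC1\0"
MAX_FILE_SIZE = 64 * 1024   # placeholder limit, not part of any spec
MAX_ENTRY_SIZE = 4096       # placeholder limit, not part of any spec


def validate(data: bytes):
    """Return the parsed (key, value) entries, or raise ValueError."""
    if len(data) > MAX_FILE_SIZE:
        raise ValueError("file too large")
    if not data.startswith(MAGIC):
        raise ValueError("bad magic")
    if not data.endswith(b"\0"):
        raise ValueError("last byte is not NUL")
    body = data[len(MAGIC):]
    # The body ends with a NUL, so the final split element is empty.
    raw_entries = body.split(b"\0")[:-1] if body else []
    entries = []
    previous_key = None
    for raw in raw_entries:
        if not raw or len(raw) > MAX_ENTRY_SIZE:
            raise ValueError("empty or overlong entry")
        key, separator, value = raw.partition(b"=")
        if not key:
            raise ValueError("empty key")
        # Strictly increasing byte order rules out both duplicate
        # and out-of-order keys in one comparison.
        if previous_key is not None and key <= previous_key:
            raise ValueError("duplicate or out-of-order key")
        previous_key = key
        entries.append((key, value if separator else None))
    return entries
```

Requiring sorted keys pays off here as well: one comparison per entry replaces a separate duplicate check.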
This latter approach wouldn't work with ad-hoc setuid apps, though. How fair is it to say that setuid apps shouldn't do terminal handling (just as they shouldn't open graphical windows), they should split their code to a setuid part doing the business work requiring special permissions, and a non-setuid bit handling the user-facing bits?
There's a weird "knowledge split" with terminfo. Some features are self-contained in terminfo, such as whether italic text is supported, and if so then how.
Some features, however, are only partially described in terminfo and partially require knowledge from the application using them. E.g. "initc" describes the syntax, yet the application using it needs to know that the allowed values scale from 0 to 999.
Let's imagine we add support for explicit hyperlinks. Would we add the format string \e]8;%s;%s\e\\ to termdesc? Sounds reasonable, but an application would still need to be aware of the feature's specs, know what goes to the first parameter (semantics and syntax), know what goes to the second parameter (e.g. that it needs to be URI encoded). And once this knowledge has to be present in the using application, wouldn't it be cleaner to shift all the knowledge, including the OSC 8 template format to the application, and leave only a boolean ("yes, this feature is supported") in termdesc? I don't know.
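The knowledge split becomes visible as soon as one tries to use the template. In this Python sketch the template could in principle come from termdesc, while everything else (URI encoding, leaving the parameter field empty, the empty closing sequence) is knowledge the application has to bring along. The hyperlink helper and the chosen set of characters left unescaped are assumptions made for this example.

```python
from urllib.parse import quote

# The template from the text, written with Python format fields.
OSC8_TEMPLATE = "\x1b]8;{params};{uri}\x1b\\"


def hyperlink(uri: str, text: str, params: str = "") -> str:
    """Wrap 'text' in an OSC 8 hyperlink pointing to 'uri'."""
    # Application-side knowledge: the URI must not contain raw control or
    # non-ASCII bytes. '%' is kept so already-encoded URIs stay intact.
    safe_uri = quote(uri, safe=":/?#[]@!$&'()*+,;=%")
    opener = OSC8_TEMPLATE.format(params=params, uri=safe_uri)
    # Application-side knowledge: an empty URI closes the hyperlink.
    closer = OSC8_TEMPLATE.format(params="", uri="")
    return opener + text + closer
```

Only one line of this function depends on the template. The rest would be identical if termdesc carried nothing more than a boolean.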
Reporting of disabled attributes
What to do with visual attributes that a user can disable or enable at runtime (e.g. bold, blinking)? I think the emulator should constantly report them as supported. Otherwise an app might produce some output while the feature is disabled, then the user enables it but previously emitted text still won't be shown with this attribute.
A typical subcase of this is when an emulator only blinks text when the window is focused. It shouldn't toggle reporting blinking as supported or not supported on every focus change. Text emitted while the window was unfocused still needs to start blinking on focus in.
Some emulators dim unfocused windows. Again I don't think they should report a change of the color palette on every focus event, they should always report the focused variants. Does this make sense?
If there's buy-in from a couple of terminal emulators and applications, and a decision is made on the file format and other pending details, they could begin implementing this feature. Apps could check TERMDESC and make decisions based on that, if present. Otherwise, they would fall back to TERM.
In fact, it's a reasonable decision for them to use TERM for whichever purpose they already use it, and check TERMDESC only for the new ones, to replace the existing use of COLORFGBG, COLORTERM, TERM_PROGRAM and friends. Even if this new proposal gets widely adopted, I expect TERM to live on parallel to this practically forever. It's way beyond the goals of this proposal to deprecate and retire TERM.
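On the application side, the fallback could be as small as the following Python sketch. TERMDESC is the tentative variable name from this proposal; description_source is a helper made up for this example.

```python
import os


def description_source(environ=os.environ):
    """Decide where to take terminal properties from."""
    path = environ.get("TERMDESC")
    if path and os.path.isfile(path):
        return ("termdesc", path)
    term = environ.get("TERM")
    if term:
        return ("terminfo", term)  # existing behavior, unchanged
    return ("none", None)
```

An application that only wants termdesc for the new properties would keep its terminfo lookups as they are and consult this result just for the features terminfo cannot describe.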
In my opinion, convincing OpenSSH to support this new variable would be one of the most significant milestones we can hope for, as a sign of success of this idea. Tmux would probably be the second biggest one.