Robust terminal identification
Clients need a way to easily identify what terminal implementation they are connected to.
For various reasons (both good and bad), the TERM environment variable is often of little use for identifying the terminal that an application talks to. So it should be possible either to regenerate a more precise TERM value or to use the knowledge of which terminal is in use for additional feature handling on top of terminfo.
Also, sometimes it's important to work around bugs, so a purely feature-centric approach (which we should explore separately) does not cover everything that is needed.
Currently there is no uniform way to identify terminals. While many implementations report their version via Secondary Device Attributes (DA2) / CSI > c, there is no widely supported way to query for the actual implementation that this version refers to.
I have an experiment on how well terminals can be identified by fingerprinting, but in the long run that is not going to be a viable solution.
Existing sequences explicitly identifying the terminal that I know of are:
- Tertiary Device Attributes (DA3) / CSI = c
  vte uses this otherwise not very useful sequence for identification by sending '~VTE' as a hex-encoded answer. (xterm returns one or 8 zeros depending on version.)
- OSC 702
  rxvt-unicode uses this OSC to report its name in unencoded ASCII, the name of its executable file and pieces of its version.
- Via the xterm extension to query the terminfo string for "name"/"TN"
  Kitty supports this and returns xterm-kitty.
Likely there are some others too.
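To make the DA3 case concrete, here is a rough Python sketch of how a client might decode such a reply. It assumes the reply has the DEC report form DCS ! | <hex digits> ST and has already been read from the terminal; the function name `decode_da3` is made up for illustration and is not taken from any existing library.

```python
import re
from typing import Optional

# Assumed reply form: DCS ! | <hex digits> ST.
# DCS may arrive as ESC P or 0x90, ST as ESC \ or 0x9C.
DA3_REPLY = re.compile(rb"(?:\x1bP|\x90)!\|([0-9A-Fa-f]*)(?:\x1b\\|\x9c)")


def decode_da3(reply: bytes) -> Optional[bytes]:
    """Return the bytes carried in a DA3 report, or None if it cannot be decoded."""
    match = DA3_REPLY.fullmatch(reply)
    if match is None:
        return None
    payload = match.group(1)
    # An odd number of hex digits (such as a single zero) is not a whole
    # number of bytes, so this sketch treats it as undecodable.
    if len(payload) % 2:
        return None
    return bytes.fromhex(payload.decode("ascii"))


print(decode_da3(b"\x1bP!|7E565445\x1b\\"))  # b'~VTE'
print(decode_da3(b"\x1bP!|00000000\x1b\\"))  # b'\x00\x00\x00\x00'
```

Because the payload is restricted to hex digits, nothing the terminal sends here can be mistaken for a control sequence or a shell command by the receiving application.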
Great care has to be taken when reporting anything in unencoded ASCII, because there is a high risk that some applications misinterpret it in an exploitable way, so I think we should use a mechanism that avoids it.
The xterm terminfo query feature is documented as experimental and is disabled in some cases (e.g. Debian, likely others). It is also part of a feature that is quite complex.
So I think that, of the existing solutions, using DA3 seems the best idea. It's fairly easy to implement (it needs only a CSI parser that is capable of detecting '=' private sequences), does not use unencoded ASCII and is not used for anything else yet. The code space is also big enough at 32 bits, and when it is filled with 4 hex-encoded ASCII letters it is even reasonably likely that different implementations pick distinct codes.
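As a sketch of what the terminal side of this could look like, the Python snippet below builds a DA3 reply from a four-letter identifier and shows that the identifier is the same thing as a 32-bit code. The function names and the `KNOWN_IDS` table are hypothetical; only the '~VTE' value comes from an existing terminal.

```python
def encode_da3_reply(ident: str) -> bytes:
    """Build a DA3 report (DCS ! | <8 hex digits> ST) for a 4-letter ASCII id."""
    raw = ident.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if len(raw) != 4:
        raise ValueError("identifier must be exactly 4 ASCII characters")
    return b"\x1bP!|" + raw.hex().upper().encode("ascii") + b"\x1b\\"


def ident_to_code(ident: str) -> int:
    """Interpret a 4-letter identifier as a big-endian 32-bit number."""
    return int.from_bytes(ident.encode("ascii"), "big")


# Hypothetical client-side lookup table keyed by the 32-bit code.
KNOWN_IDS = {0x7E565445: "vte"}

reply = encode_da3_reply("~VTE")
code = ident_to_code("~VTE")
print(reply)
print(hex(code), KNOWN_IDS.get(code, "unknown"))
```

A client would only need to compare the decoded 32-bit value against the codes it knows, so no free-form string handling is involved.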
Thoughts? Different idea?