Paragraph dir autodetection on bigger scope
As per Eli's feedback on the Unicode mailing list (Jan, Feb '19 – there's not a concrete mail to link to):
Many text files are hard-wrapped at a certain margin (let's say 72 characters) using explicit newline characters, and chunks of human-perceived paragraphs (typically 3–15 lines or so) are delimited by empty lines (two consecutive newline characters). Examples include most of the well-known license files, or TUTORIAL.he as contained within Emacs's source tree, or Markdown files...
In order to have the best possible automatic behavior when cat'ing such a file, the paragraph direction should be autodetected only once for each such emptyline-delimited segment. Shell prompts should delimit these segments too (a break before and after each prompt), which depends on Semantic markers for prompts.
Now, there's a huge confusion around the terminology. Such a file can of course be cat'ed on a terminal of 60 columns. Lines of the text file don't map to lines of the terminal, they map to "paragraphs" of the terminal (as both our specification and the Unicode BiDi Algorithm define the term "paragraph"). And similarly, freaking confusingly, "human-perceived paragraphs" (emptyline-delimited segments) of such text files don't map to "paragraphs" of the terminal and the UBA. So perhaps let's call the emptyline-delimited and prompt-delimited human-perceived paragraphs "segments".
So, for each "segment", one single "paragraph direction" would be autodetected. Then this autodetected value would be applied on all the "paragraphs" (text file's lines) of this "segment".
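To make the idea a bit more tangible, here's a minimal Python sketch of per-segment autodetection. It is only an illustration under stated assumptions: the detection heuristic is a simplified "first strong character" lookup (it ignores isolates and other BiDi control characters, unlike the real UBA rules), prompt-delimited breaks are left out, and the names `first_strong_direction` and `segment_directions` are made up for this example.

```python
import unicodedata


def first_strong_direction(text):
    """Return "LTR" or "RTL" for the first strong character, or None.

    Simplified assumption: isolates and embeddings are not skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None


def segment_directions(lines, fallback="LTR"):
    """Return one direction per input line.

    Each input line is a "paragraph" in the terminal's sense. Consecutive
    nonempty lines form a "segment"; the direction is detected once per
    segment and applied to all of its lines. Empty lines and segments
    without any strong character get the fallback (hypothetical choice).
    """
    result = []
    segment = []

    def flush():
        if not segment:
            return
        direction = None
        for line in segment:
            direction = first_strong_direction(line)
            if direction is not None:
                break
        result.extend([direction or fallback] * len(segment))
        segment.clear()

    for line in lines:
        if line == "":
            flush()
            result.append(fallback)
        else:
            segment.append(line)
    flush()
    return result
```

For example, a segment whose first line starts with a Hebrew word would be RTL in its entirety, even if its second line happens to begin with an embedded English term.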
(In the meantime, the idea of defining a "paragraph" as the emptyline-delimited parts, and running UBA on this as a whole, an idea that I present as a possible future extension, is utterly broken, as I argue in one of the mails on the Unicode list, and should be dropped, or let's say superseded by this new mode.)
In this new mode, the fallback paragraph direction would matter less often than in the per-paragraph autodetection mode. There'd still be cases where it makes a difference, though, namely when a whole segment contains no strong character. In such cases Emacs uses the previous segment's direction (or LTR at the very top). Not sure if we should also do so; or maybe even say that the shell prompt is a hard break where we shouldn't look back any further, while at empty lines it's fair to go back.
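One of the options pondered above (inherit the previous segment's direction, but never look back across a shell prompt) could be sketched roughly as follows. This is not a decided behavior, just an illustration; the function name `resolve_fallbacks`, the `hard_breaks` parameter, and the representation of undetected segments as `None` are all assumptions made for this example.

```python
def resolve_fallbacks(detected, hard_breaks=(), default="LTR"):
    """Fill in directions for segments where autodetection found nothing.

    detected:    one entry per segment: "LTR", "RTL", or None (undetected).
    hard_breaks: indexes of segments that directly follow a shell prompt;
                 at these we don't look back (hypothetical rule).
    default:     direction used at the very top and after a hard break.
    """
    resolved = []
    previous = default
    for index, direction in enumerate(detected):
        if index in hard_breaks:
            # A prompt precedes this segment: forget earlier segments.
            previous = default
        if direction is None:
            direction = previous
        resolved.append(direction)
        previous = direction
    return resolved
```

With no hard breaks, `["RTL", None, "LTR", None]` resolves to `["RTL", "RTL", "LTR", "LTR"]`; marking segment 1 as following a prompt makes it fall back to the default instead of inheriting RTL.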
Note however that we're talking about a field where there's no clear definition, and pretty much all implementations differ. E.g. the aforementioned TUTORIAL.he shows up in three different ways in Emacs, Firefox and Chromium. (The contents of the file, with regard to necessary BiDi control chars around embedded English terms, were also built up with Emacs's rendering in mind, so in that sense the file is Emacs-specific.) I don't think it can reasonably be expected from a legacy world such as terminal emulation to suddenly do something better than the mixture that these other apps do.
It might make sense to say that Emacs's is the most reasonable approach; however, it's not a strict rule. And let's keep in mind that the terminal emulator has to account for many vastly different use cases; cat'ing text files formatted in this particular way is just one of them. However, a utility producing most of its output in one particular language resembles this use case pretty closely.
With these in mind, and the yet unresolved dependency on shell prompt detection, I'm really uncertain if we should define such a mode, let alone make it the default. On the other hand, probably this mode would provide the best out-of-the-box user experience, so it sounds reasonable to make it the default.
But once we can detect the shell prompt, isn't it even better to go bigger by yet another step, and autodetect the directionality on the utility's entire output as one, not caring about empty vs. nonempty lines?
Yet another thing to ponder is for terminal emulators to add options to their right-click menu that retrospectively alter the given paragraph's direction. So the user does a "cat file", notices its bad paragraph direction, right-clicks, picks "RTL", and it's repaired! Again, this is something that should work on a larger scope, presumably a utility's entire output, and thus depends on the semantic prompt feature.
Another implication of this new mode is that in VTE we'd practically have to switch to fullscreen repaints all the time; aiming for any smaller scope (which would mean "segments") just significantly overcomplicates everything (having to detect whenever an empty line becomes nonempty or the other way around, or other crazy sorts of optimizations) for marginal benefits (still repainting a lot).