pdftotext sometimes moves superscript to the previous line
The pdftotext utility sometimes moves a superscript to the previous line. Consider the following text.pdf file. One line contains "[...] texte... 4 texte [...]". The next line contains "[...] que a² < A,", with the superscript "2" below the "4" (and slightly to its left).
pdftotext generates:
√texte texte... 24 texte texte texte texte (texte texte texte
t). Texte texte texte texte texte texte A telle que a < A,
i.e. it moves the superscript "2" to just before the "4" on the previous line, so that "a² < A" comes out as "a < A".
Note that since the superscript "2" is strictly below the "4" in the PDF rendering, there shouldn't be any ambiguity in the interpretation of the text.
This is under Debian with poppler-utils 22.12.0-2.
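
To check whether the misplacement depends on the extraction mode, one could compare the default output with the output of the -layout and -raw options. The following is only a minimal diagnostic sketch, not a fix: the helper names build_command and extract_all are hypothetical, and it assumes pdftotext is on the PATH. Whether either option changes where the superscript ends up is untested here.

```python
import shutil
import subprocess
import sys

# Extraction modes to compare. "-layout" and "-raw" are pdftotext options;
# the empty list is the default mode used in the report above.
MODES = {"default": [], "layout": ["-layout"], "raw": ["-raw"]}


def build_command(pdf_path, mode):
    """Return the pdftotext argument list for one mode (hypothetical helper).

    The trailing "-" tells pdftotext to write the text to stdout.
    """
    return ["pdftotext", *MODES[mode], pdf_path, "-"]


def extract_all(pdf_path):
    """Run pdftotext once per mode and return {mode: extracted text}."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    results = {}
    for mode in MODES:
        proc = subprocess.run(
            build_command(pdf_path, mode),
            capture_output=True,
            text=True,
            check=True,
        )
        results[mode] = proc.stdout
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python compare_modes.py text.pdf
    for mode, text in extract_all(sys.argv[1]).items():
        print(f"=== {mode} ===")
        print(text)
```

Running it on text.pdf prints the three outputs one after another, so the position of the "2" can be compared across modes.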