Broken encoding when copying text from 8-bit ASCII PDF streams
Hello. Some older PDF producers, such as MiKTeX-pdfTeX 2.7.3235 (according to the PDF file metadata), produce streams with CP1251-encoded text, and text copied from such files apparently comes out broken.
I reduced one of them to a minimal file for reference: unc.pdf. I ran pdftotext, and what I expected to get from [<CBE8F2E5F0E0F2F3F0E0>] TJ was Литература, but what I actually got was the UTF-8 encoded string Ëèòåðàòóðà.
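The mismatch can be reproduced outside of any PDF tool. The snippet below is only a sketch: it assumes the string bytes are being interpreted as Latin-1 (which matches the output above, but is not confirmed against the pdftotext source), and the variable names are made up for illustration.

```python
# Bytes from the TJ operand in the reduced file.
raw = bytes.fromhex("CBE8F2E5F0E0F2F3F0E0")

# What I expected: the bytes decoded as CP1251 (Windows Cyrillic).
expected = raw.decode("cp1251")

# What I got: the same bytes decoded as Latin-1 (assumed interpretation).
actual = raw.decode("latin-1")

# Workaround on already-extracted text: undo the wrong decoding,
# then decode again as CP1251.
repaired = actual.encode("latin-1").decode("cp1251")

print(expected)  # Литература
print(actual)    # Ëèòåðàòóðà
print(repaired)  # Литература
```

The round trip in the last step only works when every extracted character falls in the Latin-1 range, so it is a stopgap for affected files rather than a general fix.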
I am not sure this PDF is well-formed, but I would say such files are fairly common in the wild.