pdftotext: UTF-16 text without BOM not properly extracted
Submitted by ral..@..te.com
Assigned to poppler-bugs
Link to original bug (#103309)
Description
Created attachment 134881 Sample file
When I run pdftotext on the attached sample file, I get no usable text. Looking at the file in a hex editor, I can see that the text is stored as UTF-16BE without a BOM. xpdf displays the file correctly.
Tested with version 0.48.0 (Debian Stable) and 0.57.0 (Debian Testing).
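
To illustrate the kind of handling that seems to be missing, here is a minimal Python sketch of decoding string bytes that may be UTF-16BE with or without a BOM. It is not poppler's code, and nothing here is taken from the sample file: the function name decode_maybe_utf16be, the "at least half the high bytes are zero" heuristic, and the Latin-1 fallback are all assumptions made for illustration.

```python
import codecs


def decode_maybe_utf16be(raw: bytes) -> str:
    """Decode bytes that may be UTF-16BE with or without a BOM.

    Hypothetical helper for illustration only; the heuristic below is an
    assumption and not the logic used by poppler or xpdf.
    """
    # A leading FE FF is the UTF-16BE byte order mark: strip it and decode.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")

    # Without a BOM, guess: an even number of bytes where at least half of
    # the high (even-indexed) bytes are zero looks like UTF-16BE text made
    # mostly of Latin characters.
    pairs = len(raw) // 2
    zero_high = raw[0::2].count(0)
    if pairs > 0 and len(raw) % 2 == 0 and zero_high * 2 >= pairs:
        return raw.decode("utf-16-be")

    # Otherwise treat the data as a single-byte encoding (assumed Latin-1).
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(decode_maybe_utf16be(b"\x00H\x00i"))          # BOM-less UTF-16BE
    print(decode_maybe_utf16be(b"\xfe\xff\x00H\x00i"))  # UTF-16BE with BOM
    print(decode_maybe_utf16be(b"Hi"))                  # single-byte text
```

A heuristic like this can misfire on short strings or on text outside the Latin range, so it only shows the idea of tolerating a missing BOM rather than a robust fix.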
Attachment 134881, "Sample file":
2004.pdf