pdftotext should filter control characters like "form feed"
Submitted by Mike Gerber
Assigned to poppler-bugs
Description
Created attachment 129108 Example PDF
Currently, pdftotext/TextOutputDev extracts control characters such as form feeds from the PDF. These should be filtered, as users expect form feeds to be inserted only by pdftotext itself (as page separators).
In the attached PDF, a form feed character (0x0C) is extracted between the word "sich" and the following formula. AFAICT, the form feed is actually a character from the CMSY10 font.
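To illustrate the requested behaviour, here is a minimal Python sketch of such a filter. It is not poppler code: the function name strip_control_chars, its keep parameter, and the sample string are all made up for illustration, and the real fix would presumably live in TextOutputDev. The idea is simply to drop every character of Unicode category Cc that comes from the PDF's own text, so that the only form feeds left in the output are the ones pdftotext adds itself.

```python
import unicodedata


def strip_control_chars(text: str, keep: str = "\n\r\t") -> str:
    """Drop Unicode control characters (category Cc) except those in `keep`.

    Hypothetical helper, meant to run on text taken from the PDF before
    pdftotext adds its own page-separating form feeds.
    """
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


# Made-up sample resembling the report: a form feed (0x0C) right after "sich".
extracted = "sich\x0c x = y"
cleaned = strip_control_chars(extracted)
```

Newline, carriage return and tab are kept by default because they are control characters too; whether they should also be filtered when they originate from a font is a separate question.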
Attachment 129108, "Example PDF":
text7-page11-uncompressed.pdf