Fix HtmlFont::HtmlFilter to not lose tabs
Submitted by ulatekh
Assigned to poppler-bugs
Link to original bug (#107317)
Description
Created attachment 140749: Patch to fix bug
I'm about to use pdftohtml to extract information from PDFs and organize the results into a database, so I had a chance to dig through the code.
I've had a long-standing problem with qpdfview (which uses poppler) sometimes copying text out of PDFs incorrectly -- the text copies, but all of the spaces are missing. After reproducing it with a PDF, I tracked the problem down to the PDF using tabs where it probably should have used spaces. The patch fixes HtmlFont::HtmlFilter() to convert incoming tabs to spaces, instead of removing the whitespace completely.
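To illustrate the idea (this is not the attached patch, and not poppler's actual code), here is a minimal sketch of a filter that maps a tab to a single space instead of dropping it. The function name filterForHtml is hypothetical, and it takes a plain byte string for simplicity, which is an assumption; the real HtmlFont::HtmlFilter() may have a different signature and escaping rules.

```cpp
#include <string>

// Hypothetical stand-in for an HTML text filter; NOT poppler's implementation.
// Assumes the input is a byte string (e.g. UTF-8) and passes unknown bytes through.
std::string filterForHtml(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
        case '\t':
            // The point of the fix: keep the word break by emitting a space
            // rather than discarding the tab.
            out += ' ';
            break;
        case '&':
            out += "&amp;";
            break;
        case '<':
            out += "&lt;";
            break;
        case '>':
            out += "&gt;";
            break;
        case '"':
            out += "&quot;";
            break;
        default:
            out += c;
            break;
        }
    }
    return out;
}
```

With a filter like this, "a\tb" comes out as "a b" rather than "ab", so words separated only by tabs stay separated in the output.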
There are probably other places in the code where the fix in this patch could be applied, e.g. when copying text in qpdfview.
Patch 140749, "Patch to fix bug":
0002-Fixed-HtmlFont-HtmlFilter-to-convert-tabs-to-spaces.patch