Incorrect positioning of text in PDFTOHTML
Submitted by no1ce
Assigned to poppler-bugs
Description
Created attachment 68923 pdf file inhibiting this behavior
PDFTOHTML converts text positions incorrectly for certain PDF documents. Attached is a document in which this happens.
To illustrate: the size of an image of the first page is 1024x1408. The text "Brief article", which is highlighted in the screenshot, should be positioned 19% from the top, as seen here: http://imageshack.us/a/img526/6343/textshiftedpdf1.png
Poppler outputs this text with the following data when using pdftohtml -xml:
<text top="409" left="447" width="80" height="15" font="0">
Brief article</text>
The dimensions of this page according to poppler, taken from the same XML file:
<page number="1" position="absolute" top="0" left="0" height="1488" width="1063">
According to poppler, the text would therefore be positioned at 409/1488 = 0.27, or 27% from the top, rather than the expected 19%, which is clearly wrong.
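For anyone who wants to repeat this check on other text elements, the calculation can be sketched in Python as below. This is only an illustration: the SAMPLE string is a hand-assembled fragment that nests the two elements quoted in this report, and relative_tops is a hypothetical helper name, not part of poppler. Real pdftohtml -xml output wraps the pages in additional elements, so the parsing would need adjusting for a full file.

```python
import xml.etree.ElementTree as ET

# Hand-assembled fragment (assumption): the <page> and <text> elements
# quoted in this report, nested the way they appear in the XML output.
SAMPLE = """<page number="1" position="absolute" top="0" left="0" height="1488" width="1063">
<text top="409" left="447" width="80" height="15" font="0">
Brief article</text>
</page>"""


def relative_tops(page_xml):
    """Return (text, top / page height) for every <text> element in one <page>."""
    page = ET.fromstring(page_xml)
    page_height = float(page.get("height"))
    results = []
    for element in page.iter("text"):
        # itertext() also collects text nested in inline children, if any.
        content = "".join(element.itertext()).strip()
        results.append((content, float(element.get("top")) / page_height))
    return results


if __name__ == "__main__":
    for content, fraction in relative_tops(SAMPLE):
        print(f"{content!r}: {fraction:.3f} of the page height from the top")
```

Comparing the fraction obtained this way with the fraction measured on a rendered image of the same page shows whether a given element is shifted.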
No warning messages or errors were noted when converting this document.
Attachment 68923, "pdf file inhibiting this behavior":
textshifted_1.pdf