Wrong font id used when first word of a line has certain style applied (xml)
Submitted by Luis Parravicini
Assigned to poppler-bugs
Description
Created attachment 61552 Test files to reproduce the bug
When generating an XML version of a PDF, the font id used for a line of text seems to be that of the first word of that line.
This creates the following bug: if the first word in a line is in italics, the font id output for the whole line is the font of the italic word, not that of the rest of the line.
I've created a file in LibreOffice with four lines like the following text (italic words are marked here with <i> tags). I've come across this problem with PDFs created by other programs, so it's not a problem in the way LibreOffice generates the PDF.
Line 1
<i>line</i> 2
line 3
line <i>4</i>
All the text has the same font and size applied. The generated XML is:
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="85" left="85" width="46" height="20" font="0">Line 1</text>
<text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
<text top="126" left="85" width="40" height="20" font="0">line 3</text>
<text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>
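To make the mismatch easier to spot, here is a minimal Python sketch that parses the XML output from this report and lists, for each line, the font id and whether the line starts with an italic word. The helper name `summarize` is hypothetical and the XML is copied from this report; it only illustrates the symptom and says nothing about where the bug is in the code.

```python
import xml.etree.ElementTree as ET

# XML output copied from this report.
XML = """<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="85" left="85" width="46" height="20" font="0">Line 1</text>
<text top="106" left="85" width="41" height="20" font="1"><i>line</i> 2</text>
<text top="126" left="85" width="40" height="20" font="0">line 3</text>
<text top="147" left="85" width="41" height="20" font="0">line <i>4</i></text>
</page>"""


def summarize(xml_text):
    """Return (text, font id, starts_with_italic) for each <text> element.

    Hypothetical helper, written only to illustrate the report.
    """
    rows = []
    for node in ET.fromstring(xml_text).iter("text"):
        content = "".join(node.itertext()).strip()
        # A line starts with italics when it has no leading plain text
        # and its first child element is <i>.
        leading = (node.text or "").strip()
        starts_italic = leading == "" and len(node) > 0 and node[0].tag == "i"
        rows.append((content, node.get("font"), starts_italic))
    return rows


if __name__ == "__main__":
    for content, font, starts_italic in summarize(XML):
        print(f"{content!r}: font={font} starts_with_italic={starts_italic}")
```

Only the line that starts with an italic word ("line 2") gets font id 1. The line where the italic word comes second ("line 4") keeps font id 0, which matches the behaviour described above.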
Attachment 61552, "Test files to reproduce the bug":
italic.zip