pdftohtml produces wrongly nested tags
Submitted by pas..@..h.name
Assigned to poppler-bugs
Description
Created attachment 113678 Source PDF
When converting the attached PDF to XML using version 0.29.0:
$ pdftohtml -xml in.pdf out.xml
the output contains invalidly nested tags in the index portion near the end:
$ xmllint out.xml
out.xml:16770: parser error : Opening and ending tag mismatch: a line 16770 and b
font="11"><b>Abrüstung </b><i>314, </i></a>
Looking at the source document (page 320/327), the closing </b> should occur before the opening <a>: the numbers are linked and not bold.
Oddly the error doesn't occur for all entries.
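Since xmllint reports parser errors but does not give a list of affected index entries, a small script can help narrow down which ones are broken. The following is only a rough sketch, not part of poppler: the function name find_mismatches, the regex-based tag scanner, and the href value in the sample line are all assumptions made for illustration (the excerpt above does not show the opening <a> tag). It keeps a stack of open tags and records every closing tag that does not match the innermost open one.

```python
import re

# Rough tag scanner: opening, closing and self-closing tags.
# Assumption: attribute values never contain '<' or '>'.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def find_mismatches(lines):
    """Return (line_number, expected_tag, found_tag) for each closing tag
    that does not match the innermost open tag."""
    stack = []
    problems = []
    for lineno, line in enumerate(lines, start=1):
        for match in TAG_RE.finditer(line):
            closing, name, self_closing = match.groups()
            if self_closing:
                continue  # e.g. <br/> opens and closes at once
            if not closing:
                stack.append(name)
                continue
            if stack and stack[-1] == name:
                stack.pop()
                continue
            problems.append((lineno, stack[-1] if stack else None, name))
            # Recover so that later lines are still checked: drop the
            # nearest open tag of the same name, if there is one.
            if name in stack:
                idx = len(stack) - 1 - stack[::-1].index(name)
                del stack[idx]
    return problems


# Hypothetical reconstruction of the broken line; the href is invented.
bad = ['<text font="11"><b><a href="out.html#314">Abrüstung </b><i>314, </i></a></text>']
# The nesting the source document suggests: bold ends before the link starts.
good = ['<text font="11"><b>Abrüstung </b><a href="out.html#314"><i>314, </i></a></text>']

print(find_mismatches(bad))   # [(1, 'a', 'b')]
print(find_mismatches(good))  # []
```

Running it over the real output, for example with find_mismatches(open("out.xml", encoding="utf-8")), should list the line numbers of the affected entries, which may help to see what the broken entries have in common.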
Attachment 113678, "Source PDF":
gruene_Wahlprogramm-barrierefrei.pdf