pdftohtml produces wrongly nested tags

Submitted by pas..@..h.name

Assigned to poppler-bugs

Description

Created attachment 113678 Source PDF

When converting the attached PDF to XML using version 0.29.0:

$ pdftohtml -xml in.pdf out.xml

It produces invalidly nested tags in the index portion near the end:

$ xmllint out.xml
out.xml:16770: parser error : Opening and ending tag mismatch: a line 16770 and b
font="11">Abrüstung ``314, ``</a>

Looking at the source document (page 320/327), the closing </b> should occur before the opening <a>: the numbers are linked and not bold. Oddly, the error doesn't occur for all entries.
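
For illustration, here is a minimal sketch of the nesting problem, checked with Python's standard xml.etree.ElementTree parser. The two fragments are hypothetical reconstructions based on the description above, not lines copied from out.xml. In particular, the href value and the surrounding <text> element are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments (assumed shape, not taken from the real out.xml).
# BAD: </b> is closed after <a> was opened, so the tags overlap.
BAD = '<text font="11"><b>Abrüstung <a href="#">314, </b></a></text>'
# GOOD: </b> is closed before <a> opens, matching the source PDF,
# where the entry name is bold and the page number is a plain link.
GOOD = '<text font="11"><b>Abrüstung </b><a href="#">314, </a></text>'


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name, fragment in (("bad", BAD), ("good", GOOD)):
        print(name, is_well_formed(fragment))
```

Only the first fragment is rejected, which is the same class of error xmllint reports for line 16770.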

~~Attachment 113678~~, "Source PDF":
gruene_Wahlprogramm-barrierefrei.pdf
