Nested tags are closed in the wrong order
It appears that the tags are emitted in the order <b> <a> </b> </a>
instead of <b> <a> </a> </b>, so the output is not well-formed XML:
<text top="138" left="522" width="257" height="16" font="6"><b>Chapter 8 of <a href="http://www.redbooks.ibm.com/abstracts/sg247615.html"><i>WebSphere Application</i></b></a></text>
The source file is https://www.redbooks.ibm.com/redpapers/pdfs/redp4576.pdf. Line 43 of the generated XML file contains the offending line above.
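
For anyone who wants to confirm the mis-nesting without reading the line by eye, here is a minimal sketch of a stack-based check. It is not part of poppler; the function name `find_misnested` and the simplified tag regex are my own assumptions, and the regex only covers simple tags like the ones in this output (no comments, CDATA, or `>` inside attribute values).

```python
import re

# Simplified tag matcher (an assumption for this sketch, not a full XML parser):
# group 1 = "/" for a closing tag, group 2 = tag name, group 3 = "/" if self-closing.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")


def find_misnested(line):
    """Return (expected, found) for the first mis-nested closing tag, else None."""
    stack = []
    for match in TAG_RE.finditer(line):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <tag/> opens and closes itself
        if not closing:
            stack.append(name)  # opening tag: remember it
        elif not stack or stack[-1] != name:
            # closing tag does not match the most recently opened tag
            return (stack[-1] if stack else None, name)
        else:
            stack.pop()  # properly nested close
    return None


# The offending line from the report.
bad = (
    '<text top="138" left="522" width="257" height="16" font="6">'
    '<b>Chapter 8 of <a href="http://www.redbooks.ibm.com/abstracts/sg247615.html">'
    "<i>WebSphere Application</i></b></a></text>"
)
# The same line with </a> and </b> swapped into the expected order.
good = bad.replace("</b></a>", "</a></b>")

print(find_misnested(bad))
print(find_misnested(good))
```

On the reported line the check stops at `</b>`, because `<a>` is still open at that point; with the two closing tags swapped it finds nothing to report.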
pdftohtml -v
pdftohtml version 0.86.1
Copyright 2005-2020 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
Here is the command I used to produce the xml:
pdftohtml -s -i -q -nodrm -noframes -xml