pdftohtml -xml omits some images that -c renders correctly
Submitted by Jamie Carl
Assigned to poppler-bugs
Description
I've been trying to incorporate pdftohtml into my frontend renderer, and it works well for some documents. Other, more complex documents do not convert correctly.
My test document is the Nikon D3s brochure:
wget http://imaging.nikon.com/products/imaging/lineup/digitalcamera/slr/d3s/pdf/d3s_16p.pdf
Rendering with the following produces a pretty accurate representation of the document:
pdftohtml -c d3s_16p.pdf
However, when I output to XML using -xml, some of the images that -c rendered are missing: they are neither extracted to disk nor referenced in the XML output.
Also, the images that are extracted are listed with the wrong dimensions, so the resulting page layout is badly distorted.
All of the text is rendered correctly, though.
I tried the latest version from git and got the same results.
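To make the missing images and wrong dimensions easier to check, here is a minimal Python sketch that lists every image the XML output references, with its page number and declared size. It assumes the -xml output uses page elements with a number attribute and image elements with src, width and height attributes; those names and the d3s_16p.xml file name are my assumptions about the output and may need adjusting. The helper list_images is hypothetical and not part of poppler.

```python
import sys
import xml.etree.ElementTree as ET


def list_images(xml_text):
    """Return (page, src, width, height) for each image element found.

    Assumes <page number="..."> elements containing <image src="..."
    width="..." height="..."/> children; these names are an assumption
    about the -xml output, not a documented contract.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        page_no = int(page.get("number", "0"))
        for img in page.iter("image"):
            rows.append((
                page_no,
                img.get("src", ""),
                float(img.get("width", "0")),
                float(img.get("height", "0")),
            ))
    return rows


if __name__ == "__main__":
    # Assumed output file name; pass a different path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "d3s_16p.xml"
    with open(path, "rb") as fh:
        for page_no, src, width, height in list_images(fh.read()):
            print(f"page {page_no}: {src} {width:g}x{height:g}")
```

Comparing this list against the image files written by the -c run should show which images are absent from the XML and which ones are listed with unexpected sizes.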