pdftohtml ignores the png format option and extracts inverted jpg images
Submitted by c1tru55
Assigned to poppler-bugs
Description
Hi all,
I use pdftohtml 0.37.0 on Ubuntu.
When I run pdftohtml -xml -fmt png, some images are extracted as .jpg (all with inverted colors) and some as .png (all with normal colors).
When I run pdfimages -all test.pdf test, I get the same result (inverted .jpg and normal .png).
But when I run pdfimages -png test.pdf test, I get only .png images, and all of them have normal colors.
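The three runs above can be put side by side with a small script. This is only a sketch: the output directories (out_html, out_all, out_png) and the helper names count_by_ext and report are made up for illustration, and it assumes pdftohtml accepts an output prefix that includes a directory. It does nothing if the tools or the PDF are missing.

```shell
#!/bin/sh
# Reproduction sketch for the three commands in this report.
# The output directory names below are hypothetical.
pdf="${1:-test.pdf}"

# Count the files with a given extension in a directory.
# $1 = directory, $2 = extension without the dot
count_by_ext() {
    n=0
    for f in "$1"/*."$2"; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Print how many .jpg and .png files one run produced.
report() {
    echo "$1: $(count_by_ext "$1" jpg) jpg, $(count_by_ext "$1" png) png"
}

if command -v pdftohtml >/dev/null 2>&1 &&
   command -v pdfimages >/dev/null 2>&1 &&
   [ -f "$pdf" ]; then
    mkdir -p out_html out_all out_png
    # Assumption: the output prefix may contain a directory component.
    pdftohtml -xml -fmt png "$pdf" out_html/test
    pdfimages -all "$pdf" out_all/test
    pdfimages -png "$pdf" out_png/test
    report out_html
    report out_all
    report out_png
fi
```

With the file from this report, I would expect the first two directories to contain a mix of .jpg and .png files and the third to contain only .png files.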
Questions:
- Is it possible to convert a PDF to HTML/XML with pdftohtml so that all images are exported as .png, or at least so that the .jpg images are not inverted? Right now I need to run two different commands on the same PDF page to get a correct result. It seems that the -fmt option doesn't work.
- With pdfimages -all test.pdf test, the first image is extracted as .jpg and the second as .png. Does that mean the first image is actually stored in JPG format in the PDF, and the same for the second image?
- Is it OK that an image exported via pdftohtml -xml has one resolution (width and height) in the file but another in the generated XML? For example, the file has width=145, height=145, but in the XML it has width=105, height=105.
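For the second question, one possible check is the pdfimages -list option, which prints a table of the images in the PDF, including how each one is encoded. The sketch below is not a confirmed answer. It assumes the table has two header lines and that the encoding is in the 9th column (enc), which may differ between poppler versions. The helper name image_encodings is made up for illustration.

```shell
#!/bin/sh
# Sketch: show the stored encoding of each image in the PDF.
pdf="${1:-test.pdf}"

# Reduce `pdfimages -list` output (read from stdin) to "page num enc".
# Assumption: two header lines, encoding in the 9th column.
image_encodings() {
    awk 'NR > 2 && NF >= 9 { print $1, $2, $9 }'
}

if command -v pdfimages >/dev/null 2>&1 && [ -f "$pdf" ]; then
    pdfimages -list "$pdf" | image_encodings
fi
```

If the first image shows jpeg and the second shows image in the last field, that would suggest the first is stored as a JPEG stream and the second as plain raster data, which would match the .jpg and .png files that pdfimages -all writes.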
PS: I can attach the PDF file if needed.
Thanks in advance,