pdftohtml ignores the png format option and extracts inverted jpg images
Submitted by c1tru55
Assigned to poppler-bugs
I use pdftohtml 0.37.0 on Ubuntu.
When I run pdftohtml -xml -fmt png, some images are extracted as .jpg (all with inverted colors) and some as .png (all with normal colors).
When I run pdfimages -all test.pdf test, I get the same result (inverted .jpg and normal .png).
But when I run pdfimages -png test.pdf test, I get only .png images, and all of them have normal colors.
- Is it possible to convert a PDF to HTML/XML with pdftohtml and have all images exported as .png, or at least to get non-inverted .jpg images? Right now I have to call two different commands on the same PDF page to get a correct result. It seems that the -fmt option doesn't work.
- When using pdfimages -all test.pdf test, the first image is extracted as .jpg and the second as .png. Does that mean the first image is actually stored in JPG format in the PDF, and likewise that the second is stored as PNG?
- Is it OK that an image exported via pdftohtml -xml has one resolution (width and height) as a file but a different one inside the generated XML? For example, the file has width=145, height=145, but inside the XML it has width=105, height=105.
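The two-command workaround mentioned in the first question can be sketched roughly as follows. This is only an illustration, not a fix: the function names, the `-img` suffix, and the `DRY_RUN` switch are made up for this example, and only flags already used in this report appear in it. The XML produced by pdftohtml will presumably still refer to pdftohtml's own image files, so matching them to the PNG files from pdfimages remains a manual step.

```shell
#!/bin/sh
# Sketch of the two-command workaround: layout from pdftohtml, images from pdfimages.
# run, convert_with_png_images, DRY_RUN and the "-img" suffix are hypothetical names.

run() {
    # With DRY_RUN=1 only print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

convert_with_png_images() {
    pdf_file=$1
    out_prefix=$2
    # XML with text and layout; the images written here may be a mix of .jpg and .png.
    run pdftohtml -xml -fmt png "$pdf_file" "$out_prefix"
    # Extract every image again as PNG, which gave normal colors in this report.
    run pdfimages -png "$pdf_file" "$out_prefix-img"
}
```

For example, setting DRY_RUN=1 and calling `convert_with_png_images test.pdf test` prints the two commands without running them.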
PS: I can attach the PDF file if needed.
Thanks in advance,