poppler: file parsing infinite loop encountered with docs containing image masks (sample attached)
Submitted by Ed Porras
Assigned to poppler-bugs
Created attachment 77390 Sample document containing Image Mask causing poppler to get stuck in an infinite loop
We are working on an internal tool that uses poppler for PDF processing and have encountered a handful of documents that cause the poppler core to enter an infinite loop. I've looked at a couple of them and the problem appears to be related to the parsing of image masks. It happens on both Linux and OS X, linked against poppler 0.22.2.
I've confirmed the bug is in poppler and not in our application, as it is also seen with pdftohtml. Enabling PrintCommands produces output that quickly shows the problem:
…
re 661.08 456.362 609.48 -104.88
f
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
re 0 1 1 -1
f
scn 0.8 0.8 0.8
q
cm 1 0 0 -1 0 1
Do /Im1
Q
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
re 0 1 1 -1
f
scn 0.8 0.8 0.8
q
cm 1 0 0 -1 0 1
Do /Im1
Q
cs /Cs6
scn 1 1 1
gs /GS1
gfx state dict: << /SA false /SM 0.02 /Type /ExtGState >>
…
My guess is that an offset is not being applied, so the same object keeps getting returned. I realize there is a repeated graphic on the page, but by the time I killed pdftohtml (less than 30 s after starting it), around 140k instances of the PNG had been written to disk, and I'm pretty sure that can't be right :)
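For anyone who wants to confirm the repetition mechanically rather than by eye, here is a minimal sketch. It is not part of poppler, and the function name `find_repeating_block` is made up for illustration. It assumes the PrintCommands output has been saved with one command per line, and it looks for the shortest block of lines that repeats back to back at the end of the log.

```python
def find_repeating_block(lines, min_repeats=3):
    """Return (period, repeats) for the shortest block of lines that
    repeats back to back at the end of `lines`, or None if no block
    repeats at least `min_repeats` times.

    Hypothetical helper for inspecting a saved command trace; it knows
    nothing about PDF operators and only compares lines as strings.
    """
    n = len(lines)
    # Try candidate block lengths from shortest to longest.
    for period in range(1, n // min_repeats + 1):
        block = lines[n - period:]
        repeats = 1
        end = n - period
        # Walk backwards while the preceding chunk equals the tail block.
        while end - period >= 0 and lines[end - period:end] == block:
            repeats += 1
            end -= period
        if repeats >= min_repeats:
            return period, repeats
    return None
```

Usage would look something like `find_repeating_block(open("commands.log").read().splitlines())`, where `commands.log` is a placeholder name for the saved output. Because the check is anchored at the end of the log, it still works if the process was killed partway through a block: the tail is then just a rotation of the repeating sequence.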
I've extracted a single page from one of the documents that shows the issue and have attached it. Please note that running pdftohtml on the file will quickly create thousands of small 8x8 PNGs, each about 100 bytes in size.
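To keep a reproduction run from filling the disk, one option is to run the tool in a scratch directory under a time limit. The following is only a sketch under my own assumptions: `run_capped` is a hypothetical helper, and it simply counts `*.png` files left in the working directory without assuming anything about how pdftohtml names them.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_capped(cmd, seconds, workdir):
    """Run `cmd` inside `workdir`, killing it after `seconds`.

    Returns (timed_out, png_count), where png_count is the number of
    *.png files found directly in `workdir` afterwards.
    Hypothetical helper; not part of poppler or its utilities.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the timeout elapses.
        subprocess.run(
            cmd,
            cwd=workdir,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    png_count = sum(1 for _ in Path(workdir).glob("*.png"))
    return timed_out, png_count
```

A call might look like `run_capped(["pdftohtml", "/abs/path/to/sample.pdf", "out"], 5, scratch_dir)`, with the PDF path and `scratch_dir` as placeholders. The input path should be absolute because the command runs with `workdir` as its current directory. Using a `tempfile.TemporaryDirectory()` for the scratch directory means the generated PNGs are cleaned up automatically.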
There are two similar issues reported but they date back to 2010 and are marked resolved so I'm not confident it is the same problem:
In the meantime, I'm tracing through the code to get a better understanding of the problem, but I'm very unfamiliar with the Parser/Lexer portion of the poppler core. I hope you can help; let me know if there's any other way I can assist.