Pathological case demonstrating massive slowdown
Submitted by solo
Assigned to poppler-bugs
Link to original bug (#106135)
Description
Created attachment 138921, "before"
From a bug reported to pdfgrep at https://gitlab.com/pdfgrep/pdfgrep/issues/25
The original file, before.pdf, took pdfgrep only 7 seconds to search.
I then decompressed and recompressed the file to produce after.pdf, and
pdfgrep now takes 80 seconds to search it. I also tried the same
procedure on some ebooks and got much worse results, such as an increase
from 4s to 250s.
This looks like it might be a poppler issue, since timing pdftotext on the
two files also shows a roughly 10x difference in performance. By contrast,
none of the other PDF viewers (Mac OS X Preview and Skim, mupdf, PDF.js) or
parsers (mutool, podofo, pdf-parser.py, pstotext/ghostscript) I tried show
any significant performance difference between the two files.
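A minimal Python sketch for reproducing the pdftotext timing comparison, assuming poppler's `pdftotext` is on PATH and both attachments are saved as before.pdf and after.pdf in the working directory. The helper names `time_command` and `slowdown` are hypothetical and not part of pdfgrep or poppler; taking the best of a few runs is just one way to reduce noise from disk caching.

```python
import os
import shutil
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 3) -> float:
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the extracted text; only the elapsed time matters here.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(before_s: float, after_s: float) -> float:
    """How many times slower the 'after' run is than the 'before' run."""
    return after_s / before_s


def main() -> None:
    files = ["before.pdf", "after.pdf"]
    if shutil.which("pdftotext") is None or not all(os.path.exists(f) for f in files):
        print("need pdftotext on PATH and before.pdf/after.pdf in the current directory",
              file=sys.stderr)
        return
    # An output file of "-" makes pdftotext write the text to stdout.
    t_before = time_command(["pdftotext", "before.pdf", "-"])
    t_after = time_command(["pdftotext", "after.pdf", "-"])
    print(f"before.pdf: {t_before:.2f}s  after.pdf: {t_after:.2f}s  "
          f"slowdown: {slowdown(t_before, t_after):.1f}x")


if __name__ == "__main__":
    main()
```

The same `time_command` helper can wrap a pdfgrep invocation instead of pdftotext to compare the two tools on identical inputs.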
Attachment 138921, "before":
before.pdf