Pathological case demonstrating massive slowdown
Submitted by solo
Assigned to poppler-bugs
Created attachment 138921 before
From a bug reported to pdfgrep at https://gitlab.com/pdfgrep/pdfgrep/issues/25
The original file, before.pdf, took pdfgrep only 7 seconds to search. I then decompressed and recompressed the file to produce after.pdf, which pdfgrep takes 80 seconds to search. I also tested this procedure against some ebooks and found much worse results, such as an increase from 4s to 250s. It looks like this might be Poppler-related, since timing pdftotext on the two files also shows a 10x difference in performance. However, none of the other PDF viewers (Mac OS X Preview and Skim, mupdf, PDF.js) or parsers (mutool, podofo, pdf-parser.py, pstotext/ghostscript) I tried shows any significant performance difference between the two files.
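For anyone who wants to reproduce the timing comparison, here is a minimal sketch of how it could be scripted. It is not the exact procedure used above. The helper names (time_command, slowdown, compare_files) are made up for illustration, and it assumes pdftotext is on the PATH and that the two files are named before.pdf and after.pdf in the current directory.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # Discard the extracted text; only the elapsed time matters here.
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    return time.perf_counter() - start


def slowdown(before_seconds, after_seconds):
    """Return how many times slower the second measurement is."""
    return after_seconds / before_seconds


def compare_files(before_pdf, after_pdf, tool="pdftotext"):
    """Time the tool on both files and return (before, after, ratio).

    Passing "-" as the output file makes pdftotext write to stdout.
    The default tool name is an assumption about the local setup.
    """
    before = time_command([tool, before_pdf, "-"])
    after = time_command([tool, after_pdf, "-"])
    return before, after, slowdown(before, after)
```

Calling compare_files("before.pdf", "after.pdf") returns the two timings and their ratio. A single run per file is noisy, so repeating the call a few times and comparing the fastest runs would give a fairer number.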
Attachment 138921, "before":