Fix memory leak in Catalog by tracking pages (and page refs) using std::vector

Merged Adam Reichold requested to merge (removed):fix-of10119 into master
1 unresolved thread

Fix memory leak in Catalog by tracking pages (and page refs) using std::vector instead of manually allocating them. oss-fuzz/10119

Merge request reports

Approval is optional

Merged (Feb 26, 2025 7:33pm UTC)

Merge details

  • Changes merged into master with bcfa08d0.
  • Deleted the source branch.

Pipeline #5525 passed

Pipeline passed for bcfa08d0 on master

Activity

```diff
     }

-    pagesSize = getNumPages();
-    pages = (Page **)gmallocn_checkoverflow(pagesSize, sizeof(Page *));
-    pageRefs = (Ref *)gmallocn_checkoverflow(pagesSize, sizeof(Ref));
-    if (pages == nullptr || pageRefs == nullptr) {
-        error(errSyntaxError, -1, "Cannot allocate page cache");
-        pagesSize = 0;
-        return gFalse;
-    }
-    for (int i = 0; i < pagesSize; ++i) {
-        pages[i] = nullptr;
-        pageRefs[i].num = -1;
-        pageRefs[i].gen = -1;
-    }
+    pages.resize(getNumPages());
```
  • You're removing the overflow check, what will happen now? an exception? a crash?

  • Author Contributor

    In the case of memory exhaustion, operator new will throw an exception of std::bad_alloc which will unwind the stack, i.e. destructors will be run and Poppler's client code can handle the exception.

    Note that this behavior is already present everywhere in Poppler where operator new is used to e.g. allocate Dict or GooString, i.e. handling of OOM is currently not consistent, and an application that really cares about OOM must handle this exception already, so this is not really a behavior change from an outside point of view.

    (The main problem with this inconsistent handling is IMHO that Poppler is very often not exception safe and hence resources will be leaked when the stack is unwound. But generally speaking, this should be resolved in that distant future when the code base no longer contains direct usage of new/delete.)
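The unwinding behaviour described above can be sketched outside Poppler (a minimal illustration, not Poppler code; `tryResize` is a hypothetical helper): the allocation failure surfaces as `std::bad_alloc`, the stack unwinds, and RAII members are destroyed, so a caller that catches the exception sees a clean failure instead of a leak.

```cpp
#include <new>
#include <vector>
#include <cstddef>

// Hypothetical sketch: resize either succeeds or throws std::bad_alloc.
// Catching the exception turns it back into a boolean result, comparable
// to the old gFalse return path, while unwinding runs all destructors.
bool tryResize(std::vector<int> &v, std::size_t n)
{
    try {
        v.resize(n);
        return true;
    } catch (const std::bad_alloc &) {
        return false; // allocation failed; v is left in a valid state
    }
}
```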

  • It is totally a behaviour change.

    This code now crashes/throws an exception, while before it properly returned false.

  • Author Contributor

    > It is totally a behaviour change.
    >
    > This code now crashes/throws an exception, while before it properly returned false.

    This is why I said from an outside point of view, i.e. this particular method has a different OOM behaviour, but a user of Poppler's external API had to handle both variants of signifying OOM already in any case.

    Otherwise every single use of operator new would have to request the nothrow version or we would need to catch every std::bad_alloc exception internally instead of cleaning up our resources and bubbling up the exception to the application.

  • Author Contributor

    I mean, I can certainly throw a try-catch block around the calls to resize, but I am not sure I understand the way Poppler wants to handle OOM as a library in that case, i.e. do we

    1. want to always handle OOM internally, ensuring we never let an exception reach the consuming application, or
    2. just want to make sure we are exception-safe in the case of OOM and let the calling application decide what to do about it.

    Admittedly, in both cases we have a lot of work to do since we do not really do either ATM. But IMHO, the second style is much more consistent with how the C++ standard library works and fits better with the observation that proper OOM (instead of bogus allocation sizes) is truly an exceptional situation due to kernel over-commitment and the OOM killer.

  • This is not about OOM, this is about a malformed document.

    You can't create an infinitely sized Dict or GooString without actually writing infinite content on the PDF document, so there we're safe-ish.

    But you can just say "Count bignumber" and we'll try to allocate bignumber, so it's very easy to request for zillions of bytes. (Well actually in this case we're protecting ourselves against the number of objects in the XRef, so it's a bit harder to trick, but that's the idea)

    P.S: Exceptions don't exist for poppler, that's why we define -fno-exceptions
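The "Count bignumber" concern above can be sketched as a clamping helper (hypothetical, not Poppler code): the document-supplied count never drives allocation directly but is bounded by something the parser already knows, such as the number of entries in the cross-reference table.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: a malformed document can declare /Count as any number,
// but a real page needs an XRef entry, so the XRef size is a safe upper bound
// for any preallocation derived from the declared count.
std::size_t safePageCapacity(long long declaredCount, std::size_t numXRefEntries)
{
    if (declaredCount < 0)
        return 0;
    return std::min(static_cast<std::size_t>(declaredCount), numXRefEntries);
}
```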

  • Author Contributor

    > P.S: Exceptions don't exist for poppler, that's why we define -fno-exceptions

    But then we cannot really use std::vector this way, since e.g. resize cannot indicate allocation failure except via exceptions. So either I fall back to manual memory management and just fix this leak, or we stop preallocating the pages array based on a document-supplied number and instead extend it as we actually create pages?

    Edited by Adam Reichold
  • changed this line in version 3 of the diff

  • Author Contributor

    The last commit sidesteps the issue by not allocating based on what the document says, but rather allocating only those pages which we actually load, which actually simplifies things. I am not sure whether this will have a measurable performance impact; if so, probably only for documents with at least thousands of pages, where other things likely dominate performance anyway.

  • Does that work if you don't ask for the pages in order?

  • Author Contributor

    From how I understand the code, we always load pages into the cache in order, even if the application asks for them out of order: we always insert the just-loaded page at pages[lastCachedPage] and continue until page <= lastCachedPage. For example, if the application starts by asking for page 3, we load 1, 2 and then 3 into the cache. (And lastCachedPage is actually pages.size() after that change, so it can be removed to avoid the explicit invariant.)
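The in-order cache fill described here can be sketched as follows (a simplified stand-in, not the actual Catalog code; `PageCache` and `loadPage` are hypothetical): asking for page n loads pages 1..n sequentially, so the vector's size itself plays the role of lastCachedPage.

```cpp
#include <vector>

// Simplified sketch of the sequential cache fill: pages.size() is always the
// index of the last cached page, so no separate lastCachedPage is needed.
struct PageCache {
    std::vector<int> pages; // stand-in for std::vector<std::unique_ptr<Page>>

    int getPage(int n) // 1-based page numbers, as in Poppler's API
    {
        while (static_cast<int>(pages.size()) < n)
            pages.push_back(loadPage(static_cast<int>(pages.size()) + 1));
        return pages[n - 1];
    }

    int loadPage(int n) { return n * 100; } // placeholder for real parsing
};
```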

  • Adam Reichold added 1 commit

    • 3fc453a4 - Do not preallocate page cache based on document-supplied page count but just…

    Compare with previous version

  • Adam Reichold added 1 commit

    • 802fb12e - Since the last commit implies lastCachedPage == pages.size(), remove the first…

    Compare with previous version

  • Albert Astals Cid added 23 commits

    • 802fb12e...7e9b6f1d - 19 commits from branch poppler:master
    • cef9d1af - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
    • a20cb985 - Also track owned pages using unique_ptr to further avoid manual memory management.
    • ffb60e8b - Do not preallocate page cache based on document-supplied page count but just…
    • 8e931ef6 - Since the last commit implies lastCachedPage == pages.size(), remove the first…

    Compare with previous version

  • Albert Astals Cid added 120 commits

    • 8e931ef6...89b4f3b9 - 116 commits from branch poppler:master
    • d232b3d5 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
    • dfea2355 - Also track owned pages using unique_ptr to further avoid manual memory management.
    • 5cb48e90 - Do not preallocate page cache based on document-supplied page count but just…
    • 6cb9fe71 - Since the last commit implies lastCachedPage == pages.size(), remove the first…

    Compare with previous version

  • Adam Reichold added 8 commits

    • 6cb9fe71...51adf4f5 - 4 commits from branch poppler:master
    • 3cde2662 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
    • 0c4e7096 - Also track owned pages using unique_ptr to further avoid manual memory management.
    • f0f6cac5 - Do not preallocate page cache based on document-supplied page count but just…
    • 97c0445c - Since the last commit implies lastCachedPage == pages.size(), remove the first…

    Compare with previous version

  • Adam Reichold added 1 commit

    • 5c4e246e - Try to make GCC 4.9 happy and avoid manual deletes by creating pages using std::make_unique.

    Compare with previous version

  • Author Contributor

    @aacid The last commit hopefully makes GCC 4.9 happy as well and removes further manual deletes. If you do consider merging this, then since these commits all do the same thing and are pretty incremental, I would suggest squashing them into the first commit (as the other commit messages are not particularly helpful). (I can do this as well if you want me to, but I wanted to avoid the destructive operation in case you prefer to keep them separate.)

  • Adam Reichold added 1 commit

    • 8718a24b - Try to make GCC 4.9 happy and avoid manual deletes by creating pages using std::make_unique.

    Compare with previous version

  • Please squash it

  • Adam Reichold added 2 commits

    • da428865 - 1 commit from branch poppler:master
    • 84560c94 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…

    Compare with previous version

  • added 9 commits

    • 84560c94...38b54a6a - 8 commits from branch poppler:master
    • bcfa08d0 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…

    Compare with previous version
