Fix memory leak in Catalog by tracking pages (and page refs) using std::vector instead of manually allocating them. oss-fuzz/10119
Merge request reports
Activity
added 1 commit
- 0412f832 - Also track owned pages using unique_ptr to further avoid manual memory management.
The diff replaces the manual page-cache allocation:

```diff
-    pagesSize = getNumPages();
-    pages = (Page **)gmallocn_checkoverflow(pagesSize, sizeof(Page *));
-    pageRefs = (Ref *)gmallocn_checkoverflow(pagesSize, sizeof(Ref));
-    if (pages == nullptr || pageRefs == nullptr) {
-        error(errSyntaxError, -1, "Cannot allocate page cache");
-        pagesSize = 0;
-        return gFalse;
-    }
-    for (int i = 0; i < pagesSize; ++i) {
-        pages[i] = nullptr;
-        pageRefs[i].num = -1;
-        pageRefs[i].gen = -1;
-    }
+    pages.resize(getNumPages());
```

In the case of memory exhaustion, `operator new` will throw a `std::bad_alloc` exception which will unwind the stack, i.e. destructors will be run and Poppler's client code can handle the exception. Note that this behaviour is already present everywhere in Poppler where `operator new` is used, e.g. to allocate `Dict` or `GooString`, i.e. handling of OOM is currently not consistent and an application that really cares about OOM must handle this exception already, so this is not really a behaviour change from an outside point of view. (The main problem with this inconsistent handling is IMHO that Poppler is very often not exception safe and hence resources will be leaked when the stack is unwound. But generally speaking, this should be resolved in that distant future where the code base does not contain direct usage of `new`/`delete`.)

It is totally a behaviour change.
This code now crashes/throws an exception, while before it properly returned false.
This is why I said from an outside point of view, i.e. this particular method has a different OOM behaviour, but a user of Poppler's external API had to handle both variants of signifying OOM already in any case.
Otherwise every single use of `operator new` would have to request the `nothrow` version, or we would need to catch every `std::bad_alloc` exception internally instead of cleaning up our resources and bubbling up the exception to the application.

I mean, I can certainly throw a `try`/`catch` block around the calls to `resize`, but I am not sure I understand the way Poppler wants to handle OOM as a library in that case, i.e. do we

- want to always handle OOM internally, ensuring we never let an exception reach the consuming application, or
- just want to make sure we are exception-safe in the case of OOM and let the calling application decide what to do about it?

Admittedly, in both cases we have a lot of work to do since we do not really do either ATM. But IMHO, the second style is much more consistent with how the C++ standard library works and fits better with the observation that proper OOM (instead of bogus allocation sizes) is truly an exceptional situation due to kernel over-commitment and the OOM killer.
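The two styles under discussion can be sketched like this; the helper names are hypothetical and not Poppler API, just a minimal illustration of the contrast:

```cpp
#include <cassert>
#include <new>
#include <vector>

// Style 1: handle OOM internally via the nothrow overload of operator new.
// Failure is signalled by a return value; no exception leaves the library.
bool allocate_cache_nothrow(std::size_t n, int*& out) {
    out = new (std::nothrow) int[n];
    return out != nullptr;
}

// Style 2: stay exception-safe and let std::bad_alloc propagate.
// On exhaustion, resize() throws and RAII unwinds cleanly; the
// consuming application decides what to do about it.
std::vector<int> allocate_cache_throwing(std::size_t n) {
    std::vector<int> cache;
    cache.resize(n);
    return cache;
}
```

The first style matches the old `gmallocn_checkoverflow` pattern; the second is how the C++ standard library itself behaves.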
This is not about OOM, this is about a malformed document.
You can't create an infinitely sized Dict or GooString without actually writing infinite content on the PDF document, so there we're safe-ish.
But you can just say "Count bignumber" and we'll try to allocate bignumber, so it's very easy to request zillions of bytes. (Well actually in this case we're protecting ourselves against the number of objects in the XRef, so it's a bit harder to trick, but that's the idea)
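A hedged sketch of the overflow check that an allocator like `gmallocn_checkoverflow` performs (a simplified illustration under my own assumptions, not Poppler's actual implementation):

```cpp
#include <cstdint>
#include <cstdlib>

// Reject count * size when the multiplication would overflow std::size_t,
// so a document-supplied "Count bignumber" cannot wrap around to a tiny
// allocation. A huge but non-overflowing request may still legitimately
// return nullptr from malloc.
void *mallocn_checkoverflow(std::size_t count, std::size_t size) {
    if (size != 0 && count > SIZE_MAX / size) {
        return nullptr; // count * size would overflow
    }
    return std::malloc(count * size);
}
```

This is exactly the kind of check that `std::vector::resize` cannot express through a return value, which is what the rest of the thread turns on.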
P.S: Exceptions don't exist for poppler, that's why we define -fno-exceptions
But then we cannot really use `std::vector` this way, since e.g. `resize` cannot indicate allocation failure except by using exceptions. So either I fall back to manual memory management and just fix this leak, or we stop preallocating the `pages` array based on a document-supplied number and just extend it as we are actually creating pages?
The last commit sidesteps the issue by not allocating based on what the document says, but rather allocating only those pages which we actually load, which actually simplifies things. I am not sure whether this will have a measurable performance impact; if so, then probably only for documents with at least thousands of pages, and then other things probably dominate performance.
From how I understand the code, we always load the pages in order into the cache even if the application asks for them out of order, i.e. we always insert the just-loaded page at `pages[lastCachedPage]` and continue until `page <= lastCachedPage`, e.g. if an application starts by asking for `3`, we load `1`, `2` and then `3` into the cache. (And `lastCachedPage` actually is `pages.size()` after that change and hence can be removed to avoid the explicit invariant.)
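The loading behaviour described above can be modelled with a small sketch; the types are hypothetical simplifications, not the actual Catalog code:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Page { int number; };

// Pages are always loaded in order, so the vector's size itself records
// how far we have gotten and a separate lastCachedPage counter would be
// redundant: lastCachedPage == pages.size() at all times.
struct PageCache {
    std::vector<std::unique_ptr<Page>> pages;

    Page *getPage(int n) {                 // n is 1-based, like PDF pages
        while (static_cast<int>(pages.size()) < n) {
            // "Load" the next page in sequence until we reach page n.
            pages.push_back(std::make_unique<Page>(
                Page{static_cast<int>(pages.size()) + 1}));
        }
        return pages[n - 1].get();
    }
};
```

Asking for page 3 on an empty cache loads pages 1, 2 and then 3, matching the description above; asking for page 2 afterwards is a pure lookup.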
added 1 commit
- 3fc453a4 - Do not preallocate page cache based on document-supplied page count but just…
added 1 commit
- 802fb12e - Since the last commit implies lastCachedPage == pages.size(), remove the first…
added 23 commits
- 802fb12e...7e9b6f1d - 19 commits from branch poppler:master
- cef9d1af - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
- a20cb985 - Also track owned pages using unique_ptr to further avoid manual memory management.
- ffb60e8b - Do not preallocate page cache based on document-supplied page count but just…
- 8e931ef6 - Since the last commit implies lastCachedPage == pages.size(), remove the first…
added 120 commits
- 8e931ef6...89b4f3b9 - 116 commits from branch poppler:master
- d232b3d5 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
- dfea2355 - Also track owned pages using unique_ptr to further avoid manual memory management.
- 5cb48e90 - Do not preallocate page cache based on document-supplied page count but just…
- 6cb9fe71 - Since the last commit implies lastCachedPage == pages.size(), remove the first…
added 8 commits
- 6cb9fe71...51adf4f5 - 4 commits from branch poppler:master
- 3cde2662 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…
- 0c4e7096 - Also track owned pages using unique_ptr to further avoid manual memory management.
- f0f6cac5 - Do not preallocate page cache based on document-supplied page count but just…
- 97c0445c - Since the last commit implies lastCachedPage == pages.size(), remove the first…
added 1 commit
- 5c4e246e - Try to make GCC 4.9 happy and avoid manual deletes by creating pages using std::make_unique.
@aacid The last commit hopefully makes GCC 4.9 happy as well and does remove further manual deletes, but if you do consider merging this and since these all do the same thing but are pretty incremental, I would suggest squashing them into the first commit (as the other commit messages are not particularly helpful). (I can do this as well if you want me to, but I wanted to avoid the destructive operation if you prefer to keep them separate.)
added 1 commit
- 8718a24b - Try to make GCC 4.9 happy and avoid manual deletes by creating pages using std::make_unique.
added 9 commits
- 84560c94...38b54a6a - 8 commits from branch poppler:master
- bcfa08d0 - Fix memory leak in Catalog by tracking pages (and page refs) using std::vector…