poppler-cpp memory leaking on Windows
Several Windows users of the R bindings have complained about major memory leakage, and unfortunately I was able to confirm the problem. The R bindings use the poppler-cpp interface, and we use mingw-w64 to build on Windows.
I have compared exactly the same code on Linux, macOS and Windows, all with poppler 0.73.0 (our current release version). Indeed, on macOS the memory usage is stable, whereas on Windows it increases rapidly. I have confirmed this with both GCC 8.3.0 and GCC 4.9.3 on Windows.
From some trial and error, it seems that the issue does not yet appear when merely loading the document with load_from_raw_data().
static document *read_raw_pdf(RawVector x, std::string opw, std::string upw, bool info_only = false){
  // Parse the PDF directly from the in-memory bytes of the R raw vector.
  document *doc = document::load_from_raw_data( (const char*) x.begin(), x.length(), opw, upw);
  if(!doc)
    throw std::runtime_error("PDF parsing failure.");
  return doc; // the caller takes ownership
}
However, as soon as I read something from the document, such as doc->fonts() or doc->pages(), it seems that the document starts leaking memory.
List poppler_pdf_fonts (RawVector x, std::string opw, std::string upw) {
  std::unique_ptr<poppler::document> doc(read_raw_pdf(x, opw, upw));
  std::vector<font_info> fonts = doc->fonts();
  ...
}
Even after doc has been delete'd, the process keeps holding on to memory. If we do this for many PDF files, we eventually run out of memory. It seems like something in the document is not being free'd on Windows.
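To put numbers on this outside of R, something like the following could be used. It is only a rough sketch, not the code from the R bindings: current_rss_bytes() and rss_growth() are hypothetical helper names. On Windows it reads the working set size via GetProcessMemoryInfo (with mingw-w64 this may need linking against -lpsapi), on Linux it reads /proc/self/statm, and on other platforms it simply reports 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>

#if defined(_WIN32)
#include <windows.h>
#include <psapi.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

// Hypothetical helper: resident memory of this process in bytes,
// or 0 if it cannot be determined on this platform.
static std::size_t current_rss_bytes() {
#if defined(_WIN32)
  PROCESS_MEMORY_COUNTERS pmc;
  if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
    return static_cast<std::size_t>(pmc.WorkingSetSize);
  return 0;
#elif defined(__linux__)
  // /proc/self/statm starts with: total program size, resident set size (in pages).
  std::FILE *f = std::fopen("/proc/self/statm", "r");
  if (!f) return 0;
  unsigned long total = 0, resident = 0;
  const int n = std::fscanf(f, "%lu %lu", &total, &resident);
  std::fclose(f);
  if (n != 2) return 0;
  return static_cast<std::size_t>(resident) *
         static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#else
  return 0;
#endif
}

// Hypothetical helper: run `work` `iterations` times and return the change
// in resident memory in bytes (can be negative).
static long long rss_growth(int iterations, const std::function<void()> &work) {
  assert(iterations > 0);
  const std::size_t before = current_rss_bytes();
  for (int i = 0; i < iterations; ++i) work();
  const std::size_t after = current_rss_bytes();
  return static_cast<long long>(after) - static_cast<long long>(before);
}
```

The idea would be to pass a callable that loads one PDF with load_from_raw_data(), calls fonts() and deletes the document again, and then to compare rss_growth() for, say, 100 and 1000 iterations. An allocator may keep freed memory around, so a value that plateaus is not necessarily a leak, but growth that scales with the number of iterations would match what we see on Windows.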
Is the memory allocation in poppler different on Windows than on Unix? What could be causing this?