Segmentation fault when processing PDFs from the Python wrapper
We are working with a set of PDFs, and Poppler handles most of them well, but for some of them we see the following error:
Segmentation fault (core dumped)
After debugging further with the help of @bzamecnik, we found that the crash occurs at this line (https://gitlab.freedesktop.org/poppler/poppler/-/blame/master/poppler/TextOutputDev.cc#L396), which dereferences a NULL gfxFont pointer when called from https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/cpp/poppler-page.cpp#L461:
bool TextFontInfo::matches(const Ref *ref) const
{
    return (*(gfxFont->getID()) == *ref);
}
We have fixed this issue by modifying this line to include a null check, but wanted to understand what is happening here in more detail, and whether this is expected behaviour.
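For reference, the kind of null check we mean can be sketched as below. This is a self-contained illustration, not our exact patch and not Poppler's real definitions: the `Ref`, `GfxFont`, and `TextFontInfo` types here are simplified stand-ins, and returning `false` when no font is attached is our assumption about what the sensible result would be.

```cpp
#include <cassert>

// Hypothetical stand-in for Poppler's object reference (object number + generation).
struct Ref {
    int num;
    int gen;
};

inline bool operator==(const Ref &a, const Ref &b)
{
    return a.num == b.num && a.gen == b.gen;
}

// Hypothetical stand-in for GfxFont; only the getID() accessor is modelled.
class GfxFont {
public:
    explicit GfxFont(Ref idA) : id(idA) {}
    const Ref *getID() const { return &id; }

private:
    Ref id;
};

// Simplified stand-in for TextFontInfo holding a font pointer that may be null.
class TextFontInfo {
public:
    explicit TextFontInfo(const GfxFont *fontA) : gfxFont(fontA) {}

    bool matches(const Ref *ref) const
    {
        // Assumed precondition for this sketch: callers pass a valid reference.
        assert(ref != nullptr);
        // Guard: with no font attached there is no ID to compare,
        // so report "no match" instead of dereferencing a null pointer.
        if (gfxFont == nullptr) {
            return false;
        }
        return *(gfxFont->getID()) == *ref;
    }

private:
    const GfxFont *gfxFont;
};
```

With this guard, a `TextFontInfo` constructed without a font simply never matches any reference, while instances with a font behave as before.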
from poppler import load_from_file

file_path = "sample_pdf.pdf"
pdf_document = load_from_file(file_path)
no_of_pages = pdf_document.pages
for page_ind in range(no_of_pages):
    page = pdf_document.create_page(page_ind)
    # This call crashes with a segmentation fault on the affected PDFs
    text_list = page.text_list(page.TextListOption.text_list_include_font)
Link to PDF: https://drive.google.com/file/d/180CDGyiJRfytvuzVsAiYKppHvaBABGkJ/view?usp=sharing