poppler issues
https://gitlab.freedesktop.org/poppler/poppler/-/issues
2024-03-25T21:16:18Z

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1477
Segmentation fault on processing pdfs from python wrapper
2024-03-25T21:16:18Z
Samad Koita

We are working with some PDFs, and poppler works great for most of them, but for some of them we see the following error.
> Segmentation fault (core dumped)
After debugging further with the help of @bzamecnik we found that the error was in this line (https://gitlab.freedesktop.org/poppler/poppler/-/blame/master/poppler/TextOutputDev.cc#L396) because of accessing a NULL `gfxFont` pointer, when called from https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/cpp/poppler-page.cpp#L461
```
bool TextFontInfo::matches(const Ref *ref) const
{
    return (*(gfxFont->getID()) == *ref);
}
```
We have fixed this issue by modifying this line to include a null check, but wanted to understand what is happening here in more detail, and whether this is expected behaviour.
```
from poppler import load_from_file
file_path = "sample_pdf.pdf"
pdf_document = load_from_file(file_path)
no_of_pages = pdf_document.pages
for page_ind in range(no_of_pages):
    page = pdf_document.create_page(page_ind)
    text_list = page.text_list(page.TextListOption.text_list_include_font)
```
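Until the NULL `gfxFont` access is fixed in the installed Poppler, one possible way to keep a batch job alive is to run the extraction for each file in a child process, so that a segmentation fault only kills the child. The following is a minimal sketch, assuming a POSIX system (where death by signal shows up as a negative return code). `run_isolated` and `describe_exit` are hypothetical helper names and are not part of python-poppler.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    On POSIX, a child killed by a signal yields a negative return code,
    for example -signal.SIGSEGV for a segmentation fault.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode


def describe_exit(returncode: int) -> str:
    """Turn a return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "failed with exit code %d" % returncode
```

The snippet passed to `run_isolated` would be the reproduction script above with the file path filled in; the parent process can then log the affected PDF and continue with the next one.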
Link to PDF: https://drive.google.com/file/d/180CDGyiJRfytvuzVsAiYKppHvaBABGkJ/view?usp=sharing

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1476
Poppler::Page::text not working correctly with RawOrderLayout
2024-03-19T15:23:55Z
StefanBruens

I am trying to get the plain text from a document, in content order.
`Page::text(QRectF{}, Page::PhysicalLayout)` works reasonably well, and is able to extract the complete contents. For `Page::RawOrderLayout`, the results are fairly broken:
- The first, trivial document returns the contents without spaces between words.
- The second, slightly more complex document does not return any text at all.
When using `pdftotext`, with `-raw`, `-layout` or "default", the content is correct.
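To illustrate what the first bullet is missing, here is a small sketch of gap-based space insertion when joining words in content order. It is not Poppler's implementation: the `Word` tuple, `join_raw_order`, and the thresholds (a gap of more than `0.1 * font size` for a space, a baseline shift of at least half the font size for a new line) are assumptions made purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    text: str
    x_min: float
    x_max: float
    y_base: float
    font_size: float


def join_raw_order(words: List[Word], gap_ratio: float = 0.1) -> str:
    """Join words in the given (content) order, inserting separators.

    A newline is emitted when the baseline moves by at least half the
    font size; a space is emitted when the horizontal gap to the previous
    word exceeds gap_ratio * font size.
    """
    out: List[str] = []
    prev: Optional[Word] = None
    for word in words:
        if prev is not None:
            same_line = abs(word.y_base - prev.y_base) < 0.5 * prev.font_size
            if not same_line:
                out.append("\n")
            elif word.x_min - prev.x_max > gap_ratio * prev.font_size:
                out.append(" ")
        out.append(word.text)
        prev = word
    return "".join(out)
```

Without the gap test, the pieces would simply be concatenated, which matches the symptom of text returned without spaces between words.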
The missing spaces are likely caused by implementation differences in TextOutputDev between `TextPage::getText` (used by `Poppler::Page::text`) and `TextPage::dump` (used by pdftotext) - the latter has some code to insert spaces:
https://gitlab.freedesktop.org/poppler/poppler/-/blame/master/poppler/TextOutputDev.cc?ref_type=heads&page=6#L5391

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1475
Searching for two words only works in single lines with some pdf files
2024-03-16T15:43:24Z
Nelson Benítez León

:arrow_double_down: **This is a copy of https://gitlab.gnome.org/GNOME/evince/-/issues/2001 with some added notes** :arrow_double_down:
### Summary
Searching for two words only works in single lines with some pdf files
### Description
I found that when searching for two (or more) words, Evince does not show results where the first word is at the end of a line and the second is at the beginning of a new line.
This surely happens with files exported from LibreOffice, but these files can be correctly searched in Okular and Qoppa PDF Studio.
I attached an example pdf. Try searching in it for:
`take steps`
`refused protection`
[evince-search-sample.pdf](/uploads/042e1b9f674b9f8a4663c56e6e873f7d/evince-search-sample.pdf)
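To make the expected behaviour concrete, the sketch below finds a phrase even when it starts on one line and continues on the next, regardless of whether the next line belongs to the same paragraph. It is only an illustration under assumed data structures (a page as a list of paragraphs, each a list of line strings). `find_phrase` is a hypothetical name, it does plain case-insensitive substring matching, and it only looks one line ahead.

```python
from typing import List, Tuple


def _norm(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace to one space."""
    return " ".join(text.lower().split())


def find_phrase(paragraphs: List[List[str]], phrase: str) -> List[Tuple[int, int]]:
    """Return (paragraph index, line index) of the line where each match starts.

    A match may continue on the following line, even when that line is the
    first line of the next paragraph.
    """
    needle = _norm(phrase)
    if not needle:
        return []
    lines = [
        (p_idx, l_idx, _norm(text))
        for p_idx, para in enumerate(paragraphs)
        for l_idx, text in enumerate(para)
    ]
    hits: List[Tuple[int, int]] = []
    for i, (p_idx, l_idx, current) in enumerate(lines):
        window = current
        if i + 1 < len(lines):
            window = current + " " + lines[i + 1][2]
        pos = window.find(needle)
        # Only count matches that start on the current line; matches that
        # start on the next line are reported when that line is processed.
        while pos != -1 and pos < len(current):
            hits.append((p_idx, l_idx))
            pos = window.find(needle, pos + 1)
    return hits
```

With each line stored as its own paragraph, which is how the wide line spacing case ends up being interpreted, a phrase split over two lines is still reported at the line where it starts.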
### Solution
The problem is that the search code used by Poppler's glib frontend, `TextPage::findText()`, currently does not support matching across two lines when the second line falls in the next paragraph. PDF files exported from LibreOffice documents with line spacing > 1.5 are interpreted by Poppler as having each line be a paragraph of its own (due to the line spacing).
Regardless of whether Poppler's paragraph detection code could be improved, an obvious fix is to make `TextPage::findText()` also work from the last line of a paragraph to the first line of the next paragraph; that is what the submitted MR does.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1474
Okular / Poppler slow to fully render this single-page PDF (takes 10 seconds)
2024-03-08T22:15:52Z
Jeff Fortin Tam

Potentially a bit similar to #1473, but presumably less complex and maybe caused by something different… this document: [invitation_-_sample_from_PDFjs_github_issue_3809.pdf](/uploads/19e32abfbe67d07c07367712c4aab6de/invitation_-_sample_from_PDFjs_github_issue_3809.pdf) (borrowed from https://github.com/mozilla/pdf.js/issues/3809) takes 10 seconds to fully render in Okular 23.08 on Fedora 39 with Wayland.
The image and some of the text appear within roughly 6 seconds, but the rest of the text takes up to the 10-second mark (on my stopwatch) to render.
Evince is similarly affected (except for the fact that it only displays something once fully rendered, not in real time).
What it looks like with Sysprof 46:
| Okular 23.08 | Evince 45 |
| - | - |
| ![Sysprof_46_standalone_capture_of_Okular_rendering_German__22invitation_22_sample_-_flame_graph](/uploads/93816bc091baa7a09e1cb5386ceff38a/Sysprof_46_standalone_capture_of_Okular_rendering_German__22invitation_22_sample_-_flame_graph.png) | ![Sysprof_46_standalone_capture_of_Evince_rendering_German__22invitation_22_sample_-_flame_graph](/uploads/5fa2482f1794ba163bb9b815d0405058/Sysprof_46_standalone_capture_of_Evince_rendering_German__22invitation_22_sample_-_flame_graph.png) |

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1473
Okular / Poppler very slow to render the 1st page of MagPi magazine issue 87
2024-03-08T22:06:51Z
Jeff Fortin Tam

[This magazine issue](https://magpi.raspberrypi.com/issues/87) has a publicly available PDF that can be directly downloaded [here](https://magpi.raspberrypi.com/issues/87/pdf/download).
For some reason, it seems the 1st page of that document is particularly heavy, compared to the 2nd page.
With Poppler 23.08.0 on Fedora 39 on Wayland, Evince 45 and Okular 23.08 take about 15-20+ seconds to render the first page at reasonable/normally sized window sizes.
Here is the output of Sysprof 46, showing what happens when opening and loading that document on the 1st page directly:
| Okular 23.08 | Evince 45 |
| - | - |
| ![Sysprof_46_standalone_capture_of_Okular_rendering_the_1st_page_of_MagPi_magazine_issue_87_-_flame_graph](/uploads/aa182101bf7e54b81fe3f7a9dc1f3420/Sysprof_46_standalone_capture_of_Okular_rendering_the_1st_page_of_MagPi_magazine_issue_87_-_flame_graph.png) | ![Sysprof_46_standalone_capture_of_Evince_rendering_the_1st_page_of_MagPi_magazine_issue_87_-_flame_graph](/uploads/c71440197c1b55d408db7f8c788d695f/Sysprof_46_standalone_capture_of_Evince_rendering_the_1st_page_of_MagPi_magazine_issue_87_-_flame_graph.png) |
FWIW, PDF.js, while still slow, is able to render it about twice as fast (corresponding issue [here](https://github.com/mozilla/pdf.js/issues/17785))

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1472
Text search is slow in general: find_text_with_options's get_text_page calls lots of Gfx functions spamming LCMS2's CreateTransform
2024-03-07T00:46:43Z
Jeff Fortin Tam

This is a repost/migration of a comment I originally posted on @gpoo's issue #104, as I realize now that it might be somewhat different (or not), because issue 104 was about a _specific_ document, whereas my bug report here is about slow search in general with any big document I've tried.
---
It seems as if, _somehow_, Poppler is spamming LCMS2 when doing text search operations.
## Methodology
My test case today was to run a simple word search through the infamously big [MS Office Open XML spec](https://ecma-international.org/publications-and-standards/standards/ecma-376/)'s main PDF, which you can publicly download (get the "4th edition", unzip, then unzip again the "Part 1" to find the biggest (35 MB) PDF file, which is 5000 pages). In that document, you can search for "unicode" (or "unicode character", or whatever words you fancy) in there. On my computer, for any of those queries, it takes:
* 3 minutes 50 seconds using "Papers" git;
* 2 minutes 48 seconds using Evince.
This kind of search slowness can also be seen with simpler documents such as the [ThinkPad X220's hardware maintenance manual](https://download.lenovo.com/ibmdl/pub/pc/pccbbs/mobiles_pdf/0a60739_04.pdf) (150 pages). There, it takes 15 to 25 seconds to search for a word throughout the document.
I grabbed the latest git version of "Papers" (the GTK4 fork/continuation of Evince) and Evince 45, and profiled the whole system while searching through the contents of the Office Open XML spec's PDF (the ECMA-376 document mentioned above).
Before profiling, I installed as many relevant debuginfo symbols as I could find on Fedora 39, with this command:
`dnf debuginfo-install evince evince-libs poppler poppler-glib lcms2 lcms2-utils glib2 gtk4 gio`
With Sysprof 46, I recorded a few seconds of searching for the expression "unicode characters" in that 5000 pages document.
## Findings
Below are what we can see with these Sysprof 46 captures:
* [Papers__22faster-search_22_branch_-_standalone_Sysprof_capture.tar.xz](/uploads/824a75bcfa942ce128c6eefb32dc3599/Papers__22faster-search_22_branch_-_standalone_Sysprof_capture.tar.xz)
* [Evince_45_-_standalone_sysprof_capture.tar.xz](/uploads/a2b0403c207ddb3d6060fec8ba9d40ab/Evince_45_-_standalone_sysprof_capture.tar.xz)
## CPU vs IO usage
For both "Papers" and Evince, on my 8-threads CPU (an Intel Xeon W3520), only one of the logical CPUs/threads gets used (as known in #338):
| CPU usage: we can see it is single-threaded | Disk I/O: not the culprit |
| - | - |
| ![Sysprof_46_capture_-_CPU_usage.opti](/uploads/b7e23efe92bda5a5c5857fe9ab88bc02/Sysprof_46_capture_-_CPU_usage.opti.png) | ![Sysprof_46_capture_-_disk_IO.opti](/uploads/6e3f6144bc82d2b650544aa89a98f77c/Sysprof_46_capture_-_disk_IO.opti.png) |
## Function calls analysis
For the function calls analysis, I'm starting with Evince, because I have complete debug symbolization visibility for it (all the way through LCMS2), unlike "Papers":
| Most expensive / frequently called functions (callgraph) with Evince | [Flamegraph](https://brendangregg.com/flamegraphs.html) (also non-chronological, represents totals) with Evince |
| - | - |
| ![Sysprof_46_capture_of_Evince_-_callgraph](/uploads/564cab2057cc5267f2d4f4b07e081640/Sysprof_46_capture_of_Evince_-_callgraph.png) | ![Sysprof_46_capture_of_Evince_-_flamegraph_without_timeline](/uploads/93aa3df7ca0b4d43c83b386e245a6c6c/Sysprof_46_capture_of_Evince_-_flamegraph_without_timeline.png) |
| Most expensive / frequently called functions (callgraph) with "Papers" (incomplete symbols) | [Flamegraph](https://brendangregg.com/flamegraphs.html) with "Papers" (incomplete symbols) |
| - | - |
| ![Sysprof_46_capture_-_callgraph_tree.opti](/uploads/97493bfdc94ce6fbf1139b4ac301ca27/Sysprof_46_capture_-_callgraph_tree.opti.png) | ![Sysprof_46_capture_-_flamegraph__non-chronological__-_crop_without_timeline_bar.opti](/uploads/79a8c864e7e72a12853961a743da30b2/Sysprof_46_capture_-_flamegraph__non-chronological__-_crop_without_timeline_bar.opti.png) |
Even though I installed the debuginfo and debugsource packages for lcms2, with Papers' flatpaked environment I get unreadable symbols for lcms2's function calls in the output, so I can't tell exactly which LCMS2 functions are being called there so often. We can presume they are the same functions as the ones that are visible in Evince's poppler profiling output.
## Compositor (GNOME Shell & Mutter 45.4, under Wayland) marks
| Evince | Papers |
| - | - |
| ![Sysprof_46_capture_of_Evince_-_compositor_marks](/uploads/c55d3dac469e8037248e4f254e63bb60/Sysprof_46_capture_of_Evince_-_compositor_marks.png) | ![Sysprof_46_capture_-_compositor_marks.opti](/uploads/22ca608c3f58e0a1cbd120d65e940330/Sysprof_46_capture_-_compositor_marks.opti.png) |
---
From a layman's perspective, I don't understand why LCMS would be called by a text search function in poppler. Is that really normal?
I hope this info helps somehow!

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1471
White boxes surrounding elements of composed images
2024-03-25T11:55:39Z
Michaël Berteaux

White boxes surrounding elements of composed images with Evince 45, poppler 23.08.00, and cairo 1.18.0 (Fedora Workstation 39).
Issue on Evince Issue Tracker: https://gitlab.gnome.org/GNOME/evince/-/issues/1922
File: [science_research_and_innovation_performance_of_the-KI0922251ENN.pdf](/uploads/93981473336b4a463a2304e08082d2f7/science_research_and_innovation_performance_of_the-KI0922251ENN.pdf)
Results with `pdftocairo` and `pdftoppm` version 23.08.0:
**pdftocairo**
![science-cairo](/uploads/7b7cd57ab1d2a23243c0edc3d7b9bc84/science-cairo.png)
**pdftoppm**
![science-splash](/uploads/7c258e4f7191ec762382cd0e9acba6c4/science-splash.png)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1470
Does Poppler have a collection of PDF files for testing purposes? Can you share it for testing the popper
2024-02-28T22:48:18Z
yuyi

I am currently conducting some detailed tests and hope to receive more test files

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1469
Help using text decorations like overlines
2024-02-29T22:19:26Z
Aaron Lin

In some PDFs there is an "overline" text decoration (like an underline, but directly above the text). It seems most PDF utilities represent this as a text decoration, not just a line, so I assume Poppler has some support for it. I don't see this anywhere in Poppler's TextAnnotations - can someone please point me in the right direction?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1468
pdftotext should dehyphenate footmisc footnotes
2024-03-04T15:39:20Z
Alex Chalk

When I run `pdftotext file-with-hyphenation.pdf -`, it dehyphenates the text in the main document, but not footnotes created using the package `footmisc`.
n.b. `pdftotext` does the right thing for a regular hyphenated `\footnote`. Seeing as the rendered output of `footmisc` commands is the same (the difference is footmisc pulls the final output from a bibliography), perhaps the code used to print `\footnote` output can just be reused?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1467
'Mu' symbol becomes 'alpha' using poppler
2024-02-20T13:15:13Z
Bruno Lopes

The Greek symbol 'mu' in PDF files read by poppler becomes alpha (right after `log[(` in the screenshots). This happens in many programs that depend on poppler.
See the original file: [plot2005_-_2009Jan_0.pdf](/uploads/5ef29338ccbd0bcf6ad0e33d81815539/plot2005_-_2009Jan_0.pdf)
This was first reported in GIMP: https://gitlab.gnome.org/GNOME/gimp/-/issues/4287
In Inkscape the same bug can be seen:
![image](/uploads/6026088ea4e9761adb2d647dbc4f125b/image.png)
And in Evince:
![image](/uploads/330b96eb04a9210d6e08b9b291607a47/image.png)
---
This seems to happen only on Windows, but [no specific patch](https://github.com/msys2/MINGW-packages/tree/master/mingw-w64-poppler) is being applied by the MSYS2 folks. Maybe the [NSS3](https://github.com/msys2/MINGW-packages/blob/c73984859915349a637686d7ceb96a27398ef2f2/mingw-w64-poppler/PKGBUILD#L80) build option?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1466
Build on ubuntu 22.04 with mingw fails with INT32 conflict definition
2024-02-18T17:33:52Z
Gregor Kališnik

Hi.
I am trying to cross-compile for windows (w64) with libjpeg-v9f.
Build error:
```
[ 37%] Building CXX object CMakeFiles/poppler.dir/poppler/ImageEmbeddingUtils.cc.obj
In file included from /usr/share/mingw-w64/include/winnt.h:150,
from /usr/share/mingw-w64/include/minwindef.h:163,
from /usr/share/mingw-w64/include/windef.h:9,
from /tmp/build-windows/libs/poppler/src/poppler_external-build/poppler/poppler-config.h:133,
from /tmp/build-windows/libs/poppler/src/poppler_external/poppler/Error.h:32,
from /tmp/build-windows/libs/poppler/src/poppler_external/poppler/Object.h:45,
from /tmp/build-windows/libs/poppler/src/poppler_external/poppler/ImageEmbeddingUtils.cc:27:
/usr/share/mingw-w64/include/basetsd.h:31:22: error: conflicting declaration ‘typedef int INT32’
31 | typedef signed int INT32,*PINT32;
| ^~~~~
In file included from /usr/x86_64-w64-mingw32/include/jpeglib.h:27,
from /tmp/build-windows/libs/poppler/src/poppler_external/poppler/ImageEmbeddingUtils.cc:17:
/usr/x86_64-w64-mingw32/include/jmorecfg.h:165:14: note: previous declaration as ‘typedef long int INT32’
165 | typedef long INT32;
| ^~~~~
```
cmake config used:
```
-DCMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}
-DCMAKE_BUILD_TYPE=release
-DCMAKE_INSTALL_PREFIX=${CMAKE_CURRENT_BINARY_DIR}/${TARGET_NAME}
-DENABLE_BOOST=OFF
-DBUILD_SHARED_LIBS=OFF
-DBUILD_CPP_TESTS=OFF
-DBUILD_GTK_TESTS=OFF
-DBUILD_MANUAL_TESTS=OFF
-DBUILD_QT5_TESTS=OFF
-DBUILD_QT6_TESTS=OFF
-DENABLE_CPP=OFF
-DENABLE_QT5=OFF
-DENABLE_QT6=ON
-DENABLE_ZLIB=OFF
-DENABLE_GLIB=OFF
-DENABLE_GOBJECT_INTROSPECTION=OFF
-DENABLE_LIBCURL=OFF
-DENABLE_LIBOPENJPEG=none
-DENABLE_UTILS=OFF
-DENABLE_DCTDECODER=libjpeg
-DWITH_PNG=ON
-DWITH_TIFF=OFF
-DWITH_NSS3=OFF
```
Tried with poppler versions `21.12.00` and `24.02.00`.
Adding `#include <poppler-config.h>` at the top of `ImageEmbeddingUtils.cc` solved the build issue.
Thank you.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1465
Does not show text of Apple-edited PDFs
2024-02-16T18:09:56Z
Dorla Hutch

I blackened half of the first page using PDF24 which fixed the rendering bug with the first page (text is displayed again unlike for the other pages).
SUMMARY
=======
When the PDF is opened, the hand-written annotations are visible but not the original PDF text (all is white).
Same happens with Firefox **but it is different from Chrome or the renderer that Dolphin uses** where all of the PDF text is visible.
STEPS TO REPRODUCE
==================
1. Annotate a PDF with an Apple tablet device (iPad Pro, 5th Gen, exported in GoodNotes 5)
2. Open the PDF in Okular or Firefox
OBSERVED RESULT
===============
Hand-written annotations are shown, everything else (text, lines) is not
EXPECTED RESULT
===============
Hand-written annotations are shown with everything else
Broken PDF:
===========
[2023_12_13.pdf](/uploads/9c1f29064ae072e3a6bdb399f0d2da23/2023_12_13.pdf)
[2023-12-12_not_broken.pdf](/uploads/311dba8f0d5eee766cf70bc4c64618a7/2023-12-12_not_broken.pdf)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1463
pdftocairo - Problem with unembedded CID TrueType font with Identity Encoding
2024-02-14T13:15:49Z
Hakan Usakli

Hello,
The supplied sample file displays fine in Adobe Acrobat, PDF-XChange, Foxit and many other PDF viewers.
It has unembedded CID Fonts.
The following command line on 64-bit Windows using Poppler version 23.11 complains about a syntax error in the fonts. The created output is unusable: it shows question marks instead of glyphs.
I am guessing the fonts are defined in a gray zone of the PDF specification, but is it reasonable to expect that pdftocairo/the poppler library could handle these types of files, "refry" them and burn in (embed) the fonts properly, pulling a suitable font from the system's font directory (or C:/windows/fonts/)?
```
pdftocairo.exe -pdf "d:\temp\input.pdf" "d:\temp\output_pp.pdf"
Syntax Error: Expected the optional content group list, but wasn't able to find it, or it isn't an Array
Syntax Error: non-embedded font using identity encoding: Arial
Syntax Error: non-embedded font using identity encoding: Calibri Light
Syntax Error: non-embedded font using identity encoding: Calibri
Syntax Error: non-embedded font using identity encoding: Arial,Bold
Syntax Error: non-embedded font using identity encoding: Calibri,Bold
```
Thank you and Best Regards
[input.pdf](/uploads/5eb8b5c083a215e898b76b514143fd45/input.pdf)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1462
libc++-19: implicit instantiation of undefined template 'std::char_traits<unsigned short>'
2024-02-05T14:35:02Z
Linux User

OS: Gentoo Linux amd64 musl/clang
```
$ clang --version
clang version 19.0.0git78b4e7c5+libcxx
Target: x86_64-gentoo-linux-musl
Thread model: posix
InstalledDir: /usr/lib/llvm/19/bin
Configuration file: /etc/clang/x86_64-gentoo-linux-musl-clang.cfg
```
Compiling `poppler-9999` fails with the following error:
```bash
[269/283] /usr/lib/ccache/bin/clang++ -Dpoppler_cpp_EXPORTS -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999 -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/fofi -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/goo -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/poppler -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build/poppler -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build/cpp -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -O3 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -flto -stdlib=libc++ -Wnon-virtual-dtor -Woverloaded-virtual -std=c++17 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -MD -MT cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o -MF cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o.d -o cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o -c /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.cpp
FAILED: cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o
/usr/lib/ccache/bin/clang++ -Dpoppler_cpp_EXPORTS -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999 -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/fofi -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/goo -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/poppler -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build/poppler -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp -I/var/tmp/portage/app-text/poppler-9999/work/poppler-9999_build/cpp -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -O3 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -flto -stdlib=libc++ -Wnon-virtual-dtor -Woverloaded-virtual -std=c++17 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -MD -MT cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o -MF cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o.d -o cpp/CMakeFiles/poppler-cpp.dir/poppler-destination.cpp.o -c /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.cpp
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.cpp:24:
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.h:25:
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-global.h:32:
/usr/include/c++/v1/string:730:43: error: implicit instantiation of undefined template 'std::char_traits<unsigned short>'
730 | static_assert((is_same<_CharT, typename traits_type::char_type>::value),
| ^
/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-global.h:101:43: note: in instantiation of template class 'std::basic_string<unsigned short>' requested here
101 | class POPPLER_CPP_EXPORT ustring : public std::basic_string<unsigned short>
| ^
/usr/include/c++/v1/__fwd/string.h:23:29: note: template is declared here
23 | struct _LIBCPP_TEMPLATE_VIS char_traits;
| ^
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.cpp:24:
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-destination.h:25:
In file included from /var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-global.h:32:
In file included from /usr/include/c++/v1/string:625:
/usr/include/c++/v1/string_view:296:43: error: implicit instantiation of undefined template 'std::char_traits<unsigned short>'
296 | static_assert((is_same<_CharT, typename traits_type::char_type>::value),
| ^
/usr/include/c++/v1/__type_traits/is_convertible.h:28:102: note: in instantiation of template class 'std::basic_string_view<unsigned short>' requested here
28 | struct _LIBCPP_TEMPLATE_VIS is_convertible : public integral_constant<bool, __is_convertible(_T1, _T2)> {};
| ^
/usr/include/c++/v1/string:702:29: note: in instantiation of template class 'std::is_convertible<const std::basic_string<unsigned short> &, std::basic_string_view<unsigned short>>' requested here
702 | : public _BoolConstant< is_convertible<const _Tp&, basic_string_view<_CharT, _Traits> >::value &&
| ^
/usr/include/c++/v1/string:1044:27: note: in instantiation of template class 'std::__can_be_converted_to_string_view<unsigned short, std::char_traits<unsigned short>, std::basic_string<unsigned short>>' requested here
1044 | __enable_if_t<__can_be_converted_to_string_view<_CharT, _Traits, _Tp>::value &&
| ^
/usr/include/c++/v1/string:1047:93: note: while substituting prior template arguments into non-type template parameter [with _Tp = std::basic_string<unsigned short>]
1047 | _LIBCPP_METHOD_TEMPLATE_IMPLICIT_INSTANTIATION_VIS _LIBCPP_CONSTEXPR_SINCE_CXX20 explicit basic_string(const _Tp& __t)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
1048 | : __r_(__default_init_tag(), __default_init_tag()) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1049 | __self_view __sv = __t;
| ~~~~~~~~~~~~~~~~~~~~~~~
1050 | __init(__sv.data(), __sv.size());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1051 | }
| ~
/usr/include/c++/v1/string:709:7: note: while substituting deduced template arguments into function template 'basic_string' [with _Tp = std::basic_string<unsigned short>, $1 = (no value)]
709 | class basic_string {
| ^
/var/tmp/portage/app-text/poppler-9999/work/poppler-9999/cpp/poppler-global.h:101:26: note: while declaring the implicit copy constructor for 'ustring'
101 | class POPPLER_CPP_EXPORT ustring : public std::basic_string<unsigned short>
| ^
/usr/include/c++/v1/__fwd/string.h:23:29: note: template is declared here
23 | struct _LIBCPP_TEMPLATE_VIS char_traits;
| ^
2 errors generated.
ninja: build stopped: subcommand failed.
```
The generic `char_traits` implementation was deprecated in LLVM 17 and removed in https://github.com/llvm/llvm-project/commit/c3668779c13596e223c26fbd49670d18cd638c40.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1461
Add option to not override files
2024-01-29T22:48:29Z
kenorb

Currently when using "pdftotext file.pdf file.txt" syntax, the destination file is always overridden.
It would be great to have an option to skip the conversion if the file already exists.
Otherwise the default behaviour could be very destructive.
For example when you specify the same file as destination (by mistake), it's going to be zeroed. So there should be some safer option to work with which won't erase the existing files.
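Until such an option exists, a caller can guard the conversion itself. The following is a minimal sketch, assuming `pdftotext` is on `PATH` and is invoked as `pdftotext file.pdf file.txt` as described above. `should_convert` and `convert_no_clobber` are hypothetical helpers and are not part of Poppler.

```python
import os
import subprocess


def should_convert(src: str, dst: str) -> bool:
    """Return True only if writing dst cannot destroy an existing file."""
    # Refuse when source and destination are the same path (the mistake
    # described above that would zero the input file).
    if os.path.abspath(src) == os.path.abspath(dst):
        return False
    # Refuse when anything already exists at the destination.
    return not os.path.exists(dst)


def convert_no_clobber(src: str, dst: str) -> bool:
    """Run pdftotext only when it is safe; return True if it was run."""
    if not should_convert(src, dst):
        return False
    subprocess.run(["pdftotext", src, dst], check=True)
    return True
```

Note that the check and the conversion are two separate steps, so this does not protect against another process creating the destination file in between.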
pdftotext version 22.02.0

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1460
pdfimages should returns exit code 2 when cannot open output files
2024-01-24T22:33:04Z
Fernando Herrera

This is the current behavior:
```
fer@dyckola:~$ pdfimages test-manuscript.pdf /dev/null/cannot-write-here/page-
I/O Error: Couldn't open image file '/dev/null/cannot-write-here/page--000.ppm'
fer@dyckola:~$ echo $?
0
```
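Since the exit status shown above is 0 even though the output file could not be opened, a calling script cannot rely on it alone. One hedged workaround is to also treat an `I/O Error` line on stderr as a failure. The sketch below assumes the error text has the form shown in the output above; `run_checked` is a hypothetical helper, not part of Poppler.

```python
import subprocess
from typing import List, Tuple


def run_checked(cmd: List[str]) -> Tuple[bool, str]:
    """Run cmd and report success plus the captured stderr text.

    The run counts as failed if the exit code is non-zero or if any
    stderr line starts with "I/O Error".
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    io_error = any(
        line.startswith("I/O Error") for line in proc.stderr.splitlines()
    )
    ok = proc.returncode == 0 and not io_error
    return ok, proc.stderr.strip()
```

A caller would pass the same argument list as on the command line, for example `["pdfimages", "test-manuscript.pdf", "/dev/null/cannot-write-here/page-"]`, and check the first element of the result.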
But according to the man page it should be 2:
```
EXIT CODES
The Xpdf tools use the following exit codes:
0 No error.
1 Error opening a PDF file.
2 Error opening an output file.
3 Error related to PDF permissions.
99 Other error.
```
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1459
Improper "transparency knockout group" support: transparent objects where opaque ones expected
2024-01-13T09:59:45Z
Rodrigo Severo
PDFs with opaque objects are rendered as transparent by poppler. Adobe Acrobat, Foxit and Sumatra PDF render the opaque objects properly.
[cabeceira_dagua.pdf](/uploads/fd3dd5152846eec69bc86bbb918550d8/cabeceira_dagua.pdf)
Wrong poppler rendering:![poppler_wrong](/uploads/28e28fd6f595ff133fed400058e43150/poppler_wrong.jpg)
Correct Sumatra PDF rendering:![sumatra_right](/uploads/7e571cd1b15c1b8492d184a02e51afaa/sumatra_right.jpg)
Search for “transparency knockout” on both (https://therion.speleo.sk/wiki/contrib:externalviewers) and (https://helpx.adobe.com/illustrator/using/transparency-blending-modes.html)
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1458
pdftotext: support tsv output in reading order
2024-01-08T18:59:48Z
Fawaz Ahmed
Hello,
I see that the [tsv flag](https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/831) was added to emulate the Tesseract format.
Tesseract prints tsv in reading order, but the tsv output by pdftotext is not in reading order.
It would be helpful if the tsv output followed the `-layout` reading order when `-tsv` is set.
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1457
Simplified Chinese display as a variant (Japanese) glyph
2024-02-29T18:54:22Z
Firestar-Reimu
Original issue: https://bugs.kde.org/show_bug.cgi?id=461499
Wrong display: https://imgse.com/i/xXc0Qe
Correct display: https://imgse.com/i/xjJ1mD
You can see the characters: “探”、“将”、“关”
I use Okular + poppler-data
```
$ pdffonts 1.pdf | iconv -f gbk
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
方正书宋简体                         CID TrueType      GBK-EUC-H        no  no  no     227  0
方正书宋_GBK                         CID TrueType      GBK-EUC-H        no  no  no      64  0
方正黑体_GBK                         CID TrueType      GBK-EUC-H        no  no  no     102  0
方正楷体_GBK                         CID TrueType      GBK-EUC-H        no  no  no      65  0
DY1+ZKWGVK-1                         Type 1            Custom           yes no  yes     66  0
DY2+ZKWGVK-2                         Type 1            Custom           yes no  yes     67  0
DY3+ZKWGVK-3                         Type 1            Custom           yes no  yes    228  0
DY4+ZKWGVK-4                         Type 1            Custom           yes no  yes    229  0
DY5+ZKWGVK-5                         Type 1            Custom           yes no  yes    230  0
DY6+ZKWGVK-6                         Type 1            Custom           yes no  yes     69  0
DY7+ZKWGVK-7                         Type 1            Custom           yes no  yes    104  0
DY8+ZKWGVK-8                         Type 1            Custom           yes no  yes    131  0
DY9+ZKWGVL-9                         Type 1            Custom           yes no  yes    219  0
DY10+ZKWGVL-10                       Type 1            Custom           yes no  yes    211  0
DY11+ZKWGVN-11                       Type 1            Custom           yes no  yes    183  0
DY12+ZKWGVN-12                       Type 1            Custom           yes no  yes    203  0
DY13+ZKWGVO-13                       Type 1            Custom           yes no  yes    194  0
DY14+ZKWGVO-14                       Type 1            Custom           yes no  yes    182  0
DY15+ZKWGVP-15                       Type 1            Custom           yes no  yes    172  0
DY16+ZKWGVP-16                       Type 1            Custom           yes no  yes    163  0
DY17+ZKWGVQ-17                       Type 1            Custom           yes no  yes    155  0
DY18+ZKWGVR-18                       Type 1            Custom           yes no  yes    145  0
DY19+ZKWGVS-19                       Type 1            Custom           yes no  yes    130  0
DY20+ZKWGVT-20                       Type 1            Custom           yes no  yes    120  0
DY21+ZKWGVT-21                       Type 1            Custom           yes no  yes    103  0
DY22+ZKWGVT-22                       Type 1            Custom           yes no  yes    105  0
DY23+ZKWGVT-23                       Type 1            Custom           yes no  yes     96  0
DY24+ZKWGVT-24                       Type 1            Custom           yes no  yes     68  0
DY25+ZKWGVT-25                       Type 1            Custom           yes no  yes     70  0
```
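In the listing above, the four 方正 fonts have `emb` set to `no`, so the viewer has to pick substitute fonts for them. The sketch below shows one way to pull the non-embedded font names out of `pdffonts` output. The function name `non_embedded_fonts` is hypothetical, and the parser assumes the column layout shown above and font names without spaces.

```python
def non_embedded_fonts(pdffonts_output):
    """Return the names of fonts whose 'emb' column is 'no'.

    Assumes the pdffonts layout shown in the report: a header line, a dashed
    separator, then rows ending in: encoding emb sub uni object-number generation.
    """
    names = []
    for line in pdffonts_output.splitlines()[2:]:  # skip header and separator
        fields = line.split()
        # name, type (one or more words), encoding, emb, sub, uni, obj, gen
        if len(fields) < 8:
            continue
        # The type column may contain spaces ("CID TrueType", "Type 1"),
        # so the emb flag is located by counting from the right.
        if fields[-5] == "no":
            names.append(fields[0])
    return names
```

Counting from the right keeps the parser independent of the exact column padding, which can shift when the names contain multi-byte characters.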
PDF: https://pb.nichi.co/unveil-laptop-foil
It used Noto Sans CJK SC as a substitute
https://imgse.com/i/xXjE3F
but:
1. these are not SC (Simplified Chinese) glyphs
2. I set SC higher than JP in `/etc/fonts/conf.d/64-language-selector-prefer.conf`
[PDF example](https://bugsfiles.kde.org/attachment.cgi?id=163003)