poppler issues
https://gitlab.freedesktop.org/poppler/poppler/-/issues
2024-03-07T00:46:43Z

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1472
Text search is slow in general: find_text_with_options's get_text_page calls lots of Gfx functions spamming LCMS2's CreateTransform
2024-03-07T00:46:43Z
Jeff Fortin Tam

This is a repost/migration of a comment I originally posted on @gpoo's issue #104, as I realize now that it might be somewhat different (or not), because issue 104 was about a _specific_ document, whereas my bug report here is about slow search in general with any big document I've tried.
---
It seems as if, _somehow_, Poppler is spamming LCMS2 when doing text search operations.
## Methodology
My test case today was to run a simple word search through the infamously big [MS Office Open XML spec](https://ecma-international.org/publications-and-standards/standards/ecma-376/)'s main PDF, which you can publicly download (get the "4th edition", unzip, then unzip again the "Part 1" to find the biggest (35 MB) PDF file, which is 5000 pages). In that document, you can search for "unicode" (or "unicode character", or whatever words you fancy) in there. On my computer, for any of those queries, it takes:
* 3 minutes 50 seconds using "Papers" git;
* 2 minutes 48 seconds using Evince.
This kind of search slowness can also be seen with simpler documents such as the [ThinkPad X220's hardware maintenance manual](https://download.lenovo.com/ibmdl/pub/pc/pccbbs/mobiles_pdf/0a60739_04.pdf) (150 pages). There, it takes 15 to 25 seconds to search for a word throughout the document.
I grabbed the latest git version of "Papers" (the GTK4 fork/continuation of Evince) and Evince 45, and profiled the whole system while searching through the contents of the Office Open XML spec's PDF (the ECMA-376 document mentioned above).
Before profiling, I installed as many relevant debuginfo symbols as I could find on Fedora 39, with this command:
`dnf debuginfo-install evince evince-libs poppler poppler-glib lcms2 lcms2-utils glib2 gtk4 gio`
With Sysprof 46, I recorded a few seconds of searching for the expression "unicode characters" in that 5000-page document.
## Findings
Below is what we can see in these Sysprof 46 captures:
* [Papers__22faster-search_22_branch_-_standalone_Sysprof_capture.tar.xz](/uploads/824a75bcfa942ce128c6eefb32dc3599/Papers__22faster-search_22_branch_-_standalone_Sysprof_capture.tar.xz)
* [Evince_45_-_standalone_sysprof_capture.tar.xz](/uploads/a2b0403c207ddb3d6060fec8ba9d40ab/Evince_45_-_standalone_sysprof_capture.tar.xz)
## CPU vs IO usage
For both "Papers" and Evince, on my 8-thread CPU (an Intel Xeon W3520), only one of the logical CPUs/threads gets used (as known in #338):
| CPU usage: we can see it is single-threaded | Disk I/O: not the culprit |
| - | - |
| ![Sysprof_46_capture_-_CPU_usage.opti](/uploads/b7e23efe92bda5a5c5857fe9ab88bc02/Sysprof_46_capture_-_CPU_usage.opti.png) | ![Sysprof_46_capture_-_disk_IO.opti](/uploads/6e3f6144bc82d2b650544aa89a98f77c/Sysprof_46_capture_-_disk_IO.opti.png) |
## Function calls analysis
For the function calls analysis, I'm starting with Evince, because I have complete debug symbolization visibility for it (all the way through LCMS2), unlike "Papers":
| Most expensive / frequently called functions (callgraph) with Evince | [Flamegraph](https://brendangregg.com/flamegraphs.html) (also non-chronological, represents totals) with Evince |
| - | - |
| ![Sysprof_46_capture_of_Evince_-_callgraph](/uploads/564cab2057cc5267f2d4f4b07e081640/Sysprof_46_capture_of_Evince_-_callgraph.png) | ![Sysprof_46_capture_of_Evince_-_flamegraph_without_timeline](/uploads/93aa3df7ca0b4d43c83b386e245a6c6c/Sysprof_46_capture_of_Evince_-_flamegraph_without_timeline.png) |
| Most expensive / frequently called functions (callgraph) with "Papers" (incomplete symbols) | [Flamegraph](https://brendangregg.com/flamegraphs.html) with "Papers" (incomplete symbols) |
| - | - |
| ![Sysprof_46_capture_-_callgraph_tree.opti](/uploads/97493bfdc94ce6fbf1139b4ac301ca27/Sysprof_46_capture_-_callgraph_tree.opti.png) | ![Sysprof_46_capture_-_flamegraph__non-chronological__-_crop_without_timeline_bar.opti](/uploads/79a8c864e7e72a12853961a743da30b2/Sysprof_46_capture_-_flamegraph__non-chronological__-_crop_without_timeline_bar.opti.png) |
Even though the debuginfo and debugsource packages for lcms2 are installed, with Papers' flatpaked environment I get unreadable symbols for lcms2's function calls in the output. I therefore can't tell exactly which LCMS2 functions are being called so often there, but we can presume they are the same functions as the ones visible in Evince's poppler profiling output.
## Compositor (GNOME Shell & Mutter 45.4, under Wayland) marks
| Evince | Papers |
| - | - |
| ![Sysprof_46_capture_of_Evince_-_compositor_marks](/uploads/c55d3dac469e8037248e4f254e63bb60/Sysprof_46_capture_of_Evince_-_compositor_marks.png) | ![Sysprof_46_capture_-_compositor_marks.opti](/uploads/22ca608c3f58e0a1cbd120d65e940330/Sysprof_46_capture_-_compositor_marks.opti.png) |
---
From a layman's perspective, I don't understand why LCMS would be called by a text search function in poppler. Is that really normal?
I hope this info helps somehow!

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1471
White boxes surrounding elements of composed images
2024-03-25T11:55:39Z
Michaël Berteaux

White boxes surrounding elements of composed images with Evince 45, poppler 23.08.00, and cairo 1.18.0 (Fedora Workstation 39).
Issue on Evince Issue Tracker: https://gitlab.gnome.org/GNOME/evince/-/issues/1922
File: [science_research_and_innovation_performance_of_the-KI0922251ENN.pdf](/uploads/93981473336b4a463a2304e08082d2f7/science_research_and_innovation_performance_of_the-KI0922251ENN.pdf)
Results with `pdftocairo` and `pdftoppm` version 23.08.0:
**pdftocairo**
![science-cairo](/uploads/7b7cd57ab1d2a23243c0edc3d7b9bc84/science-cairo.png)
**pdftoppm**
![science-splash](/uploads/7c258e4f7191ec762382cd0e9acba6c4/science-splash.png)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1459
Improper "transparency knockout group" support: transparent objects where opaque ones expected
2024-01-13T09:59:45Z
Rodrigo Severo

A PDF with opaque objects is rendered with those objects transparent in poppler. Adobe Acrobat, Foxit and Sumatra PDF render the opaque objects properly.
[cabeceira_dagua.pdf](/uploads/fd3dd5152846eec69bc86bbb918550d8/cabeceira_dagua.pdf)
Wrong poppler rendering:![poppler_wrong](/uploads/28e28fd6f595ff133fed400058e43150/poppler_wrong.jpg)
Correct Sumatra PDF rendering:![sumatra_right](/uploads/7e571cd1b15c1b8492d184a02e51afaa/sumatra_right.jpg)
Search for “transparency knockout” on both (https://therion.speleo.sk/wiki/contrib:externalviewers) and (https://helpx.adobe.com/illustrator/using/transparency-blending-modes.html)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1449
Avoiding symbol clashes
2024-02-12T23:28:33Z
Kai Pastor

An extended vcpkg CI run with paraview using vtk using gdal using poppler ran into linker errors due to `Parser::~Parser` being defined both in Paraview and in Poppler. This is just one class name to be concerned about. There is also Array, Dict, etc. What is the preferred solution:
- Moving the classes to a namespace, e.g. `Poppler`.
- Adding a prefix similar to `GooString`, i.e. `Goo...`.
- Adjust as needed vs. moving all private classes vs. moving all classes.
This will change ABI, and it won't be entirely source compatible. (It is possible to import the classes into the global namespace using `using ...`, but at least forward declarations must be changed. Macros might provide additional help, a la QT_BEGIN_NAMESPACE.)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1419
Request: Update TextOutputDev.cc (sync with xpdf)
2023-08-19T06:54:03Z
Rafał Miłecki

File `TextOutputDev.cc` is important for the `pdftotext` tool. Since the forking it received a lot of improvements:
https://gitlab.freedesktop.org/poppler/poppler/-/commits/master/poppler/TextOutputDev.cc
Original `TextOutputDev.cc` (in the Xpdf project) also received a lot of changes since the 3.00 release. It received support for splitting input into blocks and outputting using multiple layouts.
It seems both implementations diverged significantly and both gained some important features. Unfortunately there are some Xpdf-only features that poppler users may be missing. That includes support for `pdftotext` modes like `-simple`, `-simple2` and `-table`.
Is there any sane way of bringing Xpdf `TextOutputDev.cc` improvements into poppler?
It seems like an impossible task to port all poppler changes to the most recent Xpdf's `TextOutputDev.cc`. Too many of them.
There is no public Xpdf git repository, so we can't port their commits to poppler one by one. Generating a `diff` from release to release results in a huge, undescribed batch of changes.
Does anyone have any idea how/if this could be solved?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1417
Error "Syntax Warning: PDFDoc::markDictionary: Found recursive dicts" using pdfseparate
2023-10-02T18:41:41Z
vf1962

Using **pdfseparate 23.06.0** (from MSYS2 MinGW x64) on Windows 10 21H2.
Trying to split a PDF with Microsoft documentation, for testing purposes only.
The PDF is in this compressed file: https://interoperability.blob.core.windows.net/files/Exchange_Protocols.zip
After splitting off the third page:
02/08/2023 13:41 245.991 test001.pdf
02/08/2023 13:41 158.641 test002.pdf
02/08/2023 13:41 155.599 test003.pdf
02/08/2023 13:41 0 test004.pdf
pdfseparate loops, repeating this error message until I break execution:
**Syntax Warning: PDFDoc::markDictionary: Found recursive dicts**
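The message suggests that the traversal found a dictionary that refers back to itself, directly or through other objects. As a generic illustration only (this is not poppler's code; the `mark_reachable` function and the toy object graph are hypothetical), a traversal can avoid looping forever on such input by remembering which objects it has already visited:

```python
def mark_reachable(objects, start):
    """Return the set of object ids reachable from `start`.

    `objects` maps an object id to the list of ids it references,
    a simplified stand-in for PDF dictionaries referencing each other.
    """
    seen = set()
    stack = [start]
    while stack:
        obj_id = stack.pop()
        if obj_id in seen:
            # Already visited: a cycle or a shared reference, so skip it
            # instead of walking the same objects again.
            continue
        seen.add(obj_id)
        stack.extend(objects.get(obj_id, []))
    return seen


# Toy graph with a cycle: 1 -> 2 -> 3 -> 1, plus 3 -> 4.
graph = {1: [2], 2: [3], 3: [1, 4], 4: []}
print(sorted(mark_reachable(graph, 1)))
```

The traversal terminates because each object id enters `seen` at most once, no matter how the references loop.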
TIA

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1377
Poppler::PSConverter mismanages certain OTF fonts embedded in the document?
2023-12-18T20:55:14Z
Sergio Callegari

Hi, I am encountering an issue with the Okular PDF viewer that might ultimately be caused by an incorrect behavior of PSConverter, so I am also posting here for feedback.
Some PDF documents using certain OTF fonts are displayed correctly by Okular, but do not print well, because some characters get changed into small squares. It looks like Okular does not pass the PDF directly to the printer but, for historical reasons, performs an intermediate conversion to PostScript using Poppler::PSConverter. Hence, I wonder whether this conversion might be what breaks the fonts.
The issue is particularly frequent on documents prepared with LibreOffice, using the free Adobe fonts in the "Source" family (e.g. Source Sans 3, see https://github.com/adobe-fonts/source-sans).
Here is the link to the discussion on the okular tracker: https://bugs.kde.org/show_bug.cgi?id=467328
Here is a PDF document triggering the issue: [test.pdf](/uploads/028f3ccf3e109616bc4d9a508ec3f1b8/test.pdf)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1376
Poppler-utils pdftohtml is producing two different font sizes in the same pdf on generating it on mac and ubuntu.
2023-03-24T20:39:14Z
Anmol Malik

pdftohtml from poppler-utils produces two different font sizes for the same PDF depending on whether it is run on Mac or on Ubuntu. Mainly, the Mac output uses a font size of 15px where the Ubuntu output uses 14px. Kindly review it.
The attached screenshots show all the CSS classes produced for the same PDF on Mac and on Ubuntu.
On Ubuntu: (sudo apt-get install poppler-utils)
![Ubuntu](/uploads/95365678e26967a39d7d01ef14d0d79b/Ubuntu.png)
On Mac: (brew install poppler)
![Mac](/uploads/5ec7cd9563a3aea5be8111b7491d09d7/Mac.png)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1370
Poppler PDF Info : Long Path Issue (OS: Window 2019 Standard)
2023-03-07T22:24:02Z
Sujith Chand

Hi Team,
We have been using Poppler v21.03.0 since 2020, and recently we have migrated our application to Windows 2019. Currently we are having issues accessing PDF files that have 260+ characters in the file path, specific to the Windows 2019 environment.
![image](/uploads/6c92431ba3e81a154de03ab0f3e2d1a1/image.png)
Based on MS article, in order to enable the long path behavior, both of the following conditions must be met in OS 2019 Standard:
- LongPathsEnabled is set to True ( This was already enabled as part of Server build activity )
- Application manifest must also include the longPathAware element --> PDFInfo.exe specific (Nothing to do with Server Configuration)
So we have installed Poppler v23.01.0 as well, but the result is the same. We just wanted to confirm whether the application manifest file includes the details shown below, or whether there is any workaround or plan to address this issue.
Could you please give me some advice?
Ref Link : https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
![image](/uploads/9f0a5de3a5ab81322de8c4fb78ddb3e4/image.png)
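Microsoft's documentation linked above describes an extended-length path form that bypasses the 260-character limit: `\\?\` in front of a drive path, and `\\?\UNC\` in place of the leading `\\` of a network path. A minimal sketch of that conversion as plain string handling follows. The `to_extended_length` helper and the `report.pdf` file name are hypothetical and not part of poppler or pdfinfo, and the sketch assumes the input is already an absolute, backslash-separated path:

```python
def to_extended_length(path: str) -> str:
    """Prefix an absolute Windows path with the extended-length marker."""
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... -> \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    # Drive path: C:\dir\file.pdf -> \\?\C:\dir\file.pdf
    return "\\\\?\\" + path


print(to_extended_length(r"\\ServerName\Landing\report.pdf"))
print(to_extended_length(r"C:\data\report.pdf"))
```

The resulting string can then be passed to the tool as the file argument; whether a given build accepts it still depends on how that build opens files.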
Note: We have a workaround. To specify such a long network path, use "\\?\UNC\" as a prefix, for example "\\?\UNC\ServerName\Landing\...". When using \\?\UNC\ in the file path, I was able to read the PDF using PDFinfo.exe. But we need to confirm whether there is something in the backlog to address this issue, or whether there is any permanent solution for Windows 2019.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1310
Appearances stream for fields with non-latin chars aren't rendered in a pdf where /NeedAppearances is true
2022-11-03T18:15:20Z
calixteman

STR:
- Open [multilang-form.pdf](/uploads/48dec08b05f92df0213bdfc893b39f55/multilang-form.pdf)
In evince/okular, only the field containing some latin chars is rendered:
![image](/uploads/4ce0940a552e3b32bd968d868db0b364/image.png)
But in Acrobat I get:
![image](/uploads/d2f967ead9a411a0b9dd88579e258367/image.png)
The pdf has `/NeedAppearances` set to true which means that it's up to the reader to render appearance for field elements.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1309
Duplicate glyphs in SVG output
2023-12-15T17:09:09Z
Dmitry Shubin

When running our internal tests as part of the upgrade from Poppler 21.04 to 22.10, we noticed that some resulting SVG files became a bit larger. The size increase itself is not a big concern, but the underlying issue may be more important, possibly related to the "Refactor CairoFontEngine caching" update.
Steps to reproduce:
- Convert the attached document [compAnno.pdf](/uploads/d4f245d1d6c8faedee445c73fe57c1c3/compAnno.pdf) to SVG, using poppler v21.04, and using v22.10
- Expected result: size of the SVG produced by v22.10 is about the same or smaller
- Actual result: size of the SVG produced by v22.10 is bigger (207 Kb vs 200 Kb)
If you look into the differences, you may notice that the SVG produced by 22.10 [compAnno.pdf.p1.22-10.svg](/uploads/000bc4d2aad4ab5f7189ecd7ea7eec1b/compAnno.pdf.p1.22-10.svg) contains more glyphs than that of 21.04 [compAnno.pdf.p1.21-04.svg](/uploads/aea33845881bb25c1ca3bdfef00fdaf4/compAnno.pdf.p1.21-04.svg): 270 vs 255. The extra glyphs in 22.10 seem to be exact duplicates of others. For example, glyph1-10 and glyph14-15 are identical.
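One way to check for such duplicates is to group glyph definitions by their outline data. A minimal sketch follows, assuming each glyph is an element whose `id` starts with `glyph` and whose outline lives in the `d` attributes of its `<path>` descendants. That layout is an assumption about the SVG structure, the `find_duplicate_glyphs` helper is hypothetical, and the inline sample is made up for illustration rather than taken from the attached files:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_glyphs(svg_text):
    """Return lists of glyph ids that share identical path data."""
    root = ET.fromstring(svg_text)
    by_outline = defaultdict(list)
    for elem in root.iter():
        glyph_id = elem.get("id", "")
        if not glyph_id.startswith("glyph"):
            continue
        # Collect the path data of every <path> inside this glyph,
        # ignoring the XML namespace prefix on the tag name.
        outline = tuple(
            p.get("d", "")
            for p in elem.iter()
            if p.tag.rsplit("}", 1)[-1] == "path"
        )
        if outline:  # skip glyphs without any outline (e.g. spaces)
            by_outline[outline].append(glyph_id)
    return [ids for ids in by_outline.values() if len(ids) > 1]


sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <defs>
    <g id="glyph1-10"><path d="M 1 0 L 2 3 Z"/></g>
    <g id="glyph1-11"><path d="M 0 0 L 5 5 Z"/></g>
    <g id="glyph14-15"><path d="M 1 0 L 2 3 Z"/></g>
  </defs>
</svg>"""

print(find_duplicate_glyphs(sample))
```

Running the same function over the 21.04 and 22.10 outputs would show whether the extra 15 glyphs are all byte-for-byte copies of existing ones.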
Thank you!
Dmitry

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1291
Bad rendering of data matrix code in Okular (blurry in some conditions)
2022-12-28T12:54:11Z
Potomac

I noticed a problem with the latest version of Okular (22.08.0), which uses poppler with Qt5: the rendering of a data matrix image (a kind of QR code) is sometimes blurry when the data matrix is small. I noticed this problem when buying French PDF stamps from the La Poste website:
https://www.laposte.fr/mon-timbre-en-ligne
Here is a comparison between Okular and Xpdf for the same PDF. Okular doesn't manage to display a sharp rendering of the data matrix, whereas Xpdf has no problem and the data matrix is sharp:
![bug_data_matrix](/uploads/8cc009d917b35c2e48c5c201c9b4f238/bug_data_matrix.png)
With firefox as PDF reader there is no problem.
I am not sure whether poppler is the culprit, or whether Okular (or the Qt5 interface) doesn't use poppler correctly.
SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Arch linux
KDE Plasma Version: 5.25.4
KDE Frameworks Version: 5.97.0
Qt Version: 5.15.5

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1264
Shows double text on selection and copy
2022-07-05T18:45:01Z
Onkar Ruikar

I raised this issue in the Evince repo: https://gitlab.gnome.org/GNOME/evince/-/issues/1816.
As per their suggestion, I am raising it in poppler.
Steps to reproduce:
- Open [typescript-handbook.pdf](/uploads/0973d55d71d66594071b21b79d4921ff/typescript-handbook.pdf) (downloaded from
https://www.typescriptlang.org/assets/typescript-handbook.pdf)
- Go to section '_Static type-checking_' on page 8.
- Select the error message in the code section.
- It shows double text for selection and when copied and pasted, it pastes the text twice.
> ![screenshot](https://gitlab.gnome.org/GNOME/evince/uploads/a8b4dc13780b84fb64caa084cb53eb90/XpArM6s.png)
As per the [comments](https://gitlab.gnome.org/GNOME/evince/-/issues/1816#note_1489991), the issue seems to be with poppler versions 22.05 and 22.06, and is not reproducible on v22.02.
Environment:
- Document Viewer v42.2
- Fedora 36 64-bit

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1218
Issue regarding rendering of vector graphics inside pdf
2022-02-24T07:35:04Z
AJJLagerweij

Background
----------
I've created a PDF using LaTeX, and the way it is rendered seems to depend on the PDF viewer that I use. I've opened [an issue](https://github.com/texstudio-org/texstudio/issues/2114) in the repository of my default PDF viewer. However, the issue appears in multiple poppler-based viewers, hence I was referred to this repository.
The issue seems to be related to anti-aliasing. I first encountered this in TeXstudio's internal viewer (which uses poppler) and saw the same issue appear in Okular (which also uses poppler). However, Evince (also with poppler) seems to work fine. I'm a bit at a loss as to what is causing it.
File creation workflow
----------------------
A detailed description of the file creation process follows, as it is unclear to me whether the issue lies in poppler (and some of the other render engines), in the way viewers use poppler, in Inkscape (used to create the image), or in LaTeX (used to create the PDF).
1. The file was created using Inkscape as an `.svg`, then it was exported to a `.pdf` and `.pdf_tex` using Inkscape's built-in exporter. This separates text from the drawing; the text is later rendered in LaTeX, which ensures consistency of fonts throughout the resulting document. SVG-effect rasterisation was turned off. [Drawing.zip](/uploads/658b7dd6195f6e1b0409eb804ba8e9db/Drawing.zip)
2. XeLaTeX was used to render the file, the process to include the figure in TeX is described in [this manual](http://tug.ctan.org/tex-archive/info/svg-inkscape/InkscapePDFLaTeX.pdf).
The resulting pdf can be found here: [Document.pdf](/uploads/d55afd9be156fc54722ab817f5e7e545/Document.pdf)
The issue in different PDF viewers
----------------------------------
Below are screenshots of how the PDF gets rendered by different viewers and render engines.
**TeXstudio** (using poppler and splash renderer) suffers from bad aliasing:
![TeXstudio-Splash](/uploads/9be15eaa5569cda4b6c857f87e148bf8/TeXstudio-Splash.png)
**TeXstudio** (using poppler and Arthur renderer) suffers from white lines in the drawing:
![TeXstudio-Arthur](/uploads/9abb9afb3029d6707a59279fb94e1519/TeXstudio-Arthur.png)
**Okular** (using poppler) looks exactly like the TeXstudio-Splash renderer does:
![Okular](/uploads/0a63705f619f284b9756593ef2214ff2/Okular.png)
**Evince/GNOME viewer** (using poppler) does not have any issues:
![Envice](/uploads/742be177c1e64f6fbe31d1be3e70fd47/Envice.png)
Now some viewers that do not rely on poppler; they are not necessarily better.
**Firefox** creates waviness with the aliasing:
![Firefox](/uploads/ac87070f10fb2bd5af23e716cede45b0/Firefox.png)
**WPS** is near perfect, but upon zooming in on the screenshot it shows artefacts similar to the TeXstudio-Arthur renderer, though less pronounced:
![WPS](/uploads/a82aa744164fa83e95bbf93cd10dd719/WPS.png)
**MuPDF** is like the TeXstudio-Arthur renderer:
![MuPDF](/uploads/b9b979f74adeac79041cb5397d37af44/MuPDF.png)
Questions
---------
1. Is the behaviour an issue of poppler or of Inkscape? (Many viewers with different engines have issues with this PDF. However, not all of them do, and the issues that appear differ between renderers.)
2. How come the Evince viewer does not have any problems even though it uses poppler? What voodoo magic do they use to make the artefacts disappear?
3. A more generic question: why do these differences even appear, and what part of the image makes it behave so unpredictably? Is this kind of image not part of the official PDF specification?
4. How do I ensure that my readers all get a good-looking and consistent image, independent of the platform they use to view the PDF?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1214
Feature request for version >=21: add option to show actual characters when selecting text, not just the selection box
2022-02-09T08:22:45Z
Heinrich Ulbricht

With poppler 0.90.0 on Fedora 33 I could see the actual characters in PDF documents when selecting text. This is an important feature for me because it allows me to check for errors in scanned and OCR'd documents.
After upgrading to Fedora 34 I got poppler 21. Unfortunately the text selection optics in PDF viewers changed. I can no longer see the actual selected _characters_ but only the selection _rectangle_, superimposed on the background image. For my use case this gives a false sense of security because the selected characters (from the text layer) can differ from the characters shown in the background image.
The only means to get the old behavior back for me was downgrading to poppler 0.90.0.
Here's an image of the old (wanted) behavior where the characters show:
![image](/uploads/46327054f9a843a88e30e1062fd753e8/image.png)
Here's an image of the new behavior where the actual characters are not visible anymore (only the background image):
![image](/uploads/400ba212ebb1834c67f2045ba31079a1/image.png)
I propose some kind of feature flag to explicitly get the old behavior back and to show the actual characters that are selected to easily spot OCR errors.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1206
pdftotext: combining accent moved to subsequent character
2022-02-03T23:08:43Z
Zack Weinberg

The attached PDF [roundtrip.pdf](/uploads/0e3174aee5a6085a1ee7bceab44558af/roundtrip.pdf) displays two grapheme clusters: **l̀a**. This is the combined rendering of three Unicode characters: U+006C, U+0300, U+0061, in that order.
pdftotext (self-identified as "version 22.01.0") misinterprets the U+0300 as being applied to the 'a', not the 'l', and emits **là** (U+006C U+0061 U+0300).
Looking at the page stream, I see
```
BT
/F27 9.96264 Tf
1 0 0 1 34.098 32.557 Tm [<004f>]TJ
1 0 0 1 38.96 34.789 Tm [<0b3e>]TJ
1 0 0 1 37.187 32.557 Tm [<0044>]TJ
ET
```
004f, 0b3e, 0044 are mapped to U+006C, U+0300, U+0061 respectively by the /ToUnicode object for the font. I'm guessing pdftotext is only looking at the relative x-positions of the characters to decide where the accent character is placed (38.96 > 37.187) and not where the visible glyph for the accent actually winds up. I have no idea why this particular font places the origin of this character so far to the right of the visible glyph, but TeX managed to figure out how to place the accent correctly, so I would like to think it would be possible for pdftotext to do the same.
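The guess above can be illustrated with the three glyph origins from the page stream. This is a sketch of the suspected behaviour only, not pdftotext's actual algorithm. The `order_by_x` helper is hypothetical, while the x-coordinates and code points are the ones quoted above:

```python
import unicodedata

# (x origin from the Tm operator, character from /ToUnicode), in stream order.
glyphs = [
    (34.098, "\u006c"),  # l
    (38.96, "\u0300"),   # combining grave accent
    (37.187, "\u0061"),  # a
]


def order_by_x(items):
    """Order glyphs purely by the x-coordinate of their origin."""
    return "".join(ch for _, ch in sorted(items, key=lambda g: g[0]))


stream_order = "".join(ch for _, ch in glyphs)
x_order = order_by_x(glyphs)

for label, text in (("stream order", stream_order), ("x order", x_order)):
    print(label, [f"U+{ord(c):04X}" for c in text])

# After NFC normalization the x-ordered text contains a precomposed U+00E0,
# i.e. the accent has become attached to the 'a'.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", x_order)])
```

Sorting by origin alone moves U+0300 behind U+0061, which matches the reported output; keeping the stream order, or using the glyph's visible bounding box rather than its origin, would keep the accent on the `l`.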
In case it is helpful, the PDF was generated by rendering the following document with LuaTeX from TeX Live 2021:
```
\documentclass{minimal}
\usepackage[paperwidth=5cm,paperheight=2cm,margin=5mm]{geometry}
\usepackage{fontspec}
\setmainfont{Noto Serif}
\pagestyle{empty}
\begin{document}
{\`l}a
\end{document}
```

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1202
Shape borders not (alpha) masked by cairo backend
2022-02-03T07:13:01Z
Trevor L Davis

* I'm observing with the "cairo" backend that a shape's "border" is not masked by its "alpha mask" (however the shape's "fill" is masked).
* This visual bug does not occur with the "splash" backend or other pdf viewers (such as Firefox).
* I've attached a pdf of a rectangle with a yellow fill and a blue border alpha masked by a *holed* rectangle. With the "cairo" backend the blue border of the masked rectangle extends past its mask but the yellow fill does not. With the "splash" backend and other pdf viewers such as `firefox` neither the border nor the fill extend past its mask. This pdf was created using R's `pdf()` function (source: https://github.com/coolbutuseless/ggpattern/issues/70#issuecomment-1015560011). One of `pdf()`'s authors looked at the pdf and said the pdf output looked fine and this is probably a pdf viewer bug.
* This bug still appeared when I compiled and used `poppler-22.01.0` (although I didn't try to manually upgrade any Cairo headers on my Ubuntu 20.04 system before compiling).
[mask_bug.pdf](/uploads/99a1bfb98f38d8c30c35af51164f6bb0/mask_bug.pdf)

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1186
Redact (sanitize / censor / remove text) feature
2022-02-27T18:17:53Z
Jeff Fortin Tam

Hi! This is the corresponding upstream/library feature request for what I suggested downstream [in Evince](https://gitlab.gnome.org/GNOME/evince/-/issues/1716#note_1332247):
> In business/organizational settings, it is sometimes needed to be able to [sanitize](https://en.wikipedia.org/wiki/Sanitization_(classified_information)) sensitive documents for publication, a process typically known as redacting. Simply using black highlight instead of also removing the text would be insufficient, as viewers could then select to reveal the text behind the blackout.
Having a built-in feature to achieve this (like Adobe Acrobat allows, and probably a few other editors) while preserving the digital quality of the non-redacted text would be better than users needing to "highlight black everywhere and then rasterize the whole PDF".
Ideally, a native "redact" feature should be able to remove text with either blackouts in place, or just removing the text with no residual placeholder (i.e. whiteout).

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1149
Incorrect height value for text elements in XML
2021-11-08T10:01:39Z
Nikhil Ranka

The XML [data.xml](/uploads/96f697045140c19206ab042916e88ee8/data.xml) generated from this PDF [input.pdf](/uploads/2e69ddeb249d0b06f5dbf7f43e1e1989/input.pdf) using the command has __incorrect__ height values.
**Excerpt from PDF**:
![image](/uploads/88ef638f507acb7098a76954095bdb03/image.png)
**Excerpt from XML**:
```
<fontspec id="2" size="11" family="TXPGHF+ArialMT" color="#484f5a"/>
<text top="27" left="100" width="401" height="15" font="2">This is a promotional meeting organised and funded by the Testers Alliance in-</text>
<text top="41" left="165" width="270" height="15" font="2">tended for healthcare professionals based in the UK.</text>
```
The top of `This is a promotional meeting...` is 27, with a height of 15, and the top of `tended for healthcare professionals` is 41.
But top + height is 42 (27 + 15), which incorrectly overlaps the region of the next text element. In the attached image there is no overlap.
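The same arithmetic can be checked mechanically for every pair of consecutive lines. A minimal sketch follows, using the two `<text>` elements quoted above. The enclosing `<page>` element and the `vertical_overlaps` helper are hypothetical, added only so the excerpt parses on its own:

```python
import xml.etree.ElementTree as ET

# The two <text> elements from the excerpt, wrapped in a root element.
excerpt = """<page>
<text top="27" left="100" width="401" height="15" font="2">This is a promotional meeting organised and funded by the Testers Alliance in-</text>
<text top="41" left="165" width="270" height="15" font="2">tended for healthcare professionals based in the UK.</text>
</page>"""


def vertical_overlaps(xml_text):
    """Yield (bottom, next_top) for consecutive lines that overlap vertically."""
    texts = ET.fromstring(xml_text).findall("text")
    for current, following in zip(texts, texts[1:]):
        bottom = int(current.get("top")) + int(current.get("height"))
        next_top = int(following.get("top"))
        if bottom > next_top:
            yield bottom, next_top


print(list(vertical_overlaps(excerpt)))
```

For the excerpt this reports a single pair, bottom 42 against next top 41, which is the one-unit overlap described above.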
Thanks!

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1126
Some PDF files render very slowly
2023-02-15T01:43:51Z
William Gee

I build PDF files of cave maps which contain tens of thousands of vectors. The PDF files are created with Therion, which in turn runs TeX to create PDF files. In any Poppler-based PDF viewer these files take multiple minutes to render. On less powerful computers the same files will render in Adobe Reader in 3 to 5 seconds. Master PDF Editor also can render these files in a few seconds, though not as fast as Reader.
A sample file is attached: [StarkCaverns.pdf](/uploads/542ff6dd8e1eb3717b6f869c2ffd3bed/StarkCaverns.pdf)
I use mostly Fedora 34. Poppler is version 21.01.0
Bill Gee