poppler issues (https://gitlab.freedesktop.org/poppler/poppler/-/issues), 2022-03-15T13:31:52Z

**Issue #1229: Please remove bit field specifier of `image::format_enum`**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1229 (kenjiuno, 2022-03-15T13:31:52Z)

I have confirmed that this `format` member is treated as a _signed integer_ on Visual Studio 2022.
```cpp
image::format_enum format : 3;
```
Code: https://gitlab.freedesktop.org/poppler/poppler/-/blob/814fbda28cc8a37fed3134c2db8da28f86fb5ee0/cpp/poppler-image-private.h#L45
Assigning `format_bgr24` (5) to `format` yields `-3`, for example in this call:
```cpp
const image img(reinterpret_cast<char *>(data_ptr), bw, bh, d->image_format);
```
Code: https://gitlab.freedesktop.org/poppler/poppler/-/blob/814fbda28cc8a37fed3134c2db8da28f86fb5ee0/cpp/poppler-page-renderer.cpp#L292
![2022-03-15_15h32_45](/uploads/d6f1d7df1b02a408989f95dd1d430f87/2022-03-15_15h32_45.png)
![2022-03-15_15h32_32](/uploads/034a5e9b30057baa2d9d3c9dd5e8534d/2022-03-15_15h32_32.png)
```
bits:     000  001  010  011  100  101  110  111
unsigned:   0    1    2    3    4    5    6    7
signed:     0    1    2    3   -4   -3   -2   -1
                                    ^ format_bgr24
```
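The wraparound can be reproduced without poppler. The following is a minimal sketch under stated assumptions. The enumerator list is a stand-in, and only `format_bgr24 == 5` comes from the report. `msvc_like_private` models the MSVC behavior with an explicitly `signed` 3-bit field, because MSVC gives an unscoped enum the underlying type `int`. `fixed_private` shows the change requested in the title: a plain enum member without a bit-field width.

```cpp
#include <cassert>  // assert() in the usage checks

// Stand-in enum: the enumerator order is an assumption for illustration;
// only format_bgr24 == 5 is taken from the report.
enum format_enum { format_invalid, format_mono, format_rgb24, format_argb32, format_gray8, format_bgr24 };

// Models what MSVC effectively does with `format_enum format : 3;`:
// the field is signed, so 3 bits can only hold -4..3.
struct msvc_like_private {
    signed int format : 3;
};

// The requested change: no bit-field width, so the full enum range fits.
struct fixed_private {
    format_enum format;
};

int store_in_signed_bitfield(format_enum f) {
    msvc_like_private d{};
    d.format = f;  // bit pattern 101 is read back as -3
    return d.format;
}

int store_in_plain_member(format_enum f) {
    fixed_private d{};
    d.format = f;  // stays 5
    return d.format;
}
```

On two's-complement compilers such as GCC, Clang and MSVC, `store_in_signed_bitfield(format_bgr24)` yields -3, and `store_in_plain_member(format_bgr24)` yields 5. The exact out-of-range result is formally implementation-defined. An `unsigned int format : 3` field would also keep 5. Removing the width is simply the change the issue asks for.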
As a result, the call to `image_private::create_data` in `image::detach()` fails, because `old_d->format` is `-3`:
```cpp
d = image_private::create_data(old_d->width, old_d->height, old_d->format);
```
`image` then points to a released bitmap, which leads to an access violation (SIGSEGV on Linux).

**Issue #1228: PDFDoc::createTrailerDict original file's ID entry isn't an array**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1228 (mirabilos, 2022-03-15T22:05:26Z)

I’m getting the warning `PDFDoc::createTrailerDict original file's ID entry isn't an array` in `pdfattach` for PDF files that, legitimately¹, do not contain an ID entry at all. (They are created by Qt, in case that matters.)
Please consider just not warning if it’s absent.
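The requested change comes down to telling an absent `ID` apart from one that is present but malformed. Below is a minimal sketch of that check. It uses a deliberately simplified, hypothetical dictionary type (`Dict`, `Value`) rather than poppler's actual `Object`/`Dict` classes. It does not describe how `PDFDoc::createTrailerDict` is actually written.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified trailer value: an integer, a string,
// or an array of strings. This is not poppler's Object type.
using Value = std::variant<int, std::string, std::vector<std::string>>;
using Dict = std::map<std::string, Value>;

enum class IdCheck { Absent, ValidArray, WrongType };

// Classify the trailer's ID entry. Because ID is optional, only a
// present-but-non-array entry is worth a warning.
IdCheck classify_id(const Dict &trailer) {
    auto it = trailer.find("ID");
    if (it == trailer.end())
        return IdCheck::Absent;  // legitimately missing: stay quiet
    if (std::holds_alternative<std::vector<std::string>>(it->second))
        return IdCheck::ValidArray;
    return IdCheck::WrongType;  // malformed: warn
}

bool should_warn(const Dict &trailer) {
    return classify_id(trailer) == IdCheck::WrongType;
}
```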
¹ It’s optional, after all, except in some cases not present here.

**Issue #1227: memory leak on poppler_document_new_from_fd**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1227 (Albert Astals Cid, 2022-03-28T11:57:38Z)

One of the branches uses `GooFile::open`, which returns a `GooFile` that needs to be deleted at some point, but the only thing we do with it is give it to `FileStream`, which does not take ownership of it.

**Issue #1226: poppler installed via homebrew is crashing on monterey (m1 silicon)**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1226 (skratchdot, 2022-03-12T15:12:06Z)

First off, great work on this library!
I know there's not a lot of info here, but I'm posting mainly to see if anyone else is having issues.
I recently started using a new Apple M1 MacBook Pro (16-inch, 2021).
I have a few scripts that make use of poppler's pdftotext binary. It was working on my new laptop last month.
Unfortunately, I frequently run `brew upgrade` and `brew cleanup`.
It appears poppler was updated fairly recently, but I hadn't "re-run" my script until yesterday.
Now, anytime I run pdftotext, it immediately exits with some info:
```
[1] 82196 killed pdftotext
```
OSX's crash log shows some info like:
```
Process: pdftohtml [19413]
Path: /opt/homebrew/*/pdftohtml
Identifier: pdftohtml
Version: ???
Code Type: ARM-64 (Native)
Parent Process: zsh [750]
Responsible: iTerm2 [554]
```
```
Exception Type: EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
Exception Codes: UNKNOWN_0x32 at 0x0000000100fbc000
Exception Codes: 0x0000000000000032, 0x0000000100fbc000
Exception Note: EXC_CORPSE_NOTIFY
```
What's weird is that I've tried reinstalling old versions of poppler, and none of the PDF tools seem to work now. I've also tried "building from source" (via Homebrew). The build step does not work, but I haven't tried to debug it closely.
Anyways, I'm mainly wondering if something is screwed up with my setup, or if this is an issue for other homebrew/poppler/m1/monterey users.
Thanks!

**Issue #1225: Possible regression introduced in pdftotext version 22.02.0**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1225 (Bilal Durrani, 2022-03-07T13:34:53Z)

Here is the file I'm using as a test:
[bigPdf.pdf](/uploads/daa8b70893b15bdc4e8f8aa2023c05ce/bigPdf.pdf)
I've also uploaded the outputs of the different versions
[bigPdf-21.02.txt](/uploads/25f915f568eac51083d3d7cf92c43f3c/bigPdf-21.02.txt)
[bigPdf-22.02.txt](/uploads/acc96ad146be09420d9d8c9e44385d3b/bigPdf-22.02.txt)
As an example, this was a section of the text in v21
```
CRUSTACÉS DÉCAPODES NOUVEAUX OU PEU CONNUS
DE L'ÉPOQUE CRÉTACIQUE?
par Victor VAN STRAELEN (Bruxelles).
```
The following is the result in v22
```
CRUSTACÉS DÉCAPODES NOUVEAUX OU PEU CONNUS
DE L'ÉPOQUE CRÉTACIQUE?
par Victor V A N S T R A E L E N (Bruxelles).
```
Sorry I wasn't able to narrow down the versions, but I'm using poppler via `brew` and am having trouble figuring out how to downgrade it.
I hope this helps find the issue.
Tested on macOS 12.2.1.
Thanks for your help!

**Issue #1224: raise PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1224 (ภัทราวุฒิ ถิ่นนาราม, 2022-04-05T07:12:02Z)

I can't fix something.

**Issue #1223: Changes in /usr/include/PDFDoc.h cause compile failure for inkscape**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1223 (Seemant Kulleen, 2022-03-04T05:09:00Z)

Hello, inkscape (v1.1.1 and v1.1.2 both) compiles against poppler-22.02.0 but errors out against poppler-22.03.0 as follows (in 310/918):
```
var/tmp/portage/media-gfx/inkscape-1.1.2/work/inkscape-1.1.2/src/extension/internal/pdfinput/pdf-input.cpp:672:39: required from here
/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/g++-v11.2.0/ext/new_allocator.h:162:11: error: no matching function for call to 'PDFDoc::PDFDoc(GooString*&, std::nullptr_t, std::nullptr_t, std::nullptr_t)'
162 | { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /var/tmp/portage/media-gfx/inkscape-1.1.2/work/inkscape-1.1.2/src/extension/internal/pdfinput/pdf-input.cpp:25:
/usr/include/poppler/PDFDoc.h:371:5: note: candidate: 'PDFDoc::PDFDoc()'
371 | PDFDoc();
| ^~~~~~
/usr/include/poppler/PDFDoc.h:371:5: note: candidate expects 0 arguments, 4 provided
/usr/include/poppler/PDFDoc.h:139:14: note: candidate: 'PDFDoc::PDFDoc(BaseStream*, const std::optional<GooString>&, const std::optional<GooString>&, void*, const std::function<void()>&)'
139 | explicit PDFDoc(BaseStream *strA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
| ^~~~~~
/usr/include/poppler/PDFDoc.h:139:33: note: no known conversion for argument 1 from 'GooString*' to 'BaseStream*'
139 | explicit PDFDoc(BaseStream *strA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
| ~~~~~~~~~~~~^~~~
In file included from /var/tmp/portage/media-gfx/inkscape-1.1.2/work/inkscape-1.1.2/src/extension/internal/pdfinput/pdf-input.cpp:25:
/usr/include/poppler/PDFDoc.h:132:14: note: candidate: 'PDFDoc::PDFDoc(std::unique_ptr<GooString>&&, const std::optional<GooString>&, const std::optional<GooString>&, void*, const std::function<void()>&)'
132 | explicit PDFDoc(std::unique_ptr<GooString> &&fileNameA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr,
| ^~~~~~
/usr/include/poppler/PDFDoc.h:132:50: note: no known conversion for argument 1 from 'GooString*' to 'std::unique_ptr<GooString>&&'
132 | explicit PDFDoc(std::unique_ptr<GooString> &&fileNameA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
```
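The candidate list above suggests that callers now need to hand over a `std::unique_ptr<GooString>`, with passwords passed as `std::optional<GooString>` or omitted. The sketch below illustrates that call shape. Its `GooString` and `PDFDoc` are minimal stand-ins that only mirror the 22.03 file-name constructor signature quoted in the compiler notes. They are not poppler's classes; real code would include poppler's headers instead.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Stand-in for poppler's GooString (assumption: only what this sketch needs).
struct GooString {
    std::string s;
    explicit GooString(const char *c) : s(c) {}
};

// Stand-in mirroring the poppler 22.03 file-name constructor from the
// compiler output; it only records what it was given.
struct PDFDoc {
    std::unique_ptr<GooString> fileName;
    bool hasOwnerPassword;

    explicit PDFDoc(std::unique_ptr<GooString> &&fileNameA,
                    const std::optional<GooString> &ownerPassword = {},
                    const std::optional<GooString> &userPassword = {},
                    void *guiDataA = nullptr,
                    const std::function<void()> &xrefReconstructedCallback = {})
        : fileName(std::move(fileNameA)), hasOwnerPassword(ownerPassword.has_value()) {
        (void)userPassword;
        (void)guiDataA;
        (void)xrefReconstructedCallback;
    }
};

// Old shape (<= 22.02): PDFDoc(rawGooStringPtr, nullptr, nullptr, nullptr).
// New shape: transfer ownership of the file name and omit the password
// arguments, which default to empty std::optional values.
std::unique_ptr<PDFDoc> open_doc(const char *path) {
    return std::make_unique<PDFDoc>(std::make_unique<GooString>(path));
}
```

Against the real headers, the analogous fix at the failing call site would wrap the file name in `std::make_unique<GooString>(...)` and drop the trailing `nullptr` arguments. This is an inference from the candidate list, not a patch taken from Inkscape.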
It seems the changes in the `PDFDoc.h` header file are as follows:
```diff
- explicit PDFDoc(const GooString *fileNameA, const GooString *ownerPassword = nullptr, const GooString *userPassword = nullptr, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
+ explicit PDFDoc(std::unique_ptr<GooString> &&fileNameA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr,
+ const std::function<void()> &xrefReconstructedCallback = {});
 #ifdef _WIN32
- PDFDoc(wchar_t *fileNameA, int fileNameLen, GooString *ownerPassword = nullptr, GooString *userPassword = nullptr, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
+ PDFDoc(wchar_t *fileNameA, int fileNameLen, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
 #endif
- explicit PDFDoc(BaseStream *strA, const GooString *ownerPassword = nullptr, const GooString *userPassword = nullptr, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
+ explicit PDFDoc(BaseStream *strA, const std::optional<GooString> &ownerPassword = {}, const std::optional<GooString> &userPassword = {}, void *guiDataA = nullptr, const std::function<void()> &xrefReconstructedCallback = {});
```

**Issue #1222: pdf to html strange characters**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1222 (Nico, 2022-02-28T08:51:53Z)

For many PDFs the conversion to HTML/XML works fine. But sometimes I get PDF files that, when converted, show only unreadable, strange-looking text like:
```
fPkFKFbrFlgPCPHFPbkNPQbkimiKBbmšGGNHLbPBkhKFlgFPQbQFKb‚iDbOCPQFPbHFšgGBFPb›AgGCPHkAKBbNKQ
MjiPBibHFšgKBpbkiFNBbPNlgBbQFKba€cCHb‚iPbMjiPBibHAPcbiQFKbBFNGFNkFbACkHFklgGikkFPbNkBLbEFN
ˆ€FKklgKFNBCPHbQFkbHFšgKBFPb›AgGCPHkcNFGFkbNKQbQFKbNPba€cCHbHF€KAlgBFbMjiPBibcCKoljHFKFlgPFBL
```
The PDF file has no password, and when I use cairo to convert it to an image, it works just fine.
Has anyone seen output like this before?
Sincerely
Nico

**Issue #1221: pdftotext hyphenation over page break**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1221 (a b, 2022-12-04T00:32:25Z)

In rare circumstances, hyphenation spans a page break. Is there any way to not only disable page breaks (`-nopgbrk`) but also remove the hyphens _after_ that?
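One possible workaround, independent of pdftotext's own options, is to post-process the text. Below is a minimal sketch. It assumes pdftotext's default output (without `-nopgbrk`), where each page ends with a form feed (`\f`). A hyphen at the end of a page is dropped, together with the page break, when the next page continues with a lowercase ASCII letter. This heuristic is an assumption. It will also join genuinely hyphenated compounds that happen to straddle a page, and it ignores non-ASCII lowercase letters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Heuristic: remove a page-final hyphen and the following page break when
// the next page continues with a lowercase ASCII letter.
std::string join_hyphenation_across_pages(const std::string &text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '-') {
            // Skip trailing whitespace and at most one form feed (page break).
            std::size_t j = i + 1;
            bool sawPageBreak = false;
            while (j < text.size() &&
                   (text[j] == '\n' || text[j] == '\r' || text[j] == ' ' ||
                    (text[j] == '\f' && !sawPageBreak))) {
                if (text[j] == '\f')
                    sawPageBreak = true;
                ++j;
            }
            if (sawPageBreak && j < text.size() &&
                std::islower(static_cast<unsigned char>(text[j]))) {
                i = j - 1;  // drop hyphen and break; resume at the continuation
                continue;
            }
        }
        out += text[i];
    }
    return out;
}
```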
Best regards!

**Issue #1220: PDF to PNG with parallel processes running, mixed up output**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1220 (Jon B, 2022-02-16T02:30:20Z)

Hello! I'm running into an issue with pdftocairo: if I run two pdftocairo processes at once on the same machine, the output PNG files become intertwined and mixed up.
For example:
If I run these at the same time:
```
pdftocairo -png large-file.pdf outputA
pdftocairo -png another-large-file.pdf outputB
```
I sometimes get some pages from `large-file.pdf` in `outputB`.
Is this expected, and are there any recommendations on how to avoid it?
Thank you!

**Issue #1219: allow using header from C code**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1219 (Thomas Klausner, 2022-02-15T18:20:23Z)

`poppler/poppler-config.h.cmake` unconditionally includes `<cstdio>`. It would be good to make this file usable from C too.

**Issue #1218: Issue regarding rendering of vector graphics inside pdf**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1218 (AJJLagerweij, 2022-02-24T07:35:04Z)

Background
----------
I've created a PDF using LaTeX, and the way it is rendered seems to depend on the PDF viewer I use. I've opened [an issue](https://github.com/texstudio-org/texstudio/issues/2114) in the repository of my default PDF viewer. However, the issue appears in multiple poppler-based viewers, so I was referred to this repository.
The issue seems to be related to anti-aliasing. I first encountered it in TeXstudio's internal viewer (which uses poppler) and saw the same issue in Okular (which also uses poppler). However, Evince (also poppler-based) seems to work fine. I'm at a bit of a loss as to what is causing it.
File creation workflow
----------------------
Below is a detailed description of the file creation process, since it is unclear to me whether the issue lies in poppler (and some of the other rendering engines), in the way viewers use poppler, in Inkscape (used to create the image), or in LaTeX (used to create the PDF).
1. The file was created in Inkscape as an `.svg`, then exported to a `.pdf` and a `.pdf_tex` using Inkscape's built-in exporter. This separates the text from the drawing; the text is later rendered by LaTeX, which ensures consistent fonts throughout the resulting document. SVG-effect rasterisation was turned off. [Drawing.zip](/uploads/658b7dd6195f6e1b0409eb804ba8e9db/Drawing.zip)
2. XeLaTeX was used to render the file, the process to include the figure in TeX is described in [this manual](http://tug.ctan.org/tex-archive/info/svg-inkscape/InkscapePDFLaTeX.pdf).
The resulting pdf can be found here: [Document.pdf](/uploads/d55afd9be156fc54722ab817f5e7e545/Document.pdf)
The issue in different PDF viewers
----------------------------------
Below are screenshots of how the PDF gets rendered by different viewers and render engines.
**TeXstudio** (using poppler and splash renderer) suffers from bad aliasing:
![TeXstudio-Splash](/uploads/9be15eaa5569cda4b6c857f87e148bf8/TeXstudio-Splash.png)
**TeXstudio** (using poppler and Arthur renderer) suffers from white lines in the drawing:
![TeXstudio-Arthur](/uploads/9abb9afb3029d6707a59279fb94e1519/TeXstudio-Arthur.png)
**Okular** (using poppler) looks exactly like the TeXstudio-Splash renderer does:
![Okular](/uploads/0a63705f619f284b9756593ef2214ff2/Okular.png)
**Evince/GNOME viewer** (using poppler) does not have any issues:
![Envice](/uploads/742be177c1e64f6fbe31d1be3e70fd47/Envice.png)
Next, some viewers that do not rely on poppler (they are not necessarily better):
**Firefox** produces waviness with the anti-aliasing:
![Firefox](/uploads/ac87070f10fb2bd5af23e716cede45b0/Firefox.png)
**WPS** is near perfect, but zooming into the screenshot reveals artefacts similar to the TeXstudio-Arthur renderer, though less pronounced:
![WPS](/uploads/a82aa744164fa83e95bbf93cd10dd719/WPS.png)
**MuPDF** looks like the TeXstudio-Arthur renderer:
![MuPDF](/uploads/b9b979f74adeac79041cb5397d37af44/MuPDF.png)
Questions
---------
1. Is the behaviour an issue of poppler or of Inkscape? (Many viewers with different engines have issues with this PDF, but not all of them, and the artefacts differ between renderers.)
2. How come the Evince viewer does not have any problems, even though it uses poppler? What voodoo magic does it use to make the artefacts disappear?
3. A more general question: why do these differences appear at all, and what part of the image makes it behave so unpredictably? Is this kind of image not covered by the official PDF specification?
4. How do I ensure that all my readers get a good-looking and consistent image, independent of the platform they use to view the PDF?

**Issue #1217: Double free in gfree()**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1217 (crt, 2022-02-14T09:33:30Z)

```
#0 0x00007fce1db6f438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fce1db7103a in __GI_abort () at abort.c:89
#2 0x00007fce1dbb17fa in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7fce1dccafd8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007fce1dbba38a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>,
str=0x7fce1dccb108 "double free or corruption (!prev)", action=3) at malloc.c:5020
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3874
#5 0x00007fce1dbbe58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
#6 0x00007fce1ed20eee in gfree (p=0xb941) at /test/goo/gmem.h:63
#7 ImageStream::~ImageStream (this=0x2292980) at /test/poppler/Stream.cc:583
#8 0x00007fce1ede4788 in SplashOutputDev::drawImage (this=<optimized out>, state=<optimized out>,
ref=<optimized out>, str=0x2292fa0, width=<optimized out>, height=<optimized out>,
colorMap=<optimized out>, interpolate=<optimized out>, maskColors=<optimized out>,
inlineImg=<optimized out>) at /test/poppler/SplashOutputDev.cc:3364
#9 0x00007fce1ec0c41d in Gfx::doImage (this=0x2286bd0, ref=0x7ffe96deb860, str=0x2292fa0,
inlineImg=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
at /test/poppler/Gfx.cc:4520
#10 0x00007fce1ebdc5ca in Gfx::opXObject (this=0x2286bd0, args=<optimized out>, numArgs=<optimized out>)
at /test/poppler/Gfx.cc:4097
#11 0x00007fce1ebf0cf2 in Gfx::execOp (this=0x2286bd0, cmd=<optimized out>, args=0x7ffe96deb9c0,
numArgs=<optimized out>) at /test/poppler/Gfx.cc:802
#12 0x00007fce1ebef2ef in Gfx::go (this=0x2286bd0,
topLevel=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
at /test/poppler/Gfx.cc:679
#13 0x00007fce1ebee9d9 in Gfx::display (this=<optimized out>, obj=0x7ffe96debcc0,
topLevel=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
at /test/poppler/Gfx.cc:640
```
[poc](/uploads/5095160fd23528588a0ce2e534b0c211/poc)

**Issue #1216: Segmentation fault in str_fill_input_buffer()**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1216 (crt, 2022-02-14T09:34:32Z)

```
Program received signal SIGSEGV, Segmentation fault.
0x00007f8a978f3015 in str_fill_input_buffer (cinfo=<optimized out>)
at /test/poppler/DCTStream.cc:33
33 c = src->str->getChar();
(gdb) bt
#0 0x00007f8a978f3015 in str_fill_input_buffer (cinfo=<optimized out>)
at /test/poppler/DCTStream.cc:33
#1 0x00007f8a95ecfbea in jpeg_fill_bit_buffer () from /usr/lib/x86_64-linux-gnu/libjpeg.so.8
#2 0x00007f8a95ed0777 in ?? () from /usr/lib/x86_64-linux-gnu/libjpeg.so.8
#3 0x00007f8a95ecc166 in ?? () from /usr/lib/x86_64-linux-gnu/libjpeg.so.8
#4 0x00007f8a95ed1dd6 in ?? () from /usr/lib/x86_64-linux-gnu/libjpeg.so.8
#5 0x00007f8a95ecb27a in jpeg_read_scanlines () from /usr/lib/x86_64-linux-gnu/libjpeg.so.8
#6 0x00007f8a978f38d1 in DCTStream::readLine (this=0xf9add0)
at /test/poppler/DCTStream.cc:190
#7 0x00007f8a978f3b37 in DCTStream::getChars (this=<optimized out>, nChars=900,
buffer=0xf9a570 '\377' <repeats 200 times>...)
at /test/poppler/DCTStream.cc:216
#8 0x00007f8a977e010b in Stream::doGetChars (this=0xf9add0, nChars=900,
buffer=0xf9a570 '\377' <repeats 200 times>...)
at /test/poppler/Stream.h:130
```
[poc](/uploads/faa38448518853cfa8a1a4ca9e830235/poc)

**Issue #1215: height of underline is inconsistent**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1215 (ANaumann85, 2022-03-06T16:06:14Z)

The attached file [highlightWithTwoQuads.pdf](/uploads/b6078853f9b419809477ccb3e12f0679/highlightWithTwoQuads.pdf)
contains two underlined lines of text.
This underline is stored as one annotation, which contains two quads. Both quads have the same height (approx. 0.011535 units).
Nevertheless, the upper line is displayed thicker than the lower one.
This inconsistency does not occur in Chrome; see the ![comparison](/uploads/d9930440745edec9c81919d405afba53/highlightTest.png) showing Chrome on the right and Okular on the left.
I also observed the same problem with pdftoppm, but the exported image is approx. 6 MB in size, so I did not upload it.
Software versions:
* Poppler version: 20.09.0-3.1
* Distribution: Debian 11

**Issue #1214: Feature request for version >=21: add option to show actual characters when selecting text, not just the selection box**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1214 (Heinrich Ulbricht, 2022-02-09T08:22:45Z)

With poppler 0.90.0 on Fedora 33 I could see the actual characters in PDF documents when selecting text. This is an important feature for me because it allows me to check for errors in scanned and OCR'd documents.
After upgrading to Fedora 34 I got poppler 21. Unfortunately, the appearance of text selection in PDF viewers changed. I can no longer see the actual selected _characters_, only the selection _rectangle_ superimposed on the background image. For my use case this gives a false sense of security, because the selected characters (from the text layer) can differ from the characters shown in the background image.
The only means to get the old behavior back for me was downgrading to poppler 0.90.0.
Here's an image of the old (wanted) behavior where the characters show:
![image](/uploads/46327054f9a843a88e30e1062fd753e8/image.png)
Here's an image of the new behavior where the actual characters are not visible anymore (only the background image):
![image](/uploads/400ba212ebb1834c67f2045ba31079a1/image.png)
I propose some kind of feature flag to explicitly restore the old behavior and show the actual selected characters, so that OCR errors are easy to spot.

**Issue #1213: pdftoppm image scaling issue**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1213 (Nikhil Ranka, 2022-02-16T04:38:21Z)

When attempting to generate an image from a PDF [PHT19.pdf](/uploads/ea6a6bc0e8d73626b17f4a6a17e7c0bf/PHT19.pdf) with specific scale constraints, the generated image is off by a few pixels.
**pdftoppm command**
`pdftoppm -scale-to-x 198 -scale-to-y 99 -r 250 -jpeg PHT19.pdf $(pwd)/fullImage.jpg`
The width of the image is 199 and NOT 198 as required.
Any ideas on how to resolve this?
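One plausible mechanism is sketched below as a hypothesis, not as a reading of pdftoppm's source. Suppose a scale factor is derived as `target / page_width`, and the pixel width is then recomputed as `page_width * scale` and rounded up. In that case, floating-point error of a tiny fraction above the integer target turns into a whole extra pixel. Rounding to the nearest integer does not have this problem. The function names and the 612 pt page width used in the checks are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical pipeline: derive a scale from the requested width, then
// recompute the pixel width from that scale, as a renderer might.
double recomputed_width(double page_width_pts, int target_px) {
    double scale = target_px / page_width_pts;  // pixels per point
    return page_width_pts * scale;              // ideally exactly target_px
}

// Rounding up: any tiny excess above target_px becomes an extra pixel.
int width_ceil(double page_width_pts, int target_px) {
    return static_cast<int>(std::ceil(recomputed_width(page_width_pts, target_px)));
}

// Rounding to nearest absorbs the floating-point error.
int width_round(double page_width_pts, int target_px) {
    return static_cast<int>(std::lround(recomputed_width(page_width_pts, target_px)));
}
```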
**Device Info**
- pdftoppm version:
```
dev@ip:$ pdftoppm -v
pdftoppm version 0.86.1
Copyright 2005-2020 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
```
- OS version
```
dev@ip$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
```

**Issue #1212: line caps incorrectly rendered as rounded**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1212 (fkv1, 2022-02-27T23:48:02Z)

Butt caps on line ends are sometimes rendered as rounded caps.
Forwarded from: https://bugs.kde.org/show_bug.cgi?id=449808
See there for more details and an example.

**Issue #1211: "pdfimages -list" gets confused with nested images**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1211 (R. Diez, 2022-02-16T05:02:59Z)

I am using pdfimages 0.86.1, which comes with Ubuntu 20.04.3.
I have a PDF file (unfortunately with copyrighted contents) that has been produced by a commercial scanning and OCR software solution.
Each scanned page yields a PDF page with 3 pictures. It seems that the software identifies the text areas and saves them as a single high-resolution monochrome image. This image omits the areas identified as non-text, for example drawings or colour pictures; those areas end up in 2 separate pictures.
The result is a very small PDF file that can be OCR'ed very accurately. I have seen another commercial OCR product that does something similar.
By the way, does this method of breaking up a scanned picture for OCR and space-saving purposes have a name? I couldn't find any open-source tool like OCRmyPDF or Ghostscript that is able to do that separation. Normally you get one picture per page, so the PDFs cannot be shrunk nearly as much.
When you view the scanned PDF, you do not realise that there are 3 pictures per page. I guess the images are transparent and stacked on top of each other.
Command "pdfimages -list" shows:
```
page  num type  width height color comp bpc enc  interp object ID x-ppi y-ppi  size ratio
--------------------------------------------------------------------------------------------
   1    0 image   827   1169 gray     1   8 jpeg no          4  0   101   100 43.2K  4.6%
   1    1 image   827   1169 gray     1   8 jpeg no          6  0   101   100 11.0K  1.2%
   1    2 mask   2481   3508 -        1   1 jpeg no          6  0   301   300 11.0K  1.0%
[... entries for other PDF pages ...]
```
Note that the "object" value 6 appears twice, which I think should not happen.
If you extract with "pdfimages -all", you get:
```
51.766 -000.jpg
19.231 -001.jpg
30.995 -002.ccitt
20 -002.params
[... files for other PDF pages ...]
```
Note that image 2 is of type CCITT, and not JPEG as stated in the table above.
I do not know much about PDF files, but I recently learned that you can dump a PDF with the command "dumppdf -a". I then searched for the object IDs, and this is what I found:
```
<object id="6">
<stream>
<props>
<dict size="10">
<key>BitsPerComponent</key>
<value><number>8</number></value>
<key>ColorSpace</key>
<value><literal>DeviceGray</literal></value>
<key>Filter</key>
<value><list size="2">
<literal>FlateDecode</literal>
<literal>DCTDecode</literal>
</list></value>
<key>Height</key>
<value><number>1169</number></value>
<key>Length</key>
<value><number>11222</number></value>
<key>Mask</key>
<value><ref id="5" /></value>
<key>Name</key>
<value><literal>image_fg1</literal></value>
<key>Subtype</key>
<value><literal>Image</literal></value>
<key>Type</key>
<value><literal>XObject</literal></value>
<key>Width</key>
<value><number>827</number></value>
</dict>
</props>
</stream>
</object>
```
This object ID 6 probably corresponds to file "-001.jpg".
Note that another object is referenced like this:
`<value><ref id="5" /></value>`
That referenced object is defined as follows:
```
<object id="5">
<stream>
<props>
<dict size="10">
<key>BitsPerComponent</key>
<value><number>1</number></value>
<key>DecodeParms</key>
<value><dict size="2">
<key>Columns</key>
<value><number>2481</number></value>
<key>K</key>
<value><number>-1</number></value>
</dict></value>
<key>Filter</key>
<value><literal>CCITTFaxDecode</literal></value>
<key>Height</key>
<value><number>3508</number></value>
<key>ImageMask</key>
<value><number>True</number></value>
<key>Length</key>
<value><number>30995</number></value>
<key>Name</key>
<value><literal>image_sel1</literal></value>
<key>Subtype</key>
<value><literal>Image</literal></value>
<key>Type</key>
<value><literal>XObject</literal></value>
<key>Width</key>
<value><number>2481</number></value>
</dict>
</props>
</stream>
</object>
```
So that object is probably the one that generates files "-002.ccitt" and "-002.params".
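A sketch can derive, from object 5's own dictionary, what the listing would be expected to show for image 2. It assumes that the ratio column is the stream length divided by the uncompressed size (width × height × components × bits per component / 8) and that sizes are shown in units of 1024 bytes. These are assumptions, but they reproduce the 4.6% and 1.2% shown for images 0 and 1. Under them, the mask row's 11.0K / 1.0% matches object 6's stream length (11222 bytes) rather than object 5's (30995 bytes). `ImageStream`, `ratio_percent` and `list_row` are hypothetical helpers, not pdfimages code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Fields of one image XObject, taken from the dumppdf output above.
struct ImageStream {
    int objectNum;
    std::string enc;  // encoding label as pdfimages prints it (jpeg, ccitt, ...)
    int width, height;
    int comps;        // 1 for DeviceGray and for an ImageMask
    int bpc;          // BitsPerComponent
    long length;      // stream /Length in bytes
};

// Assumed definition of the "ratio" column: compressed / uncompressed size.
double ratio_percent(const ImageStream &img) {
    double raw = static_cast<double>(img.width) * img.height * img.comps * img.bpc / 8.0;
    return 100.0 * img.length / raw;
}

// Build a listing row from the stream's *own* fields, so that a mask reports
// its own object number, encoding and size instead of its parent's.
std::string list_row(const std::string &type, const ImageStream &img) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "%-5s %5d %5d obj=%d enc=%s size=%.1fK ratio=%.1f%%",
                  type.c_str(), img.width, img.height, img.objectNum, img.enc.c_str(),
                  img.length / 1024.0, ratio_percent(img));
    return buf;
}
```

For object 5 (`{5, "ccitt", 2481, 3508, 1, 1, 30995}`), this gives `obj=5 enc=ccitt size=30.3K ratio=2.8%`. That is consistent with the `-002.ccitt` file that "pdfimages -all" extracts.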
I guess that this kind of "nesting" between object ID 6 and object ID 5 is confusing "pdfimages -list", which is generating a table with incorrect information for image number 2.
But "pdfimages -all" does not get confused: it extracts the expected files with the expected image types.

**Issue #1210: PDF with masks not rendered well on cairo backend**
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1210 (Andrés Moya, 2022-02-03T13:21:59Z)

I have a file (generated by Chromium v97 printing an SVG file to PDF) that contains a masked object. This is an ellipse that masks a background rectangle: [mask.pdf](/uploads/5f6d702184bb61cbccc8d712f3c933ca/mask.pdf).
If I render it with the cairo backend, the mask does not work and I see the whole rectangle, but with the splash backend it renders correctly. Here are the commands I used to generate the images:
```
pdftocairo -png -f 1 -l 1 mask.pdf -o cairo-output
pdftoppm -png -f 1 -l 1 mask.pdf -o splash-output
```
(cairo-output)
![cairo-output-1](/uploads/908eb106b6c042abc6f014d351158923/cairo-output-1.png)
(splash-output)
![splash-output-1](/uploads/226c5e31e8a6011d32e72b6bc0dbeefa/splash-output-1.png)