poppler issues: https://gitlab.freedesktop.org/poppler/poppler/-/issues (2021-08-23T13:34:33Z)

Issue #1124: Incorrect handling of signature's Unicode properties Reason and Location
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1124 · Georgiy Sgibnev <georgiy@sgibnev.com> · 2021-08-23T13:34:33Z
Okular:
![okular_bad](/uploads/fd10537e240475b4f58c306dda7a2212/okular_bad.png)
Same document in Acrobat Reader:
![acrobat](/uploads/35e404bb3e9658bfe826fdd69a4180b8/acrobat.png)
See PDF spec, 3.8.1:
> For text strings encoded in Unicode, the first two bytes must be 254 followed by 255,
> representing the Unicode byte order marker, U+FEFF.
> The remainder of the string consists of Unicode character codes, according to the UTF-16 encoding
> specified in the Unicode standard, version 2.0.
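The rule quoted above can be sketched in Python to show how a reader would decode a text string such as a signature's /Reason or /Location. This is a minimal illustration, not poppler's code. The UTF-8 branch (bytes 239, 187, 191) comes from PDF 2.0 and is not covered by the quoted 1.x text, and decoding non-BOM strings as Latin-1 is a labelled simplification of PDFDocEncoding.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a PDF text string (e.g. /Reason, /Location)."""
    if raw.startswith(codecs.BOM_UTF16_BE):  # bytes 254, 255: U+FEFF marker
        return raw[2:].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF8):  # bytes 239, 187, 191: PDF 2.0 only
        return raw[3:].decode("utf-8")
    # No BOM: PDFDocEncoding. Latin-1 is only an approximation; the two
    # differ in a few ranges (notably 0x18-0x1F and 0x80-0xA0).
    return raw.decode("latin-1")
```

For example, `decode_pdf_text_string(b"\xfe\xff\x04\x1f\x04\x40")` yields `"Пр"`, whereas treating the same bytes as single-byte characters produces the kind of garbage shown in the Okular screenshot.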
[sample_signed.pdf](/uploads/8bae36f78d2ca04028fb37978652afa9/sample_signed.pdf)

Issue #1123: Optimize SplashClip::copy() function
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1123 · Thomas Freitag · 2021-08-21T09:43:30Z
When a clipping path is copied, all xpath scanners are created anew. It is much faster to copy the already existing ones.
This issue replaces issue #1121, which was unintentionally marked confidential.
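The proposed optimization can be modelled in a few lines of Python. Everything here is hypothetical: `ScannerStub` stands in for the xpath scanner and `ClipSketch` for SplashClip. Sharing is only safe under the assumption that a scanner is never modified after it is built; in C++ that would typically mean reference-counted sharing (for example `std::shared_ptr`) rather than raw pointers. This is not a description of the actual patch.

```python
class ScannerStub:
    """Hypothetical stand-in for an expensive-to-build, read-only scanner."""
    builds = 0  # counts how often the expensive construction happens

    def __init__(self, path):
        ScannerStub.builds += 1
        self.path = tuple(path)


class ClipSketch:
    """Hypothetical clip that keeps one scanner per clipping path."""

    def __init__(self):
        self.paths = []
        self.scanners = []

    def clip_to_path(self, path):
        self.paths.append(tuple(path))
        self.scanners.append(ScannerStub(path))

    def copy_rebuilding(self):
        """Behaviour described in the issue: every scanner is rebuilt."""
        clone = ClipSketch()
        for path in self.paths:
            clone.clip_to_path(path)
        return clone

    def copy_sharing(self):
        """Proposed behaviour: new lists, but the existing scanners are reused."""
        clone = ClipSketch()
        clone.paths = list(self.paths)
        clone.scanners = list(self.scanners)  # same objects, no rebuild
        return clone
```

Because the copy gets its own lists, clipping the copy further does not affect the original; only the immutable scanners are shared.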

Issue #1122: Poppler Glib: Problems related to poppler_page_get_text() and poppler_page_get_text_layout()
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1122 · wangz · 2021-11-29T05:23:50Z
From the description of the poppler_page_get_text_layout() method:
> The position in the array represents an offset in the text returned by poppler_page_get_text()

So it should be possible to map between the two arrays. I know this works with English text, but is there any way to get the corresponding mapping for other languages, including Chinese?
For example, the text
> 你好, World!

(`你好` means "Hello") is returned as the UTF-8 bytes
`\xE4 \xBD \xA0 \xE5 \xA5 \xBD \x2C \x20 \x57 \x6F \x72 \x6C \x64 \x21`
But then the position of a rectangle obviously cannot be used to find the corresponding character in these bytes. Is there any way to solve this problem?
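A minimal Python sketch of the byte/character distinction, assuming the layout array holds one rectangle per Unicode character rather than one per byte (the documentation says "offset" without naming the unit, so this is an assumption). Both helper names are hypothetical. In C, GLib's `g_utf8_offset_to_pointer()` and `g_utf8_pointer_to_offset()` perform the same conversions on the string returned by `poppler_page_get_text()`.

```python
def utf8_char_spans(text: str) -> list:
    """Return (character, byte_start, byte_end) for every character of `text`
    in its UTF-8 encoding; entry i is meant to line up with rectangle i."""
    spans = []
    offset = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 byte for ASCII, 3 for 你 and 好
        spans.append((ch, offset, offset + size))
        offset += size
    return spans


def byte_offset_to_char_index(text: str, byte_offset: int) -> int:
    """Convert a byte offset into the UTF-8 text to a character index."""
    prefix = text.encode("utf-8")[:byte_offset]
    # errors="ignore" drops a trailing multi-byte character that was cut in half
    return len(prefix.decode("utf-8", errors="ignore"))
```

With the example above, `utf8_char_spans("你好, World!")[:2]` gives `[('你', 0, 3), ('好', 3, 6)]`, and byte offset 6 (the comma) maps to character index 2, i.e. the third rectangle.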
Thanks.

Issue #1120: Random Image with black background is generated
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1120 · Nikhil Ranka · 2021-08-20T11:46:01Z
**Issue Description**
Using `pdftohtml`, I generated an XML file from the PDF. However, an image with a black background, PHT3-1_8.png, is also generated. When building HTML from this file, it causes other elements on the page to become invisible.
![image](/uploads/a47206a102367949dfc6521acb90d30e/image.png)
**Working Files**:
- PDF used for conversion: [PHT3.pdf](/uploads/332bc5e3463ecc21fd2811eb5f78d467/PHT3.pdf)
- Generated XML file: [PHT3.xml](/uploads/da2de7ec233186bae683acd44a199134/PHT3.xml)
- Extracted Images: [PHT3.zip](/uploads/ffce1db8ba86d9887c3aed3862cede52/PHT3.zip)
It would be great if you could advise on how to resolve this.
Attaching more information just in case it helps.
**System Information**:
```
Windows 10 Home
Version: 2004
```
**Poppler Details**:
```
pdftohtml version 0.68.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
```
Thanks!

Issue #1119: Images having a white/transparent background have a black background in the extracted images
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1119 · Nikhil Ranka · 2021-08-24T11:35:13Z
**Issue Description**
Using `pdftohtml`, I generated an XML file from the PDF. However, images that have a white/transparent background in the PDF have a black background in the extracted images. Affected images in the attached zip: PHT3-1_1.png, PHT3-1_11.png, PHT3-1_12.png, PHT3-1_13.png. Sample image:
![image](/uploads/e4e1b93e3b357b890f31c7b65f6440ca/image.png)
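One possible cause, not confirmed here: the transparency of these images lives in a separate soft mask (SMask) that is not applied on extraction, so fully transparent pixels keep their underlying colour, which is often black. If that is the case, a workaround is to extract the masks as well (`pdfimages -list` reports them with type `smask`) and composite each image over white. Below is a minimal, dependency-free sketch of only the compositing step; reading and writing PNG files, pairing each image with its mask, and resampling the mask to the image size are assumed to happen elsewhere.

```python
def composite_over_background(rgb_pixels, mask_values, background=(255, 255, 255)):
    """Blend RGB pixels over `background`, using soft-mask samples as alpha.

    rgb_pixels  -- sequence of (r, g, b) tuples, each channel 0-255
    mask_values -- one soft-mask sample per pixel: 0 transparent, 255 opaque
                   (assumed already resampled to the image's dimensions)
    """
    if len(rgb_pixels) != len(mask_values):
        raise ValueError("image and mask must have the same number of pixels")
    out = []
    for (r, g, b), alpha in zip(rgb_pixels, mask_values):
        out.append(tuple(
            round((c * alpha + bg * (255 - alpha)) / 255)
            for c, bg in zip((r, g, b), background)
        ))
    return out
```

A fully transparent black pixel becomes white, while opaque pixels are left unchanged.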
**Working Files**:
- PDF used for conversion: [PHT3.pdf](/uploads/332bc5e3463ecc21fd2811eb5f78d467/PHT3.pdf)
- Generated XML file: [PHT3.xml](/uploads/da2de7ec233186bae683acd44a199134/PHT3.xml)
- Extracted Images: [PHT3.zip](/uploads/ffce1db8ba86d9887c3aed3862cede52/PHT3.zip)
It would be great if you could advise on how this can be resolved.
Attaching more information just in case it helps.
**System Information**:
```
Windows 10 Home
Version: 2004
```
**Poppler Details**:
```
pdftohtml version 0.68.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
```
Thanks!

Issue #1118: Extracting all images present in the PDF
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1118 · Nikhil Ranka · 2021-09-01T21:27:47Z
**Issue Description**
Using `pdftohtml`, I generated an XML file from the PDF. However, the boxes containing the text 'Test' are not extracted as separate images.
![image](/uploads/9ce08a510d849e3e02d262971091de4a/image.png)
**Working Files**:
- PDF used for conversion: [PHT3.pdf](/uploads/332bc5e3463ecc21fd2811eb5f78d467/PHT3.pdf)
- Generated XML file: [PHT3.xml](/uploads/da2de7ec233186bae683acd44a199134/PHT3.xml)
- Extracted Images: [PHT3.zip](/uploads/ffce1db8ba86d9887c3aed3862cede52/PHT3.zip)
It would be great if you could advise on how to resolve this.
Attaching more information just in case it helps.
**System Information**:
```
Windows 10 Home
Version: 2004
```
**Poppler Details**:
```
pdftohtml version 0.68.0
Copyright 2005-2018 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
```
Thanks!

Issue #1117: pdftohtml: ll converted to l using some fonts
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1117 · Julian Altuna · 2021-11-01T10:17:54Z
Converting a PDF that uses the ArialMT font outputs a single 'l' in words containing a double 'l', such as 'will'.
pdftohtml version 0.62.0
Command used: `pdftohtml -stdout willtest.pdf`
[willtest.pdf](/uploads/7951cadabf189ab24415a0eba9590d6e/willtest.pdf)

Issue #1116: Fetching layout details of a PDF
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1116 · Nikhil Ranka · 2021-08-19T11:35:05Z
Hello,
I am currently using the library to parse a PDF and convert it into email-compatible HTML, but the generated HTML is not compatible with email. Is there a way to retrieve the x, y position, width and height of all the objects in the PDF?
Sharing a working code-snippet would be a great help.
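A minimal Python sketch using only the standard library, assuming the `pdftohtml -xml` output format: a `<pdf2xml>` root with `<page>` elements that contain `<text>` and `<image>` elements carrying `top`, `left`, `width` and `height` attributes. The XML can be produced with `pdftohtml -xml input.pdf output`, which writes `output.xml`. The coordinates may be scaled by pdftohtml's zoom factor rather than being PDF points, so compare them against the `<page>` element's `width` and `height`. For word-level boxes, `pdftotext -bbox` is another option.

```python
import xml.etree.ElementTree as ET


def layout_boxes(xml_data):
    """Yield (page, kind, left, top, width, height, content) for every <text>
    and <image> element in `pdftohtml -xml` output (str or bytes)."""
    root = ET.fromstring(xml_data)
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for el in page:
            if el.tag == "text":
                content = "".join(el.itertext())  # flattens <b>, <i>, <a> markup
            elif el.tag == "image":
                content = el.get("src")  # file name of the extracted image
            else:
                continue  # e.g. <fontspec>
            yield (page_number, el.tag,
                   float(el.get("left")), float(el.get("top")),
                   float(el.get("width")), float(el.get("height")),
                   content)
```

Usage: `for box in layout_boxes(open("output.xml", "rb").read()): print(box)`.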
Thanks!

Issue #1115: Poppler GLib: is it possible to have another `poppler_page_get_selected_region` function that does not contain a merge operation?
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1115 · wangz · 2021-08-19T09:58:21Z
I saw that `poppler_page_get_selection_region` is already deprecated, but sometimes it is useful to get a list of unmerged rectangles.
In my experience, `poppler_page_get_selection_region` returns a list of rectangles, each covering one line; this helps PDF viewers draw underline and strike-through markers.
But in the merged `cairo_region_t`, it is not possible to tell where the bottom of each line is.
So I really hope you could consider providing another method to do this.
Thanks in advance.

Issue #1113: Misrendering: images to which Microsoft Word gives 'soft edges'
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1113 · LinuxOnTheDesktop · 2023-03-01T21:25:44Z
Poppler, or at least [pdfpc](https://github.com/pdfpc/pdfpc) (a program that uses poppler), renders the edges in question not 'softly' but as hard lines. Please see the attached PDF, which should not have a visible square border and yet, in pdfpc, does.
Here is the relevant bit of Microsoft Word's UI:
![image](/uploads/6b78e555fb26114fef59de97f7063c0a/image.png)
I note that Sumatra PDF and Okular properly display PDFs that contain images of the type in question.
I note finally that I originally submitted [this bug report against pdfpc](https://github.com/pdfpc/pdfpc/issues/601).

Issue #1112: PDFtoCairo only converts first page on tiff conversion
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1112 · frederick0291 · 2021-08-16T21:52:20Z
https://github.com/Belval/pdf2image/issues/206
Converting a PDF to TIFF with:
`pdftocairo -tiff sample.pdf`
only converts the first page.

Issue #1111: pdftops: produces incorrect colors in output PostScript file
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1111 · Milos Wimmer · 2021-08-30T15:34:13Z
pdftops produces bad colors in the output PostScript file. I use this command:
`pdftops -q -level2sep -r 300 1.pdf`
Thanks for your work, Milos
[1.pdf](/uploads/5f613e091e8fab1ebc458a6623948d04/1.pdf)
[2.pdf](/uploads/dc1dc5eeb60edae226e2b911c60a85b3/2.pdf)

Issue #1110: pdftops: produces incorrect output PostScript file
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1110 · Milos Wimmer · 2022-12-04T00:29:19Z
pdftops doesn't convert the attached files correctly. I use this command:
`pdftops -q -level2 -r 300 1.pdf`
Thanks for your work, Milos
[1.pdf](/uploads/7e15a576bf942d5d0b4bfaaef9f8877b/1.pdf)
[2.pdf](/uploads/b3bb3a884c5cd69c44603781cb27d1f9/2.pdf)

Issue #1109: pdftops: incorrect output
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1109 · Milos Wimmer · 2021-08-10T20:40:54Z
pdftops is great, but the attached files with AcroForms don't convert correctly.
`pdftops -q -level2 -r 300 1.pdf`
=> creates a bad output PostScript file
`pdftops -q -level2sep -r 300 1.pdf`
=> creates a correct PostScript file, but the colors are wrong
`ghostscript -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=1.ps 1.pdf`
=> creates a correct PostScript file with correct colors
Thanks for your work, Milos
[1.pdf](/uploads/70505939cf0e82ced394b19f2abdd71a/1.pdf)
[2.pdf](/uploads/64131a58105f80d9c1caf5168b0dfccd/2.pdf)

Issue #1108: Error during CMakeFiles with pdf2htmlEX
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1108 · David Faizulaev · 2021-08-05T12:45:11Z
Hello,
I'm working on upgrading a Docker image that uses Cairo, poppler and pdf2htmlEX, and I am encountering Cairo-related compilation errors.
Here is my Dockerfile:
```dockerfile
RUN echo $ECR_IMAGE_TAG_NONPROD
RUN echo $ARTIFACTORY_USER_NAME
RUN echo $ARTIFACTORY_PASSWORD
RUN echo $BUILD_CHEGG_ENV
RUN echo $SVC_AWS_ACCESS_KEY_ID
RUN echo $SVC_AWS_SECRET_ACCESS_KEY
RUN echo $SVC_AWS_DEFAULT_REGION
RUN echo $CI_PROJECT_DIR
RUN dpkg --configure -a
RUN apt-get clean
RUN apt-get update
RUN apt-get install -f -y python3
RUN apt-get install dialog apt-utils -y
RUN apt-get install -f -y python3-pip
RUN apt-get install -f -y python3-setuptools
RUN apt-get install -f -y wget
RUN apt-get install -f -y poppler-utils
RUN apt-get install -f -y jq
RUN apt-get install -f -y pdftk
RUN apt-get install -f -y ffmpeg
RUN apt-get install -f -y build-essential
RUN apt-get install -f -y cmake
RUN apt-get install -f -y libfontforge-dev
RUN pip3 install --upgrade pip \
&& apt-get clean
RUN pip3 --no-cache-dir install --upgrade awscli
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
#RUN apt-get install -f -y libpaper1
#RUN apt-get --purge remove -f -y libpaper1:amd64 libpaper-utils unattended-upgrades libgs9:amd64 ghostscript
#RUN apt-get clean -y
#RUN apt-get update -y && apt-get upgrade -y
#RUN apt autoremove -y
#RUN apt-get install -f -y ghostscript --option=Dpkg::Options::=--force-all
WORKDIR /tmp
COPY lib/cairo-1.17.4.tar.xz /tmp
RUN wget https://cairographics.org/releases/pixman-0.36.0.tar.gz
RUN tar xvfz pixman-0.36.0.tar.gz
RUN ./pixman-0.36.0/configure && make && make install
RUN tar -xf cairo-1.17.4.tar.xz
RUN ./cairo-1.17.4/configure --prefix=/tmp/cairob && make && make install
RUN cp -r /tmp/cairob/lib/* /usr/lib/x86_64-linux-gnu/
```
**and here is the error I get during docker build:**
```
#51 2.179 [ 2%] Building CXX object CMakeFiles/pdf2htmlEX.dir/3rdparty/poppler/git/CairoFontEngine.cc.o
#51 2.900 In file included from /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:41:0:
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:148:8: error: 'void CairoOutputDev::setDefaultCTM(double*)' marked 'override', but does not override
#51 2.900 void setDefaultCTM(double *ctm) override;
#51 2.900 ^~~~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:172:9: error: 'GBool CairoOutputDev::tilingPatternFill(GfxState*, Gfx*, Catalog*, Object*, double*, int, int, Dict*, double*, double*, int, int, int, int, double, double)' marked 'override', but does not override
#51 2.900 GBool tilingPatternFill(GfxState *state, Gfx *gfx, Catalog *cat, Object *str,
#51 2.900 ^~~~~~~~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:194:8: error: 'void CairoOutputDev::beginString(GfxState*, GooString*)' marked 'override', but does not override
#51 2.900 void beginString(GfxState *state, GooString *s) override;
#51 2.900 ^~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:200:8: error: 'void CairoOutputDev::beginActualText(GfxState*, GooString*)' marked 'override', but does not override
#51 2.900 void beginActualText(GfxState *state, GooString *text) override;
#51 2.900 ^~~~~~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:247:8: error: 'void CairoOutputDev::beginTransparencyGroup(GfxState*, double*, GfxColorSpace*, GBool, GBool, GBool)' marked 'override', but does not override
#51 2.900 void beginTransparencyGroup(GfxState * /*state*/, double * /*bbox*/,
#51 2.900 ^~~~~~~~~~~~~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:253:8: error: 'void CairoOutputDev::paintTransparencyGroup(GfxState*, double*)' marked 'override', but does not override
#51 2.900 void paintTransparencyGroup(GfxState * /*state*/, double * /*bbox*/) override;
#51 2.900 ^~~~~~~~~~~~~~~~~~~~~~
#51 2.900 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:254:8: error: 'void CairoOutputDev::setSoftMask(GfxState*, double*, GBool, Function*, GfxColor*)' marked 'override', but does not override
#51 2.900 void setSoftMask(GfxState * /*state*/, double * /*bbox*/, GBool /*alpha*/,
#51 2.900 ^~~~~~~~~~~
#51 2.902 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:433:8: error: 'void CairoImageOutputDev::setDefaultCTM(double*)' marked 'override', but does not override
#51 2.902 void setDefaultCTM(double *ctm) override { }
#51 2.902 ^~~~~~~~~~~~~
#51 2.902 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:456:9: error: 'GBool CairoImageOutputDev::tilingPatternFill(GfxState*, Gfx*, Catalog*, Object*, double*, int, int, Dict*, double*, double*, int, int, int, int, double, double)' marked 'override', but does not override
#51 2.902 GBool tilingPatternFill(GfxState *state, Gfx *gfx, Catalog *cat, Object *str,
#51 2.902 ^~~~~~~~~~~~~~~~~
#51 2.902 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:501:8: error: 'void CairoImageOutputDev::beginTransparencyGroup(GfxState*, double*, GfxColorSpace*, GBool, GBool, GBool)' marked 'override', but does not override
#51 2.902 void beginTransparencyGroup(GfxState * /*state*/, double * /*bbox*/,
#51 2.902 ^~~~~~~~~~~~~~~~~~~~~~
#51 2.902 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:506:8: error: 'void CairoImageOutputDev::paintTransparencyGroup(GfxState*, double*)' marked 'override', but does not override
#51 2.902 void paintTransparencyGroup(GfxState * /*state*/, double * /*bbox*/) override {}
#51 2.902 ^~~~~~~~~~~~~~~~~~~~~~
#51 2.902 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoOutputDev.h:507:8: error: 'void CairoImageOutputDev::setSoftMask(GfxState*, double*, GBool, Function*, GfxColor*)' marked 'override', but does not override
#51 2.902 void setSoftMask(GfxState * /*state*/, double * /*bbox*/, GBool /*alpha*/,
#51 2.902 ^~~~~~~~~~~
#51 2.961 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc: In member function 'double CairoFont::getSubstitutionCorrection(GfxFont*)':
#51 2.961 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:123:56: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
#51 2.961 if ((name = ((Gfx8BitFont *)gfxFont)->getCharName(code)) &&
#51 2.961 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
#51 2.962 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc: In static member function 'static CairoFreeTypeFont* CairoFreeTypeFont::create(GfxFont*, XRef*, FT_Library, GBool)':
#51 2.962 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:448:37: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
#51 2.962 fileNameC = fileName->getCString();
#51 2.962 ~~~~~~~~~~~~~~~~~~~~^~
#51 2.964 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc: In function 'cairo_status_t _init_type3_glyph(cairo_scaled_font_t*, cairo_t*, cairo_font_extents_t*)':
#51 2.964 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:645:26: error: invalid conversion from 'const double*' to 'double*' [-fpermissive]
#51 2.964 mat = font->getFontBBox();
#51 2.964 ~~~~~~~~~~~~~~~~~^~
#51 2.964 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc: In function 'cairo_status_t _render_type3_glyph(cairo_scaled_font_t*, long unsigned int, cairo_t*, cairo_text_extents_t*)':
#51 2.964 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:686:28: error: invalid conversion from 'const double*' to 'double*' [-fpermissive]
#51 2.964 mat = font->getFontMatrix();
#51 2.964 ~~~~~~~~~~~~~~~~~~~^~
#51 2.965 /tmp/pdf2htmlEX-0.15.0/3rdparty/poppler/git/CairoFontEngine.cc:701:26: error: invalid conversion from 'const double*' to 'double*' [-fpermissive]
#51 2.965 mat = font->getFontBBox();
#51 2.965 ~~~~~~~~~~~~~~~~~^~
#51 3.045 make[2]: *** [CMakeFiles/pdf2htmlEX.dir/3rdparty/poppler/git/CairoFontEngine.cc.o] Error 1
#51 3.045 CMakeFiles/pdf2htmlEX.dir/build.make:62: recipe for target 'CMakeFiles/pdf2htmlEX.dir/3rdparty/poppler/git/CairoFontEngine.cc.o' failed
#51 3.045 CMakeFiles/Makefile2:355: recipe for target 'CMakeFiles/pdf2htmlEX.dir/all' failed
#51 3.045 make[1]: *** [CMakeFiles/pdf2htmlEX.dir/all] Error 2
#51 3.046 make: *** [all] Error 2
#51 3.046 Makefile:138: recipe for target 'all' failed
```
Please advise how I can resolve this.

Issue #1106: pdfimages: options to add a container to JBIG2 and CCITT data
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1106 · Shai4she · 2023-11-24T13:42:01Z
The current extraction of embedded JBIG2 streams does not include any file header (like `0xFF 0xD8` for JPEG), at least when there is no global data. The JBIG2 output produced by [jbig2enc](https://github.com/agl/jbig2enc) adds a header indicating that this is a JBIG2 file. Although this might not be standardized, it would be helpful to add such a header so that the output can be passed to subsequent applications like `img2pdf`.
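For the JBIG2 case, a minimal Python sketch of adding such a header, assuming the standalone file format of ITU-T T.88 (Annex D): an 8-byte ID string, a flags byte, and a 4-byte page count, followed by the segments. It assumes the page stream and the optional JBIG2Globals data are available as separate byte strings (to my understanding `pdfimages -jbig2` writes them as `.jb2e` and `.jb2g` files). Some decoders may also expect end-of-page and end-of-file segments, which this sketch does not add; the function name is hypothetical.

```python
import struct

# ID string that starts a standalone JBIG2 file (ITU-T T.88, Annex D)
JBIG2_FILE_ID = b"\x97JB2\r\n\x1a\n"


def wrap_embedded_jbig2(page_data: bytes, global_data: bytes = b"", num_pages: int = 1) -> bytes:
    """Prepend a sequential-organization file header to PDF-embedded JBIG2 segments.

    page_data   -- the image stream's own segments
    global_data -- the JBIG2Globals segments, if the image has any
    """
    flags = 0x01  # bit 0 = 1: sequential organization; bit 1 = 0: page count known
    header = JBIG2_FILE_ID + bytes([flags]) + struct.pack(">I", num_pages)
    # Global segments must come before the page's own segments.
    return header + global_data + page_data
```

Usage, with hypothetical file names: `Path("img-000.jb2").write_bytes(wrap_embedded_jbig2(Path("img-000.jb2e").read_bytes(), Path("img-000.jb2g").read_bytes()))` (with `from pathlib import Path`).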
Similarly for CCITT: it would be better to have an option that wraps the CCITT data in a TIFF container without any conversion, just to facilitate subsequent processing. Unlike the `-tiff` option, it would not convert everything else to TIFF, nor convert between different types of TIFF.

Issue #1105: Super/subscripts break correspondence between poppler_page_get_text and poppler_page_get_text_layout (glib)
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1105 · G Turret · 2021-08-22T19:35:23Z
According to the glib [documentation](https://poppler.freedesktop.org/api/glib/poppler-Poppler-Page.html#poppler-page-get-text-layout), the position in the array returned by _poppler_page_get_text_layout()_ represents an offset in the text returned by _poppler_page_get_text()_. However, this isn't the case if the PDF contains super- or subscripts.
I can't program in C, so the Python code below is the closest example I can give (I've tried another glib wrapper with the same results). The code is run on this example PDF: [popplertest.pdf](/uploads/6e96e7657c84134d97e468e41edb30ac/popplertest.pdf).
As can be seen, _poppler_page_get_text_layout()_ returns very small regions for the super- and subscripts, which have no counterpart in the text returned by _poppler_page_get_text()_, so the subsequent text is offset relative to _poppler_page_get_text_layout()_. In general, simply removing all such small regions is not a solution, since in some PDFs they might, for example, be larger than punctuation in a footnote.
```python
# PyGObject
## pip install PyGObject  # version 3.40.1
## poppler 21.07.0_1 / gir-1.0 / Poppler-0.18.gir
import gi
gi.require_version("Poppler", "0.18")
from gi.repository import Poppler

fileuri = "file:///Users/x/Downloads/popplertest.pdf"
document = Poppler.Document.new_from_file(fileuri, None)  # None: no password
page = document.get_page(0)

ptext = page.get_text()                # page text as a Python str
ok, ptlayout = page.get_text_layout()  # (success flag, list of Poppler.Rectangle)
# For popplertest.pdf: len(ptext) == 26, len(ptlayout) == 28

header = f"{'':4s} {'x1':>8s} {'x2':>8s} {'y1':>8s} {'y2':>8s} {'x2-x1':>8s} {'y2-y1':>8s}"

def fmt(r):
    """Coordinates, width and height of one layout rectangle."""
    return (f"{r.x1:8.1f} {r.x2:8.1f} {r.y1:8.1f} {r.y2:8.1f} "
            f"{r.x2 - r.x1:8.4f} {r.y2 - r.y1:8.2f}")

# Rectangles paired with the character at the same offset in ptext
print(header)
for i in range(len(ptext)):
    print(f"{repr(ptext[i]):4s} {fmt(ptlayout[i])}")

# Surplus rectangles that have no character in ptext
print(header)
for i in range(len(ptext), len(ptlayout)):
    print(f"{'':4s} {fmt(ptlayout[i])}")
```
```text
>>>
           x1       x2       y1       y2    x2-x1    y2-y1
'T'     290.0    300.4    172.5    187.7  10.3413    15.20
'e'     300.4    307.6    172.5    187.7   7.1874    15.20
's'     307.6    313.9    172.5    187.7   6.3783    15.20
't'     313.9    320.2    172.5    187.7   6.2888    15.20
'\n'    320.2    320.2    187.7    187.7   0.0000     0.00
'A'     142.7    150.9    248.8    258.5   8.1306     9.63
'B'     150.9    158.5    248.8    258.5   7.6800     9.63
'C'     158.5    166.4    248.8    258.5   7.8327     9.63
'0'     166.4    166.4    248.8    258.5  -0.0003     9.63
'1'     166.4    170.6    252.5    259.5   4.2345     7.08
' '     170.6    174.8    252.5    259.5   4.2345     7.08
'D'     174.8    179.0    252.5    259.5   4.1150     7.08
'E'     179.0    187.2    248.8    258.5   8.2822     9.63
'F'     187.2    194.6    248.8    258.5   7.3789     9.63
'\n'    194.6    201.7    248.8    258.5   7.0778     9.63
'G'     201.7    201.7    258.5    258.5   0.0000     0.00
'H'     125.8    134.3    262.4    272.0   8.5091     9.63
'I'     134.3    142.4    262.4    272.0   8.1306     9.63
'2'     142.4    146.4    262.4    272.0   3.9153     9.63
'3'     146.4    146.4    262.4    272.0   0.0001     9.63
' '     146.4    150.6    260.4    267.5   4.2345     7.08
'J'     150.6    154.8    260.4    267.5   4.2345     7.08
'K'     154.8    158.9    260.4    267.5   4.1150     7.08
'L'     158.9    164.5    262.4    272.0   5.5724     9.63
'\n'    164.5    172.9    262.4    272.0   8.4316     9.63
'1'     172.9    179.7    262.4    272.0   6.7767     9.63
           x1       x2       y1       y2    x2-x1    y2-y1
        179.7    179.7    272.0    272.0   0.0000     0.00
        302.4    307.8    691.5    701.2   5.4240     9.63
>>>
```

Issue #1104: pdftohtml ignores cropping in pdf during conversion
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1104 · Emanuele Guzzetti · 2021-07-28T18:22:24Z
Hi. I'm here because calibre uses your pdftohtml, and it doesn't honor cropping in PDF files.
For example, I want to use k2pdfopt (https://www.willus.com/k2pdfopt/) to split a two-column PDF so that each column ends up on its own page.
I'll attach an example. [orig.pdf](/uploads/e235392ab3f3d7caaa4691c4b70cfb0e/orig.pdf) is the original file.
I used this command to split the page in two:
`k2pdfopt -grid 2x1x0 -mode crop -p -n orig.pdf -o out.pdf`
Here's the result:
[out.pdf](/uploads/fd99377f083a183d731900502088dd9f/out.pdf)
Adobe Reader displays it as two pages; calibre instead shows it as the original file (one page).
Thanks in advance

Issue #1103: Homebrew Installation -- SHA Mismatch
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1103 · Keith Helsabeck · 2021-07-22T15:31:27Z
When doing a `brew install` on Mac, there is a SHA-256 mismatch with a file from one of the default sources (https://www.ijg.org/files/jpegsrc.v9d.tar.gz). I assume it is likely something that was not updated and re-hashed, but I am not sure.

Issue #1101: Allow toggling off all radio buttons
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1101 · Luca Weiss · 2021-07-19T22:21:08Z
Hi, in this block of code https://gitlab.freedesktop.org/poppler/poppler/-/blob/3d49757055dbcd2876c0b26ee00a7bd780541938/poppler/Form.cc#L1353-1354 you disallow turning off all radio buttons, which in general might make sense, but I have at least two cases in which this check would need to be skipped:
* A user is filling out a form with two radio buttons, selects one of them, and presses undo in the PDF reader (e.g. Okular), but poppler doesn't accept the undo, leaving one of the radio buttons checked.
* A similar scenario, but the user consciously wants no radio button checked after one of them has already been checked in a form. This is not possible due to the poppler check.
I do have a WIP patch for Okular that allows deselecting a radio button group, but it currently fails to do so. Removing the `return` in the block linked above makes it work.
Can we add a new public parameter to setState (`allToggleOff`?) that allows applications to achieve this?
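The semantics of the proposed opt-in flag can be modelled in Python as below. This only illustrates the behaviour discussed here: `RadioGroupSketch`, `set_state` and `all_toggle_off` are hypothetical names, not poppler's API (poppler's real code is C++ in Form.cc), and the default branch mirrors the restriction as described in this issue rather than the exact poppler implementation.

```python
class RadioGroupSketch:
    """Hypothetical model of one radio-button group (not poppler's API)."""

    def __init__(self, options):
        self.options = list(options)
        self.selected = None  # None: every button in the group is off

    def set_state(self, option, on, all_toggle_off=False):
        """Switch `option` on or off and return True if the group changed."""
        if option not in self.options:
            raise ValueError(f"unknown option: {option!r}")
        if on:
            changed = self.selected != option
            self.selected = option  # turning one on turns the others off
            return changed
        if self.selected != option:
            return False  # the button is already off
        if not all_toggle_off:
            return False  # restriction described above: group may not become empty
        self.selected = None  # opt-in: allow an empty group, e.g. to undo a selection
        return True
```

With the flag left at its default, existing callers keep the current behaviour; a viewer implementing undo would pass `all_toggle_off=True`.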
I'm testing with the following PDF: https://www.help.gv.at/Portal.Node/hlpd/public/resources/documents/meldez.pdf , which has many radio buttons to try.
Kind regards