poppler issues
https://gitlab.freedesktop.org/poppler/poppler/-/issues
2023-03-10T22:04:26Z

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1373
Add a function to get form type from cpp document class
2023-03-10T22:04:26Z
Asger Hautop Drewsen

We have a use case where we need to figure out if a document contains an XFA form.
Currently we are running `pdfinfo` and parsing the output.
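
For illustration, the parsing half of that workaround might look roughly like the sketch below. It assumes the `pdfinfo` output has already been captured into a string and that it contains a line such as `Form:           XFA`; the value labels (`none`, `AcroForm`, `XFA`) are an assumption and may differ between versions. `parse_form_type` and the extra `Unknown` enumerator are hypothetical names, not part of poppler.

```cpp
#include <sstream>
#include <string>

// Mirrors the enum requested in this issue, plus a hypothetical Unknown value
// for output this sketch cannot interpret.
enum class FormType { NoForm, AcroForm, XfaForm, Unknown };

// Extract the form type from text captured from `pdfinfo`.
// Assumes a line shaped like "Form:           XFA" (labels are an assumption).
FormType parse_form_type(const std::string &pdfinfo_output)
{
    std::istringstream lines(pdfinfo_output);
    std::string line;
    const std::string key = "Form:";
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) != 0) {
            continue; // not the "Form:" line
        }
        // Trim whitespace (and a possible '\r') around the value.
        const auto first = line.find_first_not_of(" \t\r", key.size());
        if (first == std::string::npos) {
            return FormType::Unknown; // "Form:" with no value
        }
        const auto last = line.find_last_not_of(" \t\r");
        const std::string value = line.substr(first, last - first + 1);
        if (value == "none") return FormType::NoForm;
        if (value == "AcroForm") return FormType::AcroForm;
        if (value == "XFA") return FormType::XfaForm;
        return FormType::Unknown; // unrecognised label
    }
    return FormType::Unknown; // no "Form:" line present
}
```

Parsing text like this is fragile, which is the motivation for the request.
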
Instead it would be nice if the cpp `document` class had the following function:
```c++
FormType form_type() const;
```
where `FormType` is the same as `Catalog::FormType`:
```c++
enum FormType
{
NoForm,
AcroForm,
XfaForm
};
```

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1076
Whitespace changes
2021-05-11T22:00:59Z
Jeroen Ooms

I posted about this on the mailing list [earlier this week](https://lists.freedesktop.org/archives/poppler/2021-May/014745.html), and somebody replied that I had better open an issue here.

I maintain R bindings called pdftools, mostly used for extracting text from scientific documents. The bindings wrap the C++ API; in particular, we convert PDF to text using `poppler::page::text()` with physical_layout.

Recently users have started to report changes in behaviour with newer versions of poppler, in particular with respect to whitespace. For example, all pages are now terminated with an `\f` symbol, which was not the case before. On Windows, linebreaks are now converted to `\r\n` instead of just `\n` as before (we use mingw-w64 compilers). Also, some documents that would contain a single linebreak in e.g. poppler 0.73 now have 4 or 5 linebreaks in the same place with the latest poppler.
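
As a stopgap on the consumer side, the first two differences could be undone after the text has been converted to a `std::string`. This is only a minimal sketch: `normalize_page_text` is a hypothetical helper, not part of poppler, and it deliberately does not try to collapse the extra linebreaks, since that cannot be done safely without knowing the intended layout.

```cpp
#include <string>

// Hypothetical downstream workaround: convert "\r\n" to "\n" and drop one
// trailing form feed. Lone '\r' characters are left untouched.
std::string normalize_page_text(std::string text)
{
    // Compact the string in place, skipping every '\r' that precedes a '\n'.
    std::string::size_type out = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue; // the '\n' itself is copied on the next iteration
        }
        text[out++] = text[i];
    }
    text.resize(out);

    // Remove the page-terminating form feed, if there is one.
    if (!text.empty() && text.back() == '\f') {
        text.pop_back();
    }
    return text;
}
```

It would be applied to the string built from `p->text().to_utf8()`, so it works the same way regardless of which poppler version produced the text.
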
## The \f form-feed character
As of poppler 0.88 all pages end with an `\f` character. This messes up some data pipelines. Below is an example program that I used to bisect (I use [NEWS.pdf](https://cran.r-project.org/doc/manuals/r-release/NEWS.pdf), but it can be any PDF really):
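```cpp
#include <poppler-document.h>
#include <poppler-page.h>
#include <iostream>
#include <memory>
#include <string>
using poppler::byte_array;
using poppler::document;
int main(){
  // load_from_file() and create_page() return raw pointers that we own (null on failure)
  std::unique_ptr<document> doc(document::load_from_file(std::string("NEWS.pdf")));
  if(!doc){
    std::cerr << "Could not open NEWS.pdf\n";
    return 125; // 125 is the exit code that `git bisect run` treats as "skip"
  }
  std::unique_ptr<poppler::page> p(doc->create_page(0));
  if(!p){
    std::cerr << "Could not create page 0\n";
    return 125;
  }
  byte_array str = p->text().to_utf8();
  // back() on an empty array is undefined, so check for content first
  if(!str.empty() && str.back() == '\f'){
    std::cout << "FOUND EVIL F CHARACTER!\n";
    return 1;
  }
  std::cout << "DID NOT FIND EVIL F CHARACTER!\n";
  return 0;
}
```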
The result of the bisect is this commit by @jiri in https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/526
```
1e098e9b272d57478a3f23a9a6b6bb1542740aaf is the first new commit
commit 1e098e9b272d57478a3f23a9a6b6bb1542740aaf
Author: Jiri Jakes <freedesktop@jirijakes.eu>
Date: Tue Mar 31 21:38:23 2020 +0000
cpp: Add non_raw_non_physical layout for page::text()
```
## Extra linebreaks
The same commit also seems to affect linebreaks. For example, if we adapt the program above to print the first page of [NEWS.pdf](https://cran.r-project.org/doc/manuals/r-release/NEWS.pdf):
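```cpp
#include <poppler-document.h>
#include <poppler-page.h>
#include <iostream>
#include <memory>
#include <string>
using poppler::byte_array;
using poppler::document;
int main(){
  // Both calls return raw pointers that we own (null on failure)
  std::unique_ptr<document> doc(document::load_from_file(std::string("NEWS.pdf")));
  if(!doc) return 1; // could not open the file
  std::unique_ptr<poppler::page> p(doc->create_page(0));
  if(!p) return 1; // could not create the first page
  // Convert the page text to UTF-8 and print it unchanged
  byte_array buf = p->text().to_utf8();
  std::string str(buf.begin(), buf.end());
  std::cout << str;
  return 0;
}
```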
In poppler 0.87 we see the first lines:
```
NEWS for R version 4.0.5 (2021-03-31)
NEWS R News
CHANGES IN R 4.0.5
BUG FIXES:
• The change to the internal table in R 4.0.4 for iswprint has been reverted: it con-
tained some errors in printability of ‘East Asian’ characters.
• For packages using ‘LazyData’, R CMD build ignored the ‘--resave-data’ option and
the ‘BuildResaveData’ field of the ‘DESCRIPTION’ file (in R versions 4.0.0 to 4.0.4).
CHANGES IN R 4.0.4
```
However in poppler 0.88 we see:
```
NEWS for R version 4.0.5 (2021-03-31)
NEWS R News
CHANGES IN R 4.0.5
BUG FIXES:
• The change to the internal table in R 4.0.4 for iswprint has been reverted: it con-
tained some errors in printability of ‘East Asian’ characters.
• For packages using ‘LazyData’, R CMD build ignored the ‘--resave-data’ option and
the ‘BuildResaveData’ field of the ‘DESCRIPTION’ file (in R versions 4.0.0 to 4.0.4).
CHANGES IN R 4.0.4
```
On Windows the linebreaks change from `\n` to `\r\n`. I assume this is not intentional?

https://gitlab.freedesktop.org/poppler/poppler/-/issues/711
C++ frontend - how to save a page
2019-01-07T18:46:20Z
lucidprogrammer

I am trying to use the C++ front end without directly using the C headers. I can't find a way to save a page. It would be great if you could expose some sort of `savePageAs` in the C++ front end.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/553
"UTF-16" not native byte order on OS X iconv (re ustrings to_utf8)
2018-10-26T11:11:40Z
Bugzilla Migration User

## Submitted by Franz Brauße
Assigned to **poppler-bugs**
**[Link to original bug (#96313)](https://bugs.freedesktop.org/show_bug.cgi?id=96313)**
## Description
Hi.
ustring::to_utf8() creates a

    MiniIconv ic("UTF-8", "UTF-16");

assuming that iconv(3) uses the native byte order for "UTF-16". On OS X w/ Intel CPUs (I installed poppler through MacPorts, but this issue is unrelated, see below) this fails, as a quick

    $ echo -n 7 | iconv -t utf-16 | hexdump -C
    00000000 fe ff 00 37 |...7|

reveals: it's UTF-16BE.

This breaks page-labels for me, which instead of "78" (UTF-8) return the (hex) values

    e3 9c 80 e3 a0 80

which is 0x3700 0x3800.
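
The byte values above can be checked by hand. The following is a small self-contained sketch, not poppler code: `swap_bytes`, `swap_all` and `bmp_to_utf8` are hypothetical helpers that model what happens when the UTF-16 code units for "78" are read with the wrong byte order and then encoded as UTF-8. The encoder only handles BMP code units without surrogate pairs, which is enough for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Swap the two bytes of one UTF-16 code unit (0x0037 becomes 0x3700).
std::uint16_t swap_bytes(std::uint16_t unit)
{
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// Apply the byte swap to every code unit, modelling a wrong-endian read.
std::vector<std::uint16_t> swap_all(std::vector<std::uint16_t> units)
{
    for (std::uint16_t &u : units) {
        u = swap_bytes(u);
    }
    return units;
}

// Encode BMP code units as UTF-8 (surrogate pairs are not handled here).
std::string bmp_to_utf8(const std::vector<std::uint16_t> &units)
{
    std::string out;
    for (std::uint16_t u : units) {
        if (u < 0x80) {
            out += static_cast<char>(u);
        } else if (u < 0x800) {
            out += static_cast<char>(0xC0 | (u >> 6));
            out += static_cast<char>(0x80 | (u & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (u >> 12));
            out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (u & 0x3F));
        }
    }
    return out;
}
```

With the code units `{0x0037, 0x0038}`, `bmp_to_utf8` yields "78", while `bmp_to_utf8(swap_all(...))` yields the six bytes `e3 9c 80 e3 a0 80` reported above.
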
A fix might be to not "decode" GooString's UTF-16BE to native byte order in

    detail::unicode_GooString_to_ustring(GooString *str)

or use a source encoding based on the BYTE_ORDER macro instead of just "UTF-16BE" or to check the BOM-character output by iconv(3) (which e.g.

    ustring::from_utf8(const char *str, int len)

currently skips).

https://gitlab.freedesktop.org/poppler/poppler/-/issues/486
Expose GlobalParams in CPP api
2018-10-26T11:08:36Z
Bugzilla Migration User

## Submitted by Jeroen Ooms
Assigned to **poppler-bugs**
**[Link to original bug (#103570)](https://bugs.freedesktop.org/show_bug.cgi?id=103570)**
## Description
Currently it is not possible in the cpp api to set the poppler-data path at runtime. This would be very useful.

https://gitlab.freedesktop.org/poppler/poppler/-/issues/64
Implement digital signature support (cpp frontend)
2018-10-26T11:10:55Z
Bugzilla Migration User

## Submitted by Albert Astals Cid
Assigned to **poppler-bugs**
**[Link to original bug (#94377)](https://bugs.freedesktop.org/show_bug.cgi?id=94377)**
## Description
Expose the core code from https://bugs.freedesktop.org/show_bug.cgi?id=16770 for the cpp frontend