poppler issues
https://gitlab.freedesktop.org/poppler/poppler/-/issues
2023-02-19T00:06:01Z

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1341
Feature Request: JSON output option for pdfinfo
2023-02-19T00:06:01Z
Eric Riese

I just discovered `pdfinfo`. I wanted to rename a pdf to its title. `pdfinfo` extracted it beautifully. But parsing the output, while relatively easy, could have been far easier. I piped the output into `grep -Po 'Title:\s+\K.*'`. I didn't know about `\K`, I found that through a stackoverflow answer.
Anyway, this would have been easier if `pdfinfo` had an option to output in JSON format. Then I could have just piped the output into `jq .Title`. I know next to no `jq`, but I came up with this off the top of my head:
```bash
echo '{"Title":"Foo"}' | jq .Title
"Foo"
```

https://gitlab.freedesktop.org/poppler/poppler/-/issues/1025
pdfinfo -struct-text and nested nonstructural marked content
2021-01-06T00:09:43Z
Renkema

I ran into a problem with `pdfinfo -struct-text`, which I think is caused by
a bug.
The [attached file](/uploads/6121203dd6488d57b26b30c22e7a2f11/poppler-bug.pdf) demonstrates the problem. Extracting the textual content gives this result:
```
$ pdfinfo -struct-text poppler-bug.pdf
Document
P (block)
"xxxyyy"
```
where the expected output would be:
```
Document
P (block)
"xxxyyyzzz"
```
The relevant part of the attached pdf is the page content stream:
```
3 0 obj
<< /Length 215 >>
stream
1 0 0 1 48.272 46.73 cm
/P <</MCID 0>> BDC 1 0 0 1 -48.272 -46.73 cm
BT
/F1 9.96264 Tf
1 0 0 1 48.272 46.73 Tm [(xxx)]TJ
/Span << >> BDC
1 0 0 1 64.046 46.73 Tm [(yyy)]TJ
EMC
1 0 0 1 79.82 46.73 Tm [(zzz)]TJ
ET
EMC
endstream
```
As you see, the problem is caused by a nonstructural marked content sequence
(the `/Span`) inside a marked content item (marked with `/P`). This is
explicitly allowed by the specification (see §14.7.4.1, p.560), yet somehow
cuts short pdfinfo’s text extraction function.
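
The expected behaviour can be illustrated with a small sketch. This is not poppler's implementation and makes no claim about where the actual bug lies; it only shows one way to attribute text to the innermost enclosing sequence that carries an MCID, so that a nested `/Span` without an MCID does not end the `/P` item. The function name `text_by_mcid` and the simplified regex tokenizer are assumptions made for illustration (no nested dictionaries, no `]` or unescaped `)` inside strings).

```python
import re

# Simplified tokenizer (assumption): dictionaries, arrays, strings, names,
# then numbers and operators. Real content streams need a proper lexer.
TOKEN = re.compile(
    r"<<.*?>>|\[.*?\]|\((?:\\.|[^\\)])*\)|/[^\s/<>\[\]()]+|[^\s/<>\[\]()]+"
)

def text_by_mcid(stream):
    """Collect shown text per MCID, looking through nested sequences."""
    stack = []      # one entry per open BDC/BMC: its MCID, or None
    out = {}
    operands = []
    for tok in TOKEN.findall(stream):
        if tok in ("BDC", "BMC"):
            props = operands[-1] if tok == "BDC" and operands else ""
            m = re.search(r"/MCID\s+(\d+)", props)
            stack.append(int(m.group(1)) if m else None)
            operands.clear()
        elif tok == "EMC":
            if stack:
                stack.pop()  # closes only the innermost sequence
            operands.clear()
        elif tok in ("TJ", "Tj"):
            # Innermost enclosing sequence that actually has an MCID.
            mcid = next((m for m in reversed(stack) if m is not None), None)
            if mcid is not None and operands:
                parts = re.findall(r"\(((?:\\.|[^\\)])*)\)", operands[-1])
                out[mcid] = out.get(mcid, "") + "".join(parts)
            operands.clear()
        elif re.fullmatch(r"[A-Za-z'\"*]+", tok):
            operands.clear()  # any other operator consumes its operands
        else:
            operands.append(tok)
    return out

# The page content stream from the report above.
CONTENT = """
1 0 0 1 48.272 46.73 cm
/P <</MCID 0>> BDC 1 0 0 1 -48.272 -46.73 cm
BT
/F1 9.96264 Tf
1 0 0 1 48.272 46.73 Tm [(xxx)]TJ
/Span << >> BDC
1 0 0 1 64.046 46.73 Tm [(yyy)]TJ
EMC
1 0 0 1 79.82 46.73 Tm [(zzz)]TJ
ET
EMC
"""

print(text_by_mcid(CONTENT))
```

The key design choice is the stack: the first `EMC` pops only the `/Span` entry, so `zzz` is still attributed to MCID 0.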
This is on debian unstable, with the following version:
```
$ pdfinfo -v
pdfinfo version 20.09.0
Copyright 2005-2020 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
```

https://gitlab.freedesktop.org/poppler/poppler/-/issues/266
pdfinfo prints unrotated dimensions for landscape pages
2018-10-07T00:35:43Z
Bugzilla Migration User

## Submitted by Ilmari Heikkinen
Assigned to **poppler-bugs**
**[Link to original bug (#17195)](https://bugs.freedesktop.org/show_bug.cgi?id=17195)**
## Description
Calling pdfinfo on a PDF with landscape orientation prints the unrotated page dimensions and does not indicate that the page is rotated.
Proposed fix:
poppler/utils/pdfinfo.cc should use doc->getPageRotate(pg) to either print whether the page is rotated or to swap w and h.
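
The swap can be sketched in a few lines. This is an illustrative Python sketch, not poppler's code and not the patch referenced in this report; `displayed_size` is a hypothetical name, and its `rotate` argument stands in for the value that `doc->getPageRotate(pg)` would return (assumed to be a multiple of 90 degrees).

```python
def displayed_size(width, height, rotate):
    """Return (w, h) as a viewer would show the page for a given rotation."""
    rotate %= 360  # normalise, so -90 is treated like 270
    if rotate in (90, 270):
        # Quarter turns exchange the horizontal and vertical extents.
        return height, width
    return width, height

# A portrait A4 media box rotated by 90 degrees is displayed as landscape.
print(displayed_size(595.0, 842.0, 90))
```

Printing the rotation alongside the unrotated size would be the other option mentioned above; the swap has the advantage that the reported size matches what a viewer shows.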
Here's a patch that does w-h swapping:
http://github.com/kig/poppler/commit/1be66974479781f84fbb6872573bb2febdc1ad60

https://gitlab.freedesktop.org/poppler/poppler/-/issues/87
'pdfinfo' man page misses description for 'UserProperties:' and 'Suspects:' info
2018-10-11T20:27:02Z
Bugzilla Migration User

## Submitted by kur..@..il.com
Assigned to **poppler-bugs**
**[Link to original bug (#82468)](https://bugs.freedesktop.org/show_bug.cgi?id=82468)**
## Description
Recent versions of 'pdfinfo' print two lines in their output, 'UserProperties:' and 'Suspects:', which are not mentioned in the man page.
Beyond merely mentioning the two properties (whose values should always be 'no' for untagged PDFs), it would be nice to also give a one- or two-sentence explanation of what these properties mean.