poppler issues: https://gitlab.freedesktop.org/poppler/poppler/-/issues (updated 2024-01-08T18:59:48Z)

Issue #1458: pdftotext: support tsv output in reading order
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1458 (Fawaz Ahmed, updated 2024-01-08T18:59:48Z)

Hello,
I see that the [tsv flag](https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/831) was added to emulate Tesseract's TSV format.
Tesseract prints TSV in reading order, but the TSV output by pdftotext is not in reading order.
It would be helpful if the TSV output followed the `-layout` reading order when `-tsv` is given.

Issue #1426: pdftotext -bbox-layout values are out of bounds
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1426 (Fawaz Ahmed, updated 2023-08-30T09:21:14Z)

The bbox values are sometimes negative and sometimes greater than the page width/height.
The bbox values are generated using the `pdftotext -bbox-layout` option.
[PDF 1](/uploads/e0c46bbb69592c1908618de96a7e12f5/1162c701-2aa3-4403-aed3-a08ce3efefed.pdf) - See page 11: yMin is negative, i.e. `yMin="-10.000000"`.
[PDF 2](/uploads/ebc079de54b652709a69e52cd0827b8d/cc2c817c-91d2-4600-9274-0b6ccc97921d.pdf) - See page 10: xMin is greater than the page width, i.e. `pageWidth="612.000000"` and `xMin="647.143036"`.
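To spot such values across a whole document, one can compare every bbox attribute against the size of its page. The Python sketch below assumes the `-bbox-layout` XHTML has `page` elements with `width`/`height` attributes whose descendants carry `xMin`/`yMin`/`xMax`/`yMax`; `out_of_bounds` is a hypothetical helper, and the sample markup is hand-written to mimic the two reported values, not taken from the attached PDFs.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Drop a namespace prefix such as "{http://www.w3.org/1999/xhtml}".
    return tag.rsplit("}", 1)[-1]


def out_of_bounds(xhtml: str) -> list:
    """List (page number, element, attribute, value) for every bbox coordinate
    outside [0, page width] for x or [0, page height] for y."""
    root = ET.fromstring(xhtml)
    problems = []
    pages = [el for el in root.iter() if local_name(el.tag) == "page"]
    for number, page in enumerate(pages, start=1):
        width = float(page.get("width"))
        height = float(page.get("height"))
        for el in page.iter():
            for attr, limit in (("xMin", width), ("xMax", width),
                                ("yMin", height), ("yMax", height)):
                value = el.get(attr)
                if value is not None and not (0.0 <= float(value) <= limit):
                    problems.append((number, local_name(el.tag), attr, float(value)))
    return problems


# Hand-written sample mimicking the values reported above.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="612.000000" height="792.000000">
  <flow><block xMin="72.0" yMin="-10.000000" xMax="300.0" yMax="20.0">
    <word xMin="647.143036" yMin="100.0" xMax="660.0" yMax="110.0">x</word>
  </block></flow>
</page></doc></body></html>"""

for problem in out_of_bounds(SAMPLE):
    print(problem)
```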
```
$ pdftotext -v
pdftotext version 23.06.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
```

Issue #1425: pdftotext sometimes moves superscript to the previous line
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1425 (Vincent Lefevre, updated 2023-08-29T22:37:13Z)

The `pdftotext` utility sometimes moves a superscript to the previous line. Consider the following [text.pdf](/uploads/80db40445007b87d3d706e470dbbdd9c/text.pdf) file. One line contains "[...] texte... 4 texte [...]". The next line contains "[...] que a² < A,", with the superscript "2" below the "4" (and slightly to the left).
`pdftotext` generates:
```
√texte texte... 24 texte texte texte texte (texte texte texte
t). Texte texte texte texte texte texte A telle que a < A,
```
i.e. it moves the superscript "2" just before the "4".
Note that since the superscript "2" is strictly below the "4" in the PDF rendering, there shouldn't be any ambiguity in the interpretation of the text.
This is under Debian with poppler-utils 22.12.0-2.

Issue #1419: Request: Update TextOutputDev.cc (sync with xpdf)
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1419 (Rafał Miłecki, updated 2023-08-19T06:54:03Z)

The file `TextOutputDev.cc` is central to the `pdftotext` tool. Since the fork, it has received many improvements:
https://gitlab.freedesktop.org/poppler/poppler/-/commits/master/poppler/TextOutputDev.cc
The original `TextOutputDev.cc` (in the Xpdf project) has also changed a lot since the 3.00 release. It gained support for splitting input into blocks and for output in multiple layouts.
It seems the two implementations have diverged significantly and both have gained important features. Unfortunately, poppler users may miss some Xpdf-only features, including the `pdftotext` modes `-simple`, `-simple2` and `-table`.
Is there any sane way of bringing Xpdf `TextOutputDev.cc` improvements into poppler?
Porting all poppler changes onto the most recent Xpdf `TextOutputDev.cc` seems impossible; there are too many of them.
There is no public Xpdf git repository, so we can't port their commits to poppler one by one. Generating a `diff` from release to release produces a huge, undocumented bunch of changes.
Does anyone have any idea how (or whether) this could be solved?

Issue #1206: pdftotext: combining accent moved to subsequent character
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1206 (Zack Weinberg, updated 2022-02-03T23:08:43Z)

The attached PDF [roundtrip.pdf](/uploads/0e3174aee5a6085a1ee7bceab44558af/roundtrip.pdf) displays two grapheme clusters: **l̀a**. This is the combined rendering of three Unicode characters: U+006C, U+0300, U+0061, in that order.
pdftotext (self-identified as "version 22.01.0") misinterprets the U+0300 as being applied to the 'a', not the 'l', and emits **là** (U+006C U+0061 U+0300).
Looking at the page stream, I see
```
BT
/F27 9.96264 Tf
1 0 0 1 34.098 32.557 Tm [<004f>]TJ
1 0 0 1 38.96 34.789 Tm [<0b3e>]TJ
1 0 0 1 37.187 32.557 Tm [<0044>]TJ
ET
```
004f, 0b3e, 0044 are mapped to U+006C, U+0300, U+0061 respectively by the /ToUnicode object for the font. I'm guessing pdftotext only looks at the relative x-positions of the characters to decide where the accent character is placed (38.96 > 37.187), and not at where the visible glyph for the accent actually ends up. I have no idea why this particular font places the origin of this character so far to the right of the visible glyph, but TeX managed to place the accent correctly, so I would like to think it is possible for pdftotext to do the same.
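The ordering effect guessed at above can be reproduced from the three `Tm` x origins alone. In this Python sketch, `order_by_x` sorts purely by x origin and yields the reported **là**; `order_with_marks_attached` is a hypothetical alternative heuristic (keep each combining mark with the base character that precedes it in the content stream), shown only for contrast. It is not poppler's algorithm, and it would misbehave for producers that emit a mark before its base.

```python
import unicodedata

# (character, x origin from the Tm operator), in content-stream order, as quoted above.
GLYPHS = [("\u006c", 34.098), ("\u0300", 38.96), ("\u0061", 37.187)]


def order_by_x(glyphs):
    # Sort every character by its x origin alone.
    return "".join(ch for ch, _ in sorted(glyphs, key=lambda g: g[1]))


def order_with_marks_attached(glyphs):
    # Attach each combining mark to the base character that precedes it in the
    # content stream, then sort the resulting clusters by the base's x origin.
    clusters = []
    for ch, x in glyphs:
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1] += ch
        else:
            clusters.append([x, ch])
    return "".join(text for _, text in sorted(clusters, key=lambda c: c[0]))


print(ascii(order_by_x(GLYPHS)))                 # 'la\u0300'
print(ascii(order_with_marks_attached(GLYPHS)))  # 'l\u0300a'
```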
In case it is helpful, the PDF was generated by rendering the following document with LuaTeX from TeX Live 2021:
```
\documentclass{minimal}
\usepackage[paperwidth=5cm,paperheight=2cm,margin=5mm]{geometry}
\usepackage{fontspec}
\setmainfont{Noto Serif}
\pagestyle{empty}
\begin{document}
{\`l}a
\end{document}
```

Issue #1146: Text extraction should expand ligatures to their normal form
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1146 (Vincent Lefevre, updated 2024-03-04T15:39:21Z)

There was this [old bug on Bugzilla](https://bugs.freedesktop.org/show_bug.cgi?id=7002) saying "pdftotext and copy-n-paste from a document should expand ligatures such as fi to the letters f and i.", which was fixed in 2012 in commit 3361564364a1799fc3d6c6df9f208c5531c407dc.
But I can still see such ligatures generated by `pdftotext`, e.g. on the following PDF file (generated by Ghostscript's `ps2pdf`): [chartest3-gs.pdf](/uploads/8f0300421671f789524f500ad975a6b3/chartest3-gs.pdf)
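The expansion asked for here is what Unicode compatibility normalization (NFKC) does for these code points. A minimal Python sketch, not poppler's implementation: it normalizes only the Latin ligatures U+FB00 to U+FB06, because full NFKC would also fold other compatibility characters, for example turning U+00A0 NO-BREAK SPACE into a plain space. `expand_ligatures` is a hypothetical helper.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    # Apply NFKC only to the Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi,
    # ffl, long s + t, st); every other character is left untouched.
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )


print(expand_ligatures("Don\u2019t \ufb00."))  # Don’t ff.
```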
In short, I get "Don’t ﬀ." (with U+FB00 LATIN SMALL LIGATURE FF) instead of "Don’t ff." (with 2 letters "f").

Issue #1145: pdftotext: chars sequences "fi" or "fr" are rendered at the end of each page for Pages pdf
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1145 (Colin Darie, updated 2021-10-09T11:44:25Z)

Some character sequences like `fi` or `fr` (less frequently `tt` or `ti`, maybe others) are sometimes rendered at the end of the page. This completely breaks the converted text.
I attached 2 simple PDFs generated from Pages on OS X.
Extracting with `pdftotext file.pdf` produces this text:
```
Flatten capitalized it's o
l
a l
no problem with this sequence of char
s
k
d
n
e
fl
fi
it's attene
e
fl
fi
atte
^L
```
The expected output is:
```
flatten
Flatten capitalized it's ok
file
a file
no problem with this sequence of chars
it's flattened
```
The attached PDF files were generated by Pages on macOS, using both the Export PDF and Print PDF features (the two outputs are not strictly the same). I was not able to produce a buggy file with other editors (I'm still investigating whether other editors generate problematic files). The issue seems to be case-sensitive.
(A minimal example can be reproduced with just the word `flat` in a file.)
`pdftohtml` converts the files as expected.
My Pages version is 11.2 on OSX 11.6, but I have found user files with older Pages/OSX versions.
I was able to reproduce the issue on at least these poppler versions:
- 21.09, 21.07 on osx
- 0.86.1 on ubuntu
[pdf-pages-print-pdf.pdf](/uploads/68f1325e02d7e7c32cc0d1de2c39673a/pdf-pages-print-pdf.pdf)
[pdf-pages-export.pdf](/uploads/29a60fbf3dd7ddf2dca75774a692f1a9/pdf-pages-export.pdf)

Issue #1135: pdftotext adds space after first letter with small caps.
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1135 (bbernicker, updated 2021-09-04T09:15:39Z)

When parsing text that is typeset in small caps, pdftotext adds a space between a capitalized first letter and the subsequent lowercase (small caps) letters.
[Example_Page.pdf](/uploads/97d722ed18bc7107399ff33465ef9747/Example_Page.pdf)

Issue #1044: Too many spaces in the pdf extraction through pdftotext utility
https://gitlab.freedesktop.org/poppler/poppler/-/issues/1044 (Zubair Uddin Farooqui, updated 2021-02-11T23:01:32Z)

Getting too many spaces between text when extracting from a PDF with the command-line utility `utils/pdftotext`.
**Command used:**
`utils/pdftotext -layout ./source_file.pdf ./text_extraction.txt`
**Actual Extraction:**
`Numéro de sécurité sociale (NIR) 2 8 7 0 7 9 9 3 3 5 0 7 4 2 4`
**Expected:**
Minimal spacing between the digits, especially between the first digit "2" and the second digit "8", which are separated by more spaces than the others.
**PDF File:**
[cerfa_15929-01.pdf](/uploads/f3a471dd92f37cd833d071b172afea62/cerfa_15929-01.pdf)

Issue #746: pdftotext output text position error
https://gitlab.freedesktop.org/poppler/poppler/-/issues/746 (idle, updated 2019-03-30T10:48:49Z)

Environment: Windows 10 1809
pdftotext version 0.75.0
In the PDF file the text order is, for example, `AB`, but `pdftotext -enc UTF-8 -nopgbrk -layout lwarp-test_html.pdf lwarp-test_html.html` outputs `BA`.
![20190328125806](/uploads/2b3e859dcec724509b8929e82f7a7c1f/20190328125806.png)
![20190328130118](/uploads/a7689e7de10fcedf95a66b0d69673a8f/20190328130118.png)
The MWE:
[lwarp-test_html.pdf](/uploads/74c2fac018a1a5acd375581ee1bbb3c0/lwarp-test_html.pdf)

Issue #627: pdftotext -bbox generates incorrect bounding box information
https://gitlab.freedesktop.org/poppler/poppler/-/issues/627 (Bugzilla Migration User, updated 2018-10-11T08:18:25Z)

## Submitted by wil..@..il.com
Assigned to **poppler-bugs**
**[Link to original bug (#69699)](https://bugs.freedesktop.org/show_bug.cgi?id=69699)**
## Description
Created attachment 86354
A sample pdf file
Please open the attached PDF and search for "15". The bounding box is not aligned with the vertical text; it is offset about 5 units to the left.
**Attachment 86354**, "A sample pdf file":
[979835_044.pdf](/uploads/dbe3e7943bae56320b5bdd54bc6d9c5a/979835_044.pdf)

Issue #534: pdftotext - processing small pdf takes long time and creates cpu peaks
https://gitlab.freedesktop.org/poppler/poppler/-/issues/534 (Bugzilla Migration User, updated 2018-10-05T22:23:07Z)

## Submitted by David Przybilla
Assigned to **poppler-bugs**
**[Link to original bug (#92724)](https://bugs.freedesktop.org/show_bug.cgi?id=92724)**
## Description
The following PDF is only 5 pages long (4.8 MB).
However, running pdftotext on it takes approximately 3 minutes and CPU usage goes to 100%.
Other, longer PDFs take only a few seconds without excessive CPU usage, so this behaviour looks odd.
Here are some extra details:
pdftotext version 0.37.0 (compiled from the latest stable release)
compiled with the following options:
`./configure --disable-libopenjpeg --disable-poppler-qt4 --disable-gtk-test --disable-cairo-output --disable-splash-output --with-prefix=/usr/local`
OS: Ubuntu 14.04 LTS, OS X

Issue #332: pdftotext: UTF-16 text without BOM not properly extracted
https://gitlab.freedesktop.org/poppler/poppler/-/issues/332 (Bugzilla Migration User, updated 2018-10-08T10:31:37Z)

## Submitted by ral..@..te.com
Assigned to **poppler-bugs**
**[Link to original bug (#103309)](https://bugs.freedesktop.org/show_bug.cgi?id=103309)**
## Description
Created attachment 134881
Sample file
When I use pdftotext with the attached sample file, I get no usable text. Looking at the file with a hex editor, I can see that the text is stored as UTF-16BE *without* a BOM. It displays fine in xpdf.
Tested with version 0.48.0 (Debian Stable) and 0.57.0 (Debian Testing).
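The report does not say which objects hold these strings, so the following is only a heuristic sketch of how BOM-less UTF-16BE might be recognized: for mostly Latin text, nearly every high byte is 0x00. The function names and the 0.9 threshold are hypothetical, and Latin-1 merely stands in for a single-byte encoding such as PDFDocEncoding, which differs from it in some code points.

```python
def looks_like_utf16be(data: bytes, threshold: float = 0.9) -> bool:
    # Heuristic: for mostly Latin text, UTF-16BE puts 0x00 in the high byte
    # (even offsets) of nearly every code unit.
    if len(data) < 2 or len(data) % 2:
        return False
    high_bytes = data[0::2]
    return high_bytes.count(0) / len(high_bytes) >= threshold


def decode_text(data: bytes) -> str:
    if data.startswith(b"\xfe\xff"):
        # UTF-16BE with a byte order mark: strip the BOM, then decode.
        return data[2:].decode("utf-16-be")
    if looks_like_utf16be(data):
        # UTF-16BE without a BOM, the case described in this report.
        return data.decode("utf-16-be")
    # Stand-in for a single-byte encoding such as PDFDocEncoding.
    return data.decode("latin-1")


print(decode_text(b"\x00H\x00i"))          # Hi
print(decode_text(b"\xfe\xff\x00H\x00i"))  # Hi
print(decode_text(b"Hi"))                  # Hi
```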
**Attachment 134881**, "Sample file":
[2004.pdf](/uploads/c656085b64342bbaa25e2cd65b820769/2004.pdf)

Issue #322: Some letters are in wrong order in the output of pdftotext
https://gitlab.freedesktop.org/poppler/poppler/-/issues/322 (Bugzilla Migration User, updated 2018-10-26T15:11:21Z)

## Submitted by Bassem JARKAS
Assigned to **poppler-bugs**
**[Link to original bug (#32522)](https://bugs.freedesktop.org/show_bug.cgi?id=32522)**
## Description
I have a PDF file created in Adobe InDesign CS3 (5.0.4) with an embedded Arabic font called AXtManal; this font was created to work around the limitations of publishing software when creating Arabic documents.
pdftotext v0.15.3 (and older versions) renders some letters in the wrong order. For example, the word "abcd" appears as "acbd", and this error is repeated with many pairs of letters, like "l" and "a", "m" and "j", "r" and "y", etc.
Evince displays the file correctly, with the correct order and layout. The problem is only in text extraction.
Any idea how to fix that?
You can find the PDF sample here: https://sites.google.com/site/jarkas/Home/049.pdf?attredirects=0&d=1
and the text output: https://sites.google.com/site/jarkas/Home/049_0.15.3.txt?attredirects=0&d=1
Best Regards

Issue #264: pdftotext incorrectly converts text with a large initial capital and required hyphen
https://gitlab.freedesktop.org/poppler/poppler/-/issues/264 (Bugzilla Migration User, updated 2018-10-08T10:41:36Z)

## Submitted by Nash
Assigned to **poppler-bugs**
**[Link to original bug (#34500)](https://bugs.freedesktop.org/show_bug.cgi?id=34500)**
## Description
Created attachment 43574
Text example with a large initial capital and required hyphen
Actual Results:
1) large initial capital (1st paragraph)
А
ктуальність та постановка проблеми. Співоцька...
2) required hyphen (4th paragraph)
...окресленої проблематики торкалися пун
ктирно, їхні дослідження...
Expected Results:
1)
Актуальність та постановка проблеми. Співоцька...
2)
...окресленої проблематики торкалися пунктирно, їхні дослідження...
**Attachment 43574**, "Text example with a large initial capital and required hyphen":
[25.pdf](/uploads/6eeced98a510f74d966cf8fd6b7de807/25.pdf)

Issue #254: High CPU usage on reading specific file
https://gitlab.freedesktop.org/poppler/poppler/-/issues/254 (Bugzilla Migration User, updated 2018-10-11T20:34:54Z)

## Submitted by Alexander Hunziker
Assigned to **poppler-bugs**
**[Link to original bug (#54746)](https://bugs.freedesktop.org/show_bug.cgi?id=54746)**
## Description
A specific PDF file (http://ubuntuone.com/4ELfHGFXVtDAtU0lWsLT6G) causes very high CPU load and takes a long time to render using evince/cairo. On my dual core 2 GHz machine it takes 30 seconds to show the first page alone.
This was discovered by tracker choking when indexing this file (see https://bugzilla.gnome.org/show_bug.cgi?id=680897 comments 14 ff).
The built-in PDF reader of the Chromium browser has no issues at all showing this file.

Issue #252: pdftotext breaks sentence in middle of sentence when text overflow the box, whereas pdftohtml captures the full sentence.
https://gitlab.freedesktop.org/poppler/poppler/-/issues/252 (Bugzilla Migration User, updated 2018-10-08T10:46:12Z)

## Submitted by Gaurav Arora
Assigned to **poppler-bugs**
**[Link to original bug (#99824)](https://bugs.freedesktop.org/show_bug.cgi?id=99824)**
## Description
Created attachment 129623
sample pdf which is facing this issue
While analyzing a specific set of files, we noticed that the lines generated by pdftohtml and pdftotext differ where text overflows the line boundary of its box.
With pdftohtml, the line is captured normally, with the full text of that line in a single text element. With pdftotext, however, the line is broken in the middle of a word and the rest of the line is added as a separate line.
An example:
The line as it appears in pdftohtml output:
`<text top="412" left="79" width="1021" height="17" font="0">To JOSEPH E. BLUTH for research and development in the field of electronic photography and transfer of video tape to motion picture film. [Laboratory]</text>`
The line as it appears in pdftotext output:
To JOSEPH E. BLUTH for research and development in the field of electronic photography and transfer of video tape to mo
.
.
.
.
.
sfer of video tape to motion picture film. [Laboratory]
The line as it appears in the PDF file:
http://i67.tinypic.com/i6w66e.png
Even though the PDF file doesn't display this line correctly, pdftohtml is able to extract the full line, so pdftotext should also be able to get the full line.
It seems odd for a line to be broken like this. I have attached a sample PDF file that shows this bug. I have tested the file with poppler-0.51.0.
/poppler/tmp/poppler-0.51.0$ /usr/local/bin/pdftotext -v
pdftotext version 0.51.0
Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
**Attachment 129623**, "sample pdf which is facing this issue":
[academy_awards_07-17-13.pdf](/uploads/6cd31e04e04e47ac5b543d70d2d2afee/academy_awards_07-17-13.pdf)

Issue #194: pdftotext converts all non-breaking spaces U+A0 and U+202F into U+20
https://gitlab.freedesktop.org/poppler/poppler/-/issues/194 (Bugzilla Migration User, updated 2018-10-08T10:46:04Z)

## Submitted by Daniel Flipo
Assigned to **poppler-bugs**
**[Link to original bug (#102651)](https://bugs.freedesktop.org/show_bug.cgi?id=102651)**
## Description
Created attachment 134154
PDF file with non-breaking spaces to be preserved
The fix for bug #97399 added the non-breaking spaces U+A0 and U+202F to the function `UnicodeIsWhitespace`, which holds the list of all spaces used to break lines into words.
As a result, pdftotext converts these non-breaking spaces into breakable U+20 spaces. In some cases (ties like Mr Bean, high punctuation in French, etc.) these non-breaking spaces are added intentionally and should be preserved as such in the text or HTML output.
A pdftotext option to remove these two spaces from `UnicodeIsWhitespace` would solve the issue.
I attach a small PDF file with those non-breaking spaces for testing.
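In Python terms, the behaviour described above resembles splitting on every character for which `str.isspace()` is true, which includes U+00A0 and U+202F, and rejoining the words with U+0020. The sketch below is only an analogue of that logic, not poppler's `UnicodeIsWhitespace`; the `keep_no_break` parameter models the requested option and is hypothetical, not an existing pdftotext flag.

```python
NO_BREAK_SPACES = {"\u00a0", "\u202f"}  # NO-BREAK SPACE, NARROW NO-BREAK SPACE


def is_word_separator(ch: str, keep_no_break: bool) -> bool:
    # str.isspace() is True for U+00A0 and U+202F as well as for U+0020.
    if keep_no_break and ch in NO_BREAK_SPACES:
        return False
    return ch.isspace()


def split_and_join(text: str, keep_no_break: bool) -> str:
    # Split into words at separators and rejoin them with plain U+0020 spaces.
    words, current = [], []
    for ch in text:
        if is_word_separator(ch, keep_no_break):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return " ".join(words)


sample = "Mr\u00a0Bean dit\u202f:"
print(ascii(split_and_join(sample, keep_no_break=False)))  # 'Mr Bean dit :'
print(ascii(split_and_join(sample, keep_no_break=True)))   # 'Mr\xa0Bean dit\u202f:'
```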
**Attachment 134154**, "PDF file with non-breaking spaces to be preserved":
[spaces.pdf](/uploads/e99a3c07889fd414d7ae7b3cefe2ea66/spaces.pdf)

Issue #191: pdftotext should filter control characters like "form feed"
https://gitlab.freedesktop.org/poppler/poppler/-/issues/191 (Bugzilla Migration User, updated 2023-11-09T09:23:03Z)

## Submitted by Mike Gerber
Assigned to **poppler-bugs**
**[Link to original bug (#99506)](https://bugs.freedesktop.org/show_bug.cgi?id=99506)**
## Description
Created attachment 129108
Example PDF
Currently, pdftotext/TextOutputDev extracts control characters such as form feeds from the PDF. These should be filtered out, since users expect form feeds to be inserted only by pdftotext itself.
In the attached PDF, there is a form feed character (0xC) extracted between the word "sich" and the following formula. The form feed is - AFAICT - actually a character from the CMSY10 font.
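A minimal Python sketch of the requested filtering, applied to page strings that are assumed to be already extracted (the sample text is hypothetical): characters in Unicode category Cc are dropped, except newline and tab, before page breaks are appended, so every remaining form feed marks a page boundary. pdftotext itself ends each page with a form feed unless `-nopgbrk` is given; keeping newline and tab is a design choice of this sketch.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    # Drop characters in Unicode category Cc (such as U+000C FORM FEED),
    # keeping only newline and tab.
    return "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


def join_pages(pages: list) -> str:
    # Form feeds in the result then come only from the page breaks added here.
    return "".join(strip_control_chars(page) + "\f" for page in pages)


text = join_pages(["sich\x0c formula", "next page"])
print(text.count("\f"))  # 2
```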
**Attachment 129108**, "Example PDF":
[text7-page11-uncompressed.pdf](/uploads/8d67975978aa46efa5e6f2a6503ff3bf/text7-page11-uncompressed.pdf)

Issue #176: Resolution not take into account in pdftotext for Page node
https://gitlab.freedesktop.org/poppler/poppler/-/issues/176 (Bugzilla Migration User, updated 2018-10-11T20:36:26Z)

## Submitted by mr...@..il.com
Assigned to **poppler-bugs**
**[Link to original bug (#103573)](https://bugs.freedesktop.org/show_bug.cgi?id=103573)**
## Description
When pdftotext generates XML, DPI is taken into account for (x/y)(Min/Max) attributes on all nodes.
However, the width and height attributes of the Page node are unaffected by a different DPI setting.
Example usage showing the problem:
pdftotext -r 72 -bbox something.pdf -
pdftotext -r 150 -bbox something.pdf -
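If `-r` is meant to scale coordinates linearly (an assumption; the report does not state the factor), the page size should go through the same conversion as the word boxes. PDF user space defaults to 72 units per inch, so a Python sketch of a consistent conversion could look like this; the function names and the 612 x 792 page size are illustrative, not taken from the report's file.

```python
def points_to_pixels(value_pt: float, dpi: float) -> float:
    # PDF user space defaults to 72 units per inch, so scale by dpi / 72.
    return value_pt * dpi / 72.0


def scale_bbox_attrs(attrs: dict, dpi: float) -> dict:
    # Scale every geometric attribute the same way, the page's own
    # width/height included, so page size and word boxes stay consistent.
    return {name: points_to_pixels(value, dpi) for name, value in attrs.items()}


page = {"width": 612.0, "height": 792.0}
print(scale_bbox_attrs(page, 72))   # {'width': 612.0, 'height': 792.0}
print(scale_bbox_attrs(page, 150))  # {'width': 1275.0, 'height': 1650.0}
```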