Opened Jan 02, 2020 by Saduk (@Saduk)

Bug in PdfToHtml

There is a bug in pdftohtml (poppler version 0.83).

When I run

pdftohtml -xml input.pdf output.xml

everything is fine. The output looks like this:

....
<text top="363" left="81" width="413" height="61" font="1">Bla Bla</text>
<text top="422" left="81" width="514" height="40" font="2">Bla Bla</text>
<text top="1131" left="765" width="72" height="16" font="3">Bla Bla</text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188" width="918">
	<fontspec id="4" size="8" family="PalatinoLinotype" color="#ffffff"/>
	<fontspec id="5" size="11" family="PalatinoLinotype" color="#000000"/>
	<fontspec id="6" size="11" family="Arial" color="#000000"/>
	<fontspec id="7" size="6" family="TimesNewRomanPSMT" color="#000000"/>
	<fontspec id="8" size="9" family="TimesNewRomanPSMT" color="#000000"/>
<text top="104" left="81" width="5" height="14" font="4">Bla Bla</text>
<text top="144" left="81" width="33" height="18" font="5">Bla Bla</text>
.....

But when I run the same command with the additional -stdout parameter, the XML output is interleaved with plain-text "link to page xyz" lines that appear between the </page> and <page> tags:

....
<text top="363" left="81" width="413" height="61" font="1">Bla Bla</text>
<text top="422" left="81" width="514" height="40" font="2">Bla Bla</text>
<text top="1131" left="765" width="72" height="16" font="3">Bla Bla</text>
</page>
link to page 7  link to page 9  link to page 11  link to page 13  link to page 13 
<page number="2" position="absolute" top="0" left="0" height="1188" width="918">
	<fontspec id="4" size="8" family="PalatinoLinotype" color="#ffffff"/>
	<fontspec id="5" size="11" family="PalatinoLinotype" color="#000000"/>
	<fontspec id="6" size="11" family="Arial" color="#000000"/>
	<fontspec id="7" size="6" family="TimesNewRomanPSMT" color="#000000"/>
	<fontspec id="8" size="9" family="TimesNewRomanPSMT" color="#000000"/>
<text top="104" left="81" width="5" height="14" font="4">Bla Bla</text>
<text top="144" left="81" width="33" height="18" font="5">Bla Bla</text>
.....
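In the sample above, the stray "link to page" text sits between </page> and the next <page>, so (as long as it contains no characters like & or <) the output may still parse as well-formed XML with mixed content, and a plain parse will not flag it. A minimal Python sketch for detecting the symptom, assuming the -stdout output is a single XML document whose <page> elements are direct children of the root; the function name `stray_text_between_pages` is hypothetical:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def stray_text_between_pages(xml_text: str | bytes) -> list[str]:
    """Return non-whitespace text found directly under the root element,
    i.e. outside any child element such as <page> (e.g. "link to page N")."""
    root = ET.fromstring(xml_text)
    stray = []
    # Text before the first child element is stored in root.text.
    if root.text and root.text.strip():
        stray.append(root.text.strip())
    # Text after each child's closing tag (e.g. after </page>) is its .tail.
    for child in root:
        if child.tail and child.tail.strip():
            stray.append(child.tail.strip())
    return stray
```

On output written to a file (the working case above) this is expected to return an empty list; on the -stdout output shown above it should return the "link to page ..." runs.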
Edited Jan 02, 2020 by Saduk
Reference: poppler/poppler#863