
Opened Oct 18, 2016 by Bugzilla Migration User (@bugzilla-migration)

-xml outputs malformed xml

Submitted by dan..@..il.com

Assigned to poppler-bugs

Link to original bug (#98305)

Description

Overview:

The following PDF causes pdftohtml to produce malformed XML:
http://www.atmel.com/images/Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf 
The resulting XML file contains several similar errors. The first is on line 71641:

`<text top="180" left="71" width="101" height="15" font="11"><b>Sp<a href="Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.html#876">eed [MHz] </b>(3)</a></text>`

(the closing `</b>` and `</a>` tags are in the wrong order: `<a>` is opened inside `<b>` but is still open when `</b>` is emitted)
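To show why this line breaks XML parsers, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a shortened copy of the reported line (the position attributes and the long `href` are trimmed for readability) and compares it with one properly nested alternative, in which `<a>` is closed before `</b>` and then reopened.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Reported line, shortened: </b> arrives while <a> is still open
bad = '<text font="11"><b>Sp<a href="#876">eed [MHz] </b>(3)</a></text>'

# One properly nested alternative: close <a> before </b>, then reopen <a>
good = ('<text font="11"><b>Sp<a href="#876">eed [MHz] </a></b>'
        '<a href="#876">(3)</a></text>')

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```

The nested alternative only fixes well-formedness. It keeps the link on the same text as the original output, which may itself be wrong (see Expected Results).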

Steps to Reproduce:

1) wget http://www.atmel.com/images/Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf 

2) pdftohtml -q -i -xml Atmel-8284-8-bit-AVR-microcontroller-ATmega169A_PA_329A_PA_3290A_PA_649A_P_6490A_P_datasheet.pdf output.xml
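To check a freshly generated output.xml, a small script can report where parsing first fails. This is a sketch built on Python's standard `xml.parsers.expat` module; the function name `first_xml_error` and the script name `check_xml.py` are hypothetical. Expat stops at the first fatal error, so the remaining "similar errors" only show up one at a time as each is fixed.

```python
import sys
import xml.parsers.expat

def first_xml_error(data: bytes):
    """Return (line, column, message) for the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: data is the complete document
    except xml.parsers.expat.ExpatError as err:
        # lineno is 1-based, offset is a 0-based column
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        result = first_xml_error(f.read())
    if result is None:
        print("well-formed")
    else:
        line, column, message = result
        print(f"line {line}, column {column}: {message}")
```

Usage: `python3 check_xml.py output.xml`. According to this report, the first problem should be around line 71641.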

Actual Results:

Malformed XML.

Expected Results:

Well-formed XML. I am also not sure the link is attached to the right piece of text: in the PDF, only the text "(3)" is clickable, and none of it is bold.
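One general way a converter can guarantee well-formed output is to track open elements on a stack. When a close tag does not match the innermost open element, the converter closes the inner elements first and reopens them afterwards. The sketch below illustrates that technique in Python. It is not poppler's code (pdftohtml is C++), and the `balanced_markup` function and its `(kind, ...)` event format are hypothetical.

```python
from xml.sax.saxutils import escape

def balanced_markup(events):
    """Serialize ("open", tag, attrs) / ("close", tag) / ("text", s) events.

    A close that does not match the innermost open element first closes the
    elements opened inside it, then reopens them, so nesting stays valid.
    A close for a tag that is not open at all is effectively dropped.
    """
    out = []
    stack = []  # (tag, attribute string) for each currently open element
    for kind, *args in events:
        if kind == "text":
            out.append(escape(args[0]))
        elif kind == "open":
            tag, attrs = args
            stack.append((tag, attrs))
            out.append(f"<{tag}{attrs}>")
        elif kind == "close":
            tag = args[0]
            reopen = []
            # Close elements that were opened inside `tag`, innermost first
            while stack and stack[-1][0] != tag:
                inner = stack.pop()
                out.append(f"</{inner[0]}>")
                reopen.append(inner)
            if stack:  # the element actually being closed
                stack.pop()
                out.append(f"</{tag}>")
            # Reopen the interrupted elements, outermost first
            for inner_tag, attrs in reversed(reopen):
                stack.append((inner_tag, attrs))
                out.append(f"<{inner_tag}{attrs}>")
    while stack:  # close anything still open at the end
        out.append(f"</{stack.pop()[0]}>")
    return "".join(out)

# Event order matching the reported line (href shortened)
events = [("open", "b", ""), ("text", "Sp"),
          ("open", "a", ' href="#876"'), ("text", "eed [MHz] "),
          ("close", "b"), ("text", "(3)"), ("close", "a")]
print(balanced_markup(events))
# <b>Sp<a href="#876">eed [MHz] </a></b><a href="#876">(3)</a>
```

This only fixes the nesting. It does not address the second concern above: whether the link and bold ranges are assigned to the right text in the first place.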

Build Date & Hardware:

Built on 2016-10-18 from source (0.48.0) on Ubuntu 14.04 LTS

Additional Builds and Platforms:

Also occurs with the version of pdftohtml installed via apt-get (0.28, if I recall correctly).

Cheers,

Daniel

Reference: poppler/poppler#556