poppler/poppler, issue #385 (Closed)
Issue created Dec 12, 2017 by Bugzilla Migration User (@bugzilla-migration)

When extracting as XML, all newlines are stripped

Submitted by cla..@..eat.dk

Assigned to poppler-bugs

Link to original bug (#104230)

Description

Created attachment 136123 test pdf

pdftohtml -s -i -xml test.pdf out.xml

versus

pdftohtml -s -i test.pdf out.html

When you extract the text as HTML, all newlines are kept. When you extract it as XML, they are stripped out and each line is put in its own tag instead.

Attachment 136123, "test pdf":
001.pdf
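As a possible workaround until the XML output keeps the line breaks, the per-line tags can be joined back together after extraction. The Python sketch below assumes the usual pdftohtml XML layout (a `<pdf2xml>` root containing `<page>` elements, each holding one `<text>` element per line). The function name `text_from_pdf2xml` and the script name `join_lines.py` are hypothetical, and the sketch has not been checked against the attached test pdf.

```python
import sys
import xml.etree.ElementTree as ET


def text_from_pdf2xml(root: ET.Element) -> str:
    """Rebuild plain text from `pdftohtml -xml` output.

    Assumption: the output uses a <pdf2xml>/<page>/<text> layout,
    with one <text> element per line of the PDF.
    """
    pages = []
    for page in root.iter("page"):
        lines = []
        for text in page.iter("text"):
            # itertext() also collects text nested in <b>, <i> or <a> tags.
            lines.append("".join(text.itertext()))
        # Restore the newline between consecutive <text> elements.
        pages.append("\n".join(lines))
    # Separate pages with a blank line.
    return "\n\n".join(pages)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python join_lines.py out.xml
    tree = ET.parse(sys.argv[1])
    print(text_from_pdf2xml(tree.getroot()))
```

Run it on the file produced by the first command, e.g. `python join_lines.py out.xml`. Joining in document order keeps the sketch short. If pdftohtml splits one visual line into several `<text>` elements (for example in multi-column layouts), those pieces would come out on separate lines, and grouping by the `top` attribute might be needed instead.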
