poppler issue #1044

Closed
Open
Created Feb 11, 2021 by Zubair Uddin Farooqui (@zubair.farooqui)

Too many spaces in text extracted from a PDF with the pdftotext utility

Text extracted from a PDF with the command-line utility utils/pdftotext contains too many spaces between characters.

Command used: utils/pdftotext -layout ./source_file.pdf ./text_extraction.txt

Actual Extraction: Numéro de sécurité sociale (NIR) 2 8 7 0 7 9 9 3 3 5 0 7 4 2 4

Expected: Minimal spacing between the digits. In particular, the gap between the first digit "2" and the second digit "8" is wider than the gaps between the other digits.

**PDF File:** cerfa_15929-01.pdf
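
To make the spacing problem easier to measure, and as a possible stopgap while the layout output is unchanged, the extracted text could be post-processed. The following is only a minimal sketch, not part of poppler: the helper names `gap_widths` and `collapse_spaces` are hypothetical, and the sample string is made up because the exact spacing of the reported line is not preserved here.

```python
import re
from typing import List


def gap_widths(line: str) -> List[int]:
    """Return the width of each run of spaces between tokens in a line."""
    return [len(m.group()) for m in re.finditer(r" +", line.strip())]


def collapse_spaces(line: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", line)


# Hypothetical sample: a wide gap after the first digit, as in the report.
sample = "2      8 7  0 7"

print(gap_widths(sample))       # widths of the gaps between digits
print(collapse_spaces(sample))  # same digits with single spaces
```

Note that collapsing spaces discards the column alignment that `-layout` is meant to preserve, so this only suits cases where the digits matter and their positions do not.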

Edited Feb 11, 2021 by Zubair Uddin Farooqui