poppler issue #892
Issue created Mar 09, 2020 by maklor78 (@maklor78)

[pdftotext] Unicode Error: Unicode character U+0308 (combining diaeresis)

When I use pdftotext on a PDF document that contains German umlauts (ä, ö, ü) or characters like ß, the resulting text file does not contain the correct Unicode characters. Instead, ü is written as a plain u followed by a separate combining diaeresis (U+0308) that is drawn above it.
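To make the symptom concrete, here is a minimal Python sketch (not part of the original report) comparing the two ways ü can be encoded: the decomposed sequence described above (u followed by U+0308) and the single precomposed code point U+00FC. It assumes Python 3's standard `unicodedata` module; NFC normalization is shown only to illustrate how the two forms relate, not as a description of what poppler does internally.

```python
import unicodedata

# "ü" as reported: base letter "u" followed by U+0308 COMBINING DIAERESIS
decomposed = "u\u0308"
# "ü" as a single precomposed code point: U+00FC LATIN SMALL LETTER U WITH DIAERESIS
precomposed = "\u00fc"

# Both render the same, but they are different code point sequences
print(len(decomposed), len(precomposed))   # 2 1
print(decomposed == precomposed)           # False

# NFC composes a base letter plus combining mark when a precomposed form exists
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)             # True
print(unicodedata.name(composed))          # LATIN SMALL LETTER U WITH DIAERESIS
```

Tools that compare or look up characters code point by code point (such as LaTeX's `inputenc`) treat these two forms as different input, which is why the decomposed output causes trouble.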

This causes problems with LaTeX editors, for example: https://tex.stackexchange.com/questions/4268/inputenc-error-unicode-char-u8-error-while-trying-to-write-a-degree-symbol
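As a possible workaround until pdftotext emits precomposed characters, the text file could be normalized to NFC before it is passed to LaTeX. The sketch below assumes the file is UTF-8; `normalize_text_file` is a hypothetical helper name, not part of poppler. NFC only composes sequences that have a precomposed form in Unicode, so any other combining marks stay in place and are counted in the return value.

```python
import unicodedata
from pathlib import Path


def normalize_text_file(path: str) -> int:
    """Rewrite a UTF-8 text file in NFC form (hypothetical helper).

    Returns how many characters with a non-zero canonical combining
    class remain after normalization.
    """
    p = Path(path)
    # Read and write raw bytes so existing line endings are preserved exactly
    text = p.read_bytes().decode("utf-8")
    nfc = unicodedata.normalize("NFC", text)
    p.write_bytes(nfc.encode("utf-8"))
    # Combining marks with no precomposed partner are left as separate characters
    return sum(1 for ch in nfc if unicodedata.combining(ch))
```

For example, after `pdftotext input.pdf output.txt` (file names are placeholders), calling `normalize_text_file("output.txt")` would rewrite the file in place; a non-zero return value means some combining marks could not be composed.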

Converting the same PDF document to text with calibre produces the correct characters, so this appears to be a bug in pdftotext.

Edited Mar 09, 2020 by maklor78