Opened Sep 10, 2018 by Nelson Benítez León (@nbenitez)

Match accented chars in ::findText()

This applies when !caseSensitive is passed and the search term is pure ASCII.

It lets simple ASCII search terms match their accented counterparts and other diacritic variants. Examples:

  • "arbol" matches "árbol" (Spanish)
  • "resume" matches "résumé" (French)
  • "Ausgleichslosung" matches "Ausgleichslösung" (German)

This may cause some false positives on partial matches, such as:

  • "ana" matches inside "gañan", even though in Spanish "n" and "ñ" are different letters.
  • (Other languages would have similar cases).
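
To make the matching behaviour above concrete, here is a minimal Python sketch. It is not the merge request's code (poppler is C++ and its implementation is not shown here); it assumes diacritics are removed by Unicode NFD decomposition followed by dropping combining marks, and the helper names `strip_diacritics` and `lax_contains` are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lax_contains(needle: str, haystack: str) -> bool:
    """Case-insensitive substring test that ignores diacritics in haystack."""
    return needle.casefold() in strip_diacritics(haystack).casefold()


# The examples from the issue text:
print(lax_contains("arbol", "árbol"))
print(lax_contains("resume", "résumé"))
print(lax_contains("Ausgleichslosung", "Ausgleichslösung"))
# The false positive: "ñ" decomposes to "n" plus a combining tilde.
print(lax_contains("ana", "gañan"))
```

All four calls report a match, including the "gañan" case, which is exactly the kind of partial-match false positive described above.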

IMO these false positives are acceptable: a small side effect of making the search laxer so that accented text is easier to find. Users only need to enable caseSensitive to make the search match their terms strictly.

In the merge request I've implemented this as automatic behaviour when !caseSensitive is passed and the search term is pure ASCII (the first 128 code points, so just letters without any diacritics), but if you prefer API consumers to be explicit about wanting this, we could add it under a new diacriticSensitive parameter (as mentioned in https://bugzilla.freedesktop.org/show_bug.cgi?id=2929#c16 ).
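
The gating condition described above can be sketched as follows. This is an illustration in Python under stated assumptions, not poppler's actual ::findText() logic: the function names `is_pure_ascii`, `strip_diacritics`, and `matches` are hypothetical, and the sketch only reports whether a match exists rather than its position.

```python
import unicodedata


def is_pure_ascii(term: str) -> bool:
    """True if every character is within the first 128 code points."""
    return all(ord(ch) < 128 for ch in term)


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(haystack: str, term: str, case_sensitive: bool = False) -> bool:
    """Sketch of the proposed rule; returns whether term occurs in haystack."""
    if case_sensitive:
        # Strict mode: no case folding and no diacritic stripping.
        return term in haystack
    if is_pure_ascii(term):
        # Lax mode applies only to pure-ASCII terms.
        haystack = strip_diacritics(haystack)
    return term.casefold() in haystack.casefold()


print(matches("árbol", "arbol"))                       # lax: ASCII term
print(matches("árbol", "arbol", case_sensitive=True))  # strict
print(matches("arbol", "árbol"))                       # accented term
```

With this rule, a term that already contains a diacritic is never matched against unaccented text, so users who type accents keep exact behaviour. An explicit diacriticSensitive parameter, as floated above, would replace the implicit `is_pure_ascii` check with a caller-supplied flag.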

Downstream bug in Evince: https://gitlab.gnome.org/GNOME/evince/issues/58

Edited Sep 10, 2018 by Nelson Benítez León
Reference: poppler/poppler#637