    find, glib: Enhance find to support multi-line matching · e3fed321
    Nelson Benítez León authored
    On the backend side, this adds three new parameters to
    TextPage::findText(): a bool to enable the feature, an out
    PDFRectangle to store the part of the match that falls on the
    next line, and an out bool to report whether a hyphen was present
    and ignored at the end of the previous match part.
    
    For the glib binding, this extends the public PopplerRectangle
    struct with new members that record whether the rectangle belongs
    to a group of rectangles for the same match, and whether a hyphen
    was ignored at the end of the line. Since PopplerRectangle is
    public ABI, this is done by making the public PopplerRectangle
    API return the enlarged struct and casting internally to the new
    struct when required. The new members are accessible only via
    accessor functions.
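
    The struct-extension approach described above can be sketched
    roughly as follows. This is only an illustration of the general
    technique, not poppler's actual code: every name in it (PublicRect,
    ExtendedRect, rect_new, rect_set_find_info and the two getters) is
    hypothetical.

```cpp
#include <cstddef>

// Public struct: its layout is frozen, because callers compile against it.
struct PublicRect {
    double x1, y1, x2, y2;
};

// Private, enlarged struct (hypothetical). The public struct is the first
// member, so a pointer to ExtendedRect can be handed out as a PublicRect*.
struct ExtendedRect {
    PublicRect rect;
    bool match_continued; // rectangle is followed by another for the same match
    bool ignored_hyphen;  // a hyphen at the end of the line was ignored
};

static_assert(offsetof(ExtendedRect, rect) == 0,
              "public part must sit at the start of the enlarged struct");

// The library always allocates the enlarged struct, but returns the public type.
PublicRect *rect_new()
{
    ExtendedRect *ext = new ExtendedRect(); // value-initialized: all zero/false
    return &ext->rect;
}

void rect_free(PublicRect *r)
{
    delete reinterpret_cast<ExtendedRect *>(r);
}

// Internal helper: cast back to the enlarged struct to fill the new members.
void rect_set_find_info(PublicRect *r, bool continued, bool hyphen)
{
    ExtendedRect *ext = reinterpret_cast<ExtendedRect *>(r);
    ext->match_continued = continued;
    ext->ignored_hyphen = hyphen;
}

// Public accessors: the only way callers can read the new members.
bool rect_get_match_continued(const PublicRect *r)
{
    return reinterpret_cast<const ExtendedRect *>(r)->match_continued;
}

bool rect_get_ignored_hyphen(const PublicRect *r)
{
    return reinterpret_cast<const ExtendedRect *>(r)->ignored_hyphen;
}
```

    In a scheme like this, the casts are only valid for rectangles
    that the library itself allocated (here via rect_new()). A
    PublicRect that a caller created on the stack has no trailing
    members, which is one reason to keep the new fields behind
    accessor functions rather than exposing them directly.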
    
    For the Qt5 and Qt6 bindings, this commit only implements the new
    flag Poppler::Page::AcrossLines (no new function and no new
    return data type). If this flag is passed, the returned list of
    rectangles also includes the rectangles for the second part of
    across-line matches.
    
    This minimal Qt binding still allows the creation of tests for
    this feature (using the Qt test framework), which this commit
    *does include*. A more complete binding (with a new return type
    that includes 'matchContinued' and 'ignoredHyphen' boolean
    fields) is left to the Qt backend maintainers, should they want
    to use this feature in e.g. Okular.
    
    As mentioned, this commit includes tests for the across-line
    matching feature. The tests also check two specific aspects of
    the feature:
    
     - Ignoring a hyphen character while matching when 1) it is the
       last character of the line and 2) its corresponding character
       in the search term is not itself a hyphen.
    
     - Any whitespace character in the search term is allowed to
       match at the logical position where the line splits (i.e. what
       would normally be the newline character in a text file; PDF
       text does not include newline characters between lines).
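
    The two rules above can be illustrated with a small standalone
    sketch that works on plain strings. It is not the logic in
    TextPage::findText(), which operates on PDF text words and
    rectangles. The struct AcrossLineMatch and the function
    findAcrossLines() are hypothetical names, and the sketch makes
    simplifying assumptions: it only reports matches that really span
    both lines, requires either an ignored hyphen or a whitespace
    character in the term at the line split, and does no case folding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct AcrossLineMatch {
    bool found = false;
    std::size_t startInFirst = 0;   // where the match starts in the first line
    std::size_t lengthInSecond = 0; // how many characters of the second line match
    bool ignoredHyphen = false;     // a trailing hyphen on the first line was skipped
};

AcrossLineMatch findAcrossLines(const std::string &first,
                                const std::string &second,
                                const std::string &term)
{
    AcrossLineMatch result;
    for (std::size_t start = 0; start < first.size(); ++start) {
        std::size_t p = start; // position in the first line
        std::size_t j = 0;     // position in the search term
        bool ignoredHyphen = false;
        bool mismatch = false;
        while (p < first.size() && j < term.size()) {
            if (first[p] == term[j]) {
                ++p;
                ++j;
            } else if (p + 1 == first.size() && first[p] == '-') {
                // Rule 1: last character of the line is a hyphen and the
                // corresponding term character is not, so skip the hyphen.
                ignoredHyphen = true;
                ++p;
            } else {
                mismatch = true;
                break;
            }
        }
        // Keep only candidates that ran to the end of the first line with
        // part of the term matched and part of it still left over.
        if (mismatch || p != first.size() || j == 0 || j == term.size()) {
            continue;
        }
        if (!ignoredHyphen) {
            // Rule 2: a whitespace character in the term matches the line split.
            if (!std::isspace(static_cast<unsigned char>(term[j]))) {
                continue;
            }
            ++j;
            if (j == term.size()) {
                continue; // nothing left to match on the second line
            }
        }
        // The rest of the term must match at the very start of the second line.
        const std::size_t n = term.size() - j;
        if (second.compare(0, n, term, j, n) == 0) {
            result.found = true;
            result.startInFirst = start;
            result.lengthInSecond = n;
            result.ignoredHyphen = ignoredHyphen;
            return result;
        }
    }
    return result;
}
```

    With this sketch, searching for "example" in the lines "an exam-"
    and "ple here" is found with the hyphen ignored, and searching for
    "foo bar" in "foo" and "bar baz" is found through the whitespace
    rule, while "foobar" in "foo" and "bar" is not found.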
    
    Two more notes on the findText() enhancement that implements
    matching across lines:
    
     - It won't match text spanning more than two lines, i.e. it
       only matches text spanning from the end of one line to the
       start of the next line.
    
     - It does not support searching backwards. If findText() receives
       both <backward> and <matchAcrossLines> as true, it ignores the
       <matchAcrossLines> parameter. Implementing <matchAcrossLines>
       in the backward direction is possible, but it would make an
       already complex function like findText() even more complex for
       little gain, since e.g. Evince does not even use the <backward>
       parameter of findText().
    
    Fixes poppler issues #744 and #755
    Related Evince issue https://gitlab.gnome.org/GNOME/evince/issues/333