    find, glib: Enhance find to support multi-line matching · d2618832
    Nelson Benítez León authored and Christian Persch committed
    On the backend side, add 5 new parameters to TextPage::findText():
    4 to return coords for the part of the match that falls on the next line,
    and 1 to specify whether hyphen was ignored at end of the first line.
    
    For the glib binding, this extends the public PopplerRectangle struct
    by new members to hold additional information about whether the rectangle
    belongs to a group of rectangles for the same match, and whether a hyphen
    was ignored at the end of the line. Since PopplerRectangle is public
    ABI, this is done by making the public PopplerRectangle API return the
    enlarged struct, and internally casting to the new struct when required;
    the new members are accessible only via accessor functions.
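
The ABI-preserving approach described above can be sketched as follows. This is a minimal illustration, not the poppler-glib code: the names PublicRect, ExtendedRect and the find_rect_* functions are hypothetical stand-ins, and the sketch assumes every rectangle handed to the accessors was allocated by find_rect_new().

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the public struct whose layout must not change.
struct PublicRect {
    double x1, y1, x2, y2;
};

// Hypothetical enlarged struct. The public part is the first member, so a
// pointer to ExtendedRect and a pointer to its 'base' member share an address.
struct ExtendedRect {
    PublicRect base;
    bool match_continued; // the match continues in another rectangle
    bool ignored_hyphen;  // a hyphen was ignored at the end of the line
};

static_assert(std::is_standard_layout<ExtendedRect>::value,
              "casting between base and ExtendedRect needs standard layout");

// Allocates the enlarged struct but hands out a pointer to the public part.
PublicRect *find_rect_new(double x1, double y1, double x2, double y2,
                          bool continued, bool hyphen)
{
    ExtendedRect *ext = new ExtendedRect{{x1, y1, x2, y2}, continued, hyphen};
    return &ext->base;
}

// Accessors cast back to the enlarged struct. Only valid for rectangles
// that were created by find_rect_new().
bool find_rect_get_match_continued(const PublicRect *r)
{
    return reinterpret_cast<const ExtendedRect *>(r)->match_continued;
}

bool find_rect_get_ignored_hyphen(const PublicRect *r)
{
    return reinterpret_cast<const ExtendedRect *>(r)->ignored_hyphen;
}

// Frees through the enlarged type, matching the allocation.
void find_rect_free(PublicRect *r)
{
    delete reinterpret_cast<ExtendedRect *>(r);
}
```

Existing callers keep reading x1..y2 through the unchanged public layout, while new callers query the extra fields only through the accessor functions.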
    
    For the Qt5 binding, this commit only implements the new flag
    Poppler::Page::AcrossLines (but no new function and no new
    return data type) and if this flag is passed, the returned
    list of rectangles will also include rectangles for the
    second part of across-line matches.
    
    This minimal Qt5 binding still allows tests for this feature to
    be written (using the Qt5 test framework), and this commit *does
    include* them. A more complete binding (with a new return type
    that includes 'next_line' and 'after_hyphen' boolean fields) is
    left to the Qt5 binding maintainers, should they want to use
    this feature (e.g. in Okular).
    
    As mentioned, this commit includes tests for the across-line
    matching feature. The tests also cover two specific aspects of
    the feature:
    
     - A hyphen character is ignored while matching when 1) it is the
       last character of the line and 2) its corresponding character
       in the search term is not itself a hyphen.
    
     - Any whitespace character in the search term is allowed to
       match at the logical position where the lines split (i.e. what
       would normally be the newline character in a text file, but
       PDF text does not include newline characters between lines).
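
The two rules above can be illustrated on plain strings. This is a simplified sketch, not the findText() implementation: findAcrossLines() and AcrossLineMatch are hypothetical names, the lines are ordinary std::string values rather than PDF text, and it assumes that a term with neither a hyphen nor whitespace at the split may continue directly on the next line.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for this sketch.
struct AcrossLineMatch {
    bool found;
    std::size_t startInFirstLine;   // where the match begins on line 1
    std::size_t lengthInSecondLine; // how much of line 2 the match covers
    bool ignoredHyphen;             // a trailing hyphen on line 1 was skipped
};

// Looks for 'term' starting on 'first' and finishing at the start of 'second'.
AcrossLineMatch findAcrossLines(const std::string &first,
                                const std::string &second,
                                const std::string &term)
{
    for (std::size_t start = 0; start < first.size(); ++start) {
        std::size_t i = start; // position in the first line
        std::size_t j = 0;     // position in the search term
        bool ignoredHyphen = false;
        bool ok = true;
        while (i < first.size() && j < term.size()) {
            if (first[i] == term[j]) {
                ++i;
                ++j;
            } else if (i + 1 == first.size() && first[i] == '-') {
                // Rule 1: last character of the line is a hyphen and the
                // term's character is not (it mismatched), so skip it.
                ignoredHyphen = true;
                ++i;
            } else {
                ok = false;
                break;
            }
        }
        // The match must reach the end of line 1 with part of the term
        // matched and part of it still pending for line 2.
        if (!ok || i != first.size() || j == 0 || j >= term.size())
            continue;
        // Rule 2: one whitespace character in the term may match the split.
        if (!ignoredHyphen && std::isspace(static_cast<unsigned char>(term[j])))
            ++j;
        const std::size_t rest = term.size() - j;
        if (rest == 0 || second.compare(0, rest, term, j, rest) != 0)
            continue;
        return AcrossLineMatch{true, start, rest, ignoredHyphen};
    }
    return AcrossLineMatch{false, 0, 0, false};
}
```

For example, searching "multiline" in the lines "a multi-" and "line match" succeeds with the hyphen ignored, while searching "well-known" in "well-" and "known" matches the hyphen as an ordinary character.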
    
    Regarding the enhancement to findText() function which implements
    matching across lines, just two more notes:
    
     - It won't match on text spanning more than two lines, i.e. it
       only matches text spanning from end of one line to start of
       next line.
    
     - It does not support searching backwards: if findText() receives
       both <backward> and <matchAcrossLines> as true, it ignores the
       <matchAcrossLines> parameter. Implementing <matchAcrossLines>
       for the backward direction is possible, but it would make an
       already complex function like findText() even more complex,
       for little gain, as e.g. Evince does not even use the
       <backward> parameter of findText().
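
The precedence between the two parameters described above can be stated as a one-line rule. The helper below is a hypothetical illustration of that rule, not a function that exists in poppler.

```cpp
#include <cassert>

// Hypothetical helper: across-line matching only takes effect for forward
// searches; a backward search silently disables it.
bool effectiveMatchAcrossLines(bool backward, bool matchAcrossLines)
{
    return matchAcrossLines && !backward;
}
```

A caller that passes both flags as true therefore gets a plain single-line backward search.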
    
    Fixes poppler issues #744 and #755
    Related Evince issue https://gitlab.gnome.org/GNOME/evince/issues/333