
find, glib: Enhance find to support multi-line matching

On the backend side, this adds three new parameters to TextPage::findText(): a bool to enable the feature, an out PDFRectangle to store the part of the match that falls on the next line, and an out bool that reports whether a hyphen was present, and ignored, at the end of the previous part of the match.
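To make the shape of this API concrete, here is a toy C++ model of a search with the three new pieces: a switch that enables across-line matching, an out rectangle for the part of the match on the next line, and an out flag for an ignored trailing hyphen. It is a sketch, not poppler's code. `Rect`, `findTextToy`, the monospaced layout constants and the one-string-per-line page model are all hypothetical simplifications, and the sketch omits the many other parameters the real `TextPage::findText()` takes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's PDFRectangle.
struct Rect { double x1, y1, x2, y2; };

// Hypothetical page model: monospaced glyphs, one string per text line.
constexpr double kCharW = 10.0;
constexpr double kLineH = 12.0;

Rect charSpanRect(std::size_t line, std::size_t from, std::size_t to) {
  return {from * kCharW, line * kLineH, to * kCharW, (line + 1) * kLineH};
}

// Toy search: returns true on a hit. `match` receives the (first part of the)
// hit; for an across-line hit, `continueMatch` receives the part on the next
// line and `ignoredHyphen` tells whether a trailing '-' was skipped.
bool findTextToy(const std::vector<std::string>& lines, const std::string& term,
                 bool matchAcrossLines, Rect* match, Rect* continueMatch,
                 bool* ignoredHyphen) {
  *ignoredHyphen = false;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const std::string& a = lines[i];
    // 1) Ordinary match inside a single line.
    std::size_t pos = a.find(term);
    if (pos != std::string::npos) {
      *match = charSpanRect(i, pos, pos + term.size());
      return true;
    }
    if (!matchAcrossLines || i + 1 >= lines.size()) continue;
    // 2) Across-line match: term = head + tail, where head ends line i
    //    (optionally followed by an ignored '-') and tail starts line i + 1.
    const std::string& b = lines[i + 1];
    for (std::size_t k = 1; k < term.size(); ++k) {
      const std::string head = term.substr(0, k);
      const std::string tail = term.substr(k);
      const bool hyphen = !a.empty() && a.back() == '-' && head.back() != '-';
      const std::size_t headEnd = hyphen ? a.size() - 1 : a.size();
      if (headEnd < head.size() ||
          a.compare(headEnd - head.size(), head.size(), head) != 0) continue;
      if (b.compare(0, tail.size(), tail) != 0) continue;
      // First part runs to the end of line i (including an ignored hyphen).
      *match = charSpanRect(i, headEnd - head.size(), a.size());
      *continueMatch = charSpanRect(i + 1, 0, tail.size());
      *ignoredHyphen = hyphen;
      return true;
    }
  }
  return false;
}
```

In this sketch, callers that leave `matchAcrossLines` off get the same single-rectangle result as before, and only across-line hits touch `continueMatch`.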

For the glib binding, this extends the public PopplerRectangle struct with new members that record whether the rectangle belongs to a group of rectangles for the same match, and whether a hyphen was ignored at the end of the line. Since PopplerRectangle is part of the public ABI, its layout cannot change. Instead, the public PopplerRectangle API returns the enlarged struct, which is cast internally to the new struct when required, and the new members are accessible only via accessor functions.
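The struct-extension technique can be sketched as follows. It is written in C++ for consistency with the other examples here; the glib binding itself is C, where the same layout idea applies. All names (`PublicRect`, `ExtendedRect`, `public_rect_new` and the accessors) are hypothetical stand-ins, not the real glib symbols. The key assumption is that the public struct is the first member of the private one, so a pointer to one can be converted to a pointer to the other.

```cpp
#include <cassert>
#include <type_traits>

// Public struct whose layout is frozen by the ABI (hypothetical stand-in).
struct PublicRect {
  double x1, y1, x2, y2;
};

// Private, enlarged struct. PublicRect must be the FIRST member: for a
// standard-layout class, an object and its first member share an address,
// so pointers can be converted between them with reinterpret_cast.
struct ExtendedRect {
  PublicRect base;
  bool matchContinued;
  bool ignoredHyphen;
};
static_assert(std::is_standard_layout_v<ExtendedRect>,
              "the casts below rely on standard layout");

// Public constructor: callers only see a PublicRect*, but the library
// really allocates the enlarged struct.
PublicRect* public_rect_new() {
  ExtendedRect* ext = new ExtendedRect{};  // coordinates 0, flags false
  return &ext->base;
}

void public_rect_free(PublicRect* r) {
  delete reinterpret_cast<ExtendedRect*>(r);
}

// Public accessors: the only way callers can read the new members.
bool public_rect_get_match_continued(const PublicRect* r) {
  return reinterpret_cast<const ExtendedRect*>(r)->matchContinued;
}

bool public_rect_get_ignored_hyphen(const PublicRect* r) {
  return reinterpret_cast<const ExtendedRect*>(r)->ignoredHyphen;
}

// Internal helper the search code would use to fill in the new members.
void internal_set_find_flags(PublicRect* r, bool continued, bool hyphen) {
  ExtendedRect* ext = reinterpret_cast<ExtendedRect*>(r);
  ext->matchContinued = continued;
  ext->ignoredHyphen = hyphen;
}
```

The trick only holds for rectangles the library allocated itself: calling an accessor on a `PublicRect` the caller declared on the stack would read past the end of that object. Any copy function the library exposes must likewise copy the whole extended struct.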

For the Qt5 and Qt6 bindings, this commit only implements the new flag Poppler::Page::AcrossLines (no new function and no new return data type). When this flag is passed, the returned list of rectangles also includes rectangles for the second part of across-line matches.
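A plain C++ model of what the flag changes for callers, with no Qt dependency. `SearchFlag`, `Match` and `toRectList` are hypothetical names; the filter on the flag stands in for the core search simply not looking for across-line matches when the flag is absent.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct Rect { double x1, y1, x2, y2; };

// Hypothetical bitmask of search options.
enum SearchFlag : unsigned {
  NoSearchFlags = 0,
  IgnoreCase = 1u << 0,
  AcrossLines = 1u << 1,
};

// One logical match as a core search might report it (hypothetical type).
struct Match {
  Rect first;
  std::optional<Rect> continuation;  // set only for across-line matches
};

// Flatten matches into the plain rectangle list a caller gets back.
std::vector<Rect> toRectList(const std::vector<Match>& matches, unsigned flags) {
  std::vector<Rect> out;
  for (const Match& m : matches) {
    if (m.continuation && !(flags & AcrossLines)) continue;  // not requested
    out.push_back(m.first);
    if (m.continuation) out.push_back(*m.continuation);  // second part
  }
  return out;
}
```

Note that the flattened list no longer says which rectangles belong together; that lost grouping is what a richer return type would have to carry.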

This minimal Qt binding still allows writing tests for this feature (using the Qt test framework), which this commit includes. A more complete binding (with a new return type that includes matchContinued and ignoredHyphen boolean fields) is left for the Qt backend maintainers, should they want to use this feature in e.g. Okular.

So, as mentioned, this commit includes tests for the across-line matching feature, and the tests also check two further aspects of it:

  • Ignoring a hyphen character while matching when 1) it is the last character of the line and 2) the corresponding character in the search term is not a hyphen too.

  • Any whitespace characters in the search term are allowed to match at the logical position where the lines split (i.e. what would normally be the newline character in a text file; PDF text does not include newline characters between lines).
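The two rules can be sketched as a character-level check of one candidate match across a single line break. This is an illustrative C++ sketch, not poppler's implementation: `CrossMatch` and `matchAcross` are hypothetical, lines are plain ASCII `std::string`s, and the sketch returns the first match whose start lies on the first line.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Result of trying to match `term` across the break between two lines.
struct CrossMatch {
  bool found = false;
  std::size_t startInA = 0;  // where the match starts on the first line
  std::size_t lenInB = 0;    // how many chars of the second line it covers
  bool ignoredHyphen = false;
};

static bool isSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

CrossMatch matchAcross(const std::string& a, const std::string& b,
                       const std::string& term) {
  for (std::size_t s = 0; s < a.size(); ++s) {
    std::size_t i = 0, j = s;
    // Match as much of the term as possible against line a from position s.
    while (i < term.size() && j < a.size() && a[j] == term[i]) { ++i; ++j; }
    if (i == term.size() || i == 0) continue;  // single-line hit, or no hit
    bool hyphen = false;
    // Rule 1: a '-' that is the last char of line a is skipped; we only get
    // here on a mismatch, so the term's char at this spot is not '-'.
    if (j + 1 == a.size() && a[j] == '-') { ++j; hyphen = true; }
    if (j != a.size()) continue;  // mismatch before the end of line a
    // Rule 2: whitespace in the term may stand for the (invisible) line break.
    while (i < term.size() && isSpace(term[i])) ++i;
    if (i == term.size()) continue;  // need at least one char on line b
    std::size_t k = 0;
    while (i < term.size() && k < b.size() && b[k] == term[i]) { ++i; ++k; }
    if (i == term.size()) return {true, s, k, hyphen};
  }
  return {};
}
```

For example, with the line "an exam-" followed by "ple here", searching "example" matches with the hyphen ignored, while searching "exam-ple" matches the hyphen literally; "quick brown" matches "the quick" followed by "brown fox" because the space is consumed at the line break.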

Regarding the enhancement to the findText() function that implements matching across lines, two more notes:

  • It won't match text spanning more than two lines, i.e. it only matches text running from the end of one line to the start of the next line.
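The two-line limit follows naturally if only adjacent pairs of lines are examined, as in this hypothetical C++ sketch. It uses exact matching only (no hyphen or whitespace rules), and `findAcrossPair` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

static bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

static bool startsWith(const std::string& s, const std::string& prefix) {
  return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

// Returns the index i of the first line of a match that starts at the end of
// lines[i] and finishes at the start of lines[i + 1]. Only adjacent pairs are
// examined, so a term spanning three or more lines is never found.
std::optional<std::size_t> findAcrossPair(const std::vector<std::string>& lines,
                                          const std::string& term) {
  for (std::size_t i = 0; i + 1 < lines.size(); ++i) {
    for (std::size_t k = 1; k < term.size(); ++k) {  // split point inside term
      if (endsWith(lines[i], term.substr(0, k)) &&
          startsWith(lines[i + 1], term.substr(k)))
        return i;
    }
  }
  return std::nullopt;
}
```

With the lines "ab", "cd", "ef", the term "bc" is found at line 0 and "de" at line 1, but "bcde", which needs all three lines, is not found.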

  • It does not support finding backwards: if findText() receives both and parameters as true, it will ignore the parameter. Supporting the backward direction is possible, but it would make an already complex function like findText() even more complex, for little gain, as e.g. Evince does not even use the parameter of findText().
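One way to picture that precedence is an options-normalization step run before searching. This assumes, from the context of the note, that the ignored parameter is the backward-direction one; `FindOptions`, `backward`, `matchAcrossLines` and `effectiveOptions` are hypothetical names, not the actual parameter names.

```cpp
#include <cassert>

// Hypothetical option bundle; field names are illustrative only.
struct FindOptions {
  bool backward = false;
  bool matchAcrossLines = false;
};

// Backward search is not supported together with across-line matching,
// so the direction request is dropped and a forward search is done instead.
FindOptions effectiveOptions(FindOptions requested) {
  if (requested.backward && requested.matchAcrossLines) {
    requested.backward = false;
  }
  return requested;
}
```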

Fixes poppler issues #744 and #755

Related Evince issue https://gitlab.gnome.org/GNOME/evince/issues/333

Edited by Nelson Benítez León
