glib: mismatch between find-text result coordinates and their corresponding UTF-8 characters in text
The bug happens in many PDFs (but not all; it depends on the text). I could even reproduce it with one of the poppler test PDF files. See below.
How to reproduce:
- Open searchAcrossLines.pdf in Evince.
- Search for the text "cubo". A single match should appear on the first page.
- Notice that the matched text shown in bold in the sidebar is wrong: instead of the matched text "cubo" in bold, it shows "bo M" in bold.
The problem is that Evince, although it has the graphical coordinates of the matched text, cannot correctly locate that match in the text returned by poppler_page_get_text().
I tracked the bug down to my commit d6cccfb8. That commit changed the logic in TextSelectionDumper::getText() to respect the spaceAfter property, and the same change must also be mirrored in the similar code of the poppler_page_get_text_layout_for_area() and poppler_page_get_text_attributes_for_area() functions. Evince uses those functions together with poppler_page_get_text() to map between the graphical coordinates of text and its corresponding position in the UTF-8 text, so all of them must produce the same sequence of characters.
So I'm sending an MR with this fix, together with a glib test that catches the bug.