Extra space after a bold-faced letter when retrieving text
While trying to detect figure labels in ebooks, I stumbled upon an odd bug.
If a word (a label for an image or figure, in my use case) contains bold-faced letters, then the text retrieved (either through text selection in Evince or through the Poppler GLib API) contains an extra space just after the bold-faced letter. An example:
The rendered text: 'Figure 5.54: bla bla bla'
The retrieved text: 'F igure 5.54: bla bla bla' (notice the extra space after 'F')
I should also note that searching for 'F igure 5.54: bla bla bla' does not find anything. So it seems that this bug has something to do with the way bold-faced text is retrieved.
I've also attached the PDF file that triggers this bug. Open the file with Evince or through the API and look at the labels below the figures (e.g. pages 18, 29, 32, 43, 46, 48, 49, 51, ...).
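For anyone who wants to reproduce this through the API, here is a minimal sketch in Python. It assumes the PyGObject bindings for Poppler are installed. The helper names (`find_split_labels`, `page_texts`) and the regular expression are hypothetical and only illustrate the check; they are not part of Poppler, and the pattern only catches labels of the form shown in the example above.

```python
import os
import re
import sys

# Heuristic (an assumption, not part of Poppler): a label start such as
# "F igure 5.54:", i.e. one capital letter, a space, lowercase letters,
# a space, a dotted number and a colon, at the beginning of a line.
SPLIT_LABEL = re.compile(r"^([A-Z]) ([a-z]+ \d+(?:\.\d+)*:)", re.MULTILINE)


def find_split_labels(text):
    """Return label starts that look broken by an extra space."""
    return [m.group(0) for m in SPLIT_LABEL.finditer(text)]


def page_texts(path):
    """Yield (page number, text) for each page via the Poppler GLib API."""
    # Imported lazily so the heuristic above works without Poppler installed.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import GLib, Poppler

    # Poppler expects a file URI, which must be built from an absolute path.
    uri = GLib.filename_to_uri(os.path.abspath(path), None)
    doc = Poppler.Document.new_from_file(uri, None)
    for i in range(doc.get_n_pages()):
        yield i + 1, doc.get_page(i).get_text()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print every suspicious label together with its page number.
    for number, text in page_texts(sys.argv[1]):
        for label in find_split_labels(text):
            print(f"page {number}: {label!r}")
```

Run it with the path to the attached PDF as the only argument. If the bug is present, the labels below the figures on the pages listed above should be printed.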
Evince:
- Version: 3.28.4
Poppler (GLib):
- Architecture: amd64
- Version: 0.62.0-2ubuntu2.12
Please note that I have no idea whether this is universal, as I do not have other PDF files to test with.