poppler_page_find_text and poppler_page_get_text_for_area use different origins on the y axis
I was trying to find the coordinates of a certain string in a document and then retrieve the text in an area close to it. However, it appears that the two functions use different conventions for where the zero point of the y axis is.
main.cpp
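#include <iostream>
#include <poppler.h>
int main(int argc, char *argv[]) {
    if (argc < 2) return 1;
    // argv[1] must be a URI such as file:///path/to/file.pdf, not a plain path
    PopplerDocument *doc = poppler_document_new_from_file(argv[1], NULL, NULL);
    if (!doc) return 1;
    PopplerPage *pag1 = poppler_document_get_page(doc, 0);
    // Each match is returned as a PopplerRectangle in the list
    GList *lista = poppler_page_find_text(pag1, "WUNMUN-2018-0039");
    if (!lista) return 1;
    PopplerRectangle *rect = (PopplerRectangle *)lista->data;
    std::cout << "rect1 : \tx1: " << rect->x1 << " \ty1: " << rect->y1 << " \tx2: " << rect->x2 << " \ty2: " << rect->y2 << std::endl;
    // Pass the rectangle from find_text straight to get_text_for_area
    gchar *hallado = poppler_page_get_text_for_area(pag1, rect);
    std::cout << "text on rect1: " << (hallado ? hallado : "") << std::endl << std::endl;
    g_free(hallado);
    // izq receives the page width and arriba the page height, in points
    double izq, arriba;
    poppler_page_get_size(pag1, &izq, &arriba);
    // Same rectangle, with y measured from the opposite edge of the page
    PopplerRectangle *rect2 = poppler_rectangle_copy(rect);
    rect2->y1 = arriba - rect2->y1;
    rect2->y2 = arriba - rect2->y2;
    std::cout << "rect2 : \tx1: " << rect2->x1 << " \ty1: " << rect2->y1 << " \tx2: " << rect2->x2 << " \ty2: " << rect2->y2 << std::endl;
    gchar *hallado2 = poppler_page_get_text_for_area(pag1, rect2);
    std::cout << "text on rect2: " << (hallado2 ? hallado2 : "") << std::endl << std::endl;
    g_free(hallado2);
    // Release everything allocated above
    poppler_rectangle_free(rect2);
    g_list_free_full(lista, (GDestroyNotify)poppler_rectangle_free);
    g_object_unref(pag1);
    g_object_unref(doc);
    return 0;
}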
terminal output
$ ./pruebas file:///home/hernan/Documentos/pruebas/bin/Debug/Leiva.pdf
rect1 : x1: 84.128 y1: 549.304 x2: 162.36 y2: 556.704
text on rect1: ante Autorizado
1,00
U. Medida
unidades
Pág. 1/1
Esta Administración Fe
rect2 : x1: 84.128 y1: 292.696 x2: 162.36 y2: 285.296
text on rect2: WUNMUN-2018-0039
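The text returned for rect1 actually comes from a different area of the page, well below "WUNMUN-2018-0039". Mirroring the y coordinates against the page height (842 pt for this file, since 549.304 + 292.696 = 842) returns exactly the searched string. So poppler_page_get_text_for_area seems to assume that the zero point of the y axis is at the top of the page, while poppler_page_find_text returns coordinates whose zero point is at the bottom (as it should be, since that matches the bottom-left origin of PDF user space).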