  • #763
Closed
Open
Created May 04, 2019 by Coyoteazul@Coyoteazul

find_text and get_text_for_area use a different 0 point on y axis

I was trying to find the coordinates of a certain piece of text in a document and then extract the text from an area close to it. However, I found that the two functions apparently use different conventions for where the zero point of the y axis is.

main.cpp

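#include <iostream>
#include <string>
#include "poppler.h"

int main (int argc, char *argv[]){
	if (argc < 2) {
		std::cerr << "usage: " << argv[0] << " <file URI>" << std::endl;
		return 1;
	}

	// argv[1] must be a URI (file:///...), not a plain path
	PopplerDocument *doc = poppler_document_new_from_file(argv[1], NULL, NULL);
	if (doc == NULL) return 1;
	PopplerPage *pag1 = poppler_document_get_page(doc, 0);
	if (pag1 == NULL) return 1;

	// One PopplerRectangle per match; use the first one
	GList *lista = poppler_page_find_text(pag1, "WUNMUN-2018-0039");
	if (lista == NULL) return 1;
	PopplerRectangle *rect = (PopplerRectangle *)lista->data;

	std::cout << "rect1 : \tx1: " << rect->x1 << " \ty1: " <<rect->y1 << " \tx2: " << rect->x2 << " \ty2: " <<rect->y2 << std::endl;
	// The returned string is owned by the caller, so free it after copying
	char *texto = poppler_page_get_text_for_area(pag1, rect);
	std::string hallado = texto ? texto : "";
	g_free(texto);
	std::cout << "text on rect1: " << hallado << std::endl<< std::endl;

	// Despite their names, izq receives the page width and arriba the page height
	double izq, arriba;
	poppler_page_get_size(pag1, &izq, &arriba);
	// Mirror the rectangle vertically by subtracting its y values from the page height
	PopplerRectangle *rect2 = poppler_rectangle_copy(rect);
	rect2->y1 = arriba - rect2->y1;
	rect2->y2 = arriba - rect2->y2;

	std::cout << "rect2 : \tx1: " << rect2->x1 << " \ty1: " <<rect2->y1 << " \tx2: " << rect2->x2 << " \ty2: " <<rect2->y2 << std::endl;
	char *texto2 = poppler_page_get_text_for_area(pag1, rect2);
	std::string hallado2 = texto2 ? texto2 : "";
	g_free(texto2);
	std::cout << "text on rect2: " << hallado2 << std::endl<< std::endl;

	// Release everything that poppler handed over to us
	poppler_rectangle_free(rect2);
	g_list_free_full(lista, (GDestroyNotify)poppler_rectangle_free);
	g_object_unref(pag1);
	g_object_unref(doc);
	return 0;
}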

terminal output

$ ./pruebas file:///home/hernan/Documentos/pruebas/bin/Debug/Leiva.pdf
rect1 : 	x1: 84.128 	y1: 549.304 	x2: 162.36 	y2: 556.704
text on rect1: ante Autorizado
1,00
U. Medida
unidades
Pág. 1/1
Esta Administración Fe 

rect2 : 	x1: 84.128 	y1: 292.696 	x2: 162.36 	y2: 285.296
text on rect2: WUNMUN-2018-0039

The text returned for rect1 is actually in another area of the document, much lower on the page than WUNMUN-2018-0039. poppler_page_get_text_for_area seems to assume that the zero point of the y axis is at the top of the page, while poppler_page_find_text returns coordinates whose zero point is at the bottom (as it should be).
