Closed
Opened May 04, 2019 by Coyoteazul@Coyoteazul

find_text and get_text_for_area use a different 0 point on y axis

I was trying to find the coordinates of a certain text in a document and then extract the text from a nearby area. However, I found that the two functions apparently use different conventions for where the zero point of the y axis is.

main.cpp

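#include <iostream>
#include <string>
#include "poppler.h"

int main (int argc, char *argv[]){
	if (argc < 2) {
		std::cerr << "usage: " << argv[0] << " <file URI>" << std::endl;
		return 1;
	}

	// poppler_document_new_from_file expects a URI (file:///...), not a plain path.
	PopplerDocument *doc = poppler_document_new_from_file(argv[1], NULL, NULL);
	if (doc == NULL) {
		std::cerr << "could not open document" << std::endl;
		return 1;
	}
	PopplerPage *pag1 = poppler_document_get_page(doc, 0);

	// Each list item is a PopplerRectangle; only the first match is used here.
	GList *lista = poppler_page_find_text(pag1, "WUNMUN-2018-0039");
	if (lista == NULL) {
		std::cerr << "text not found" << std::endl;
		return 1;
	}

	PopplerRectangle *rect = (PopplerRectangle *)lista->data;

	// rect1: the rectangle exactly as returned by poppler_page_find_text.
	std::cout << "rect1 : \tx1: " << rect->x1 << " \ty1: " <<rect->y1 << " \tx2: " << rect->x2 << " \ty2: " <<rect->y2 << std::endl;
	std::string hallado = poppler_page_get_text_for_area(pag1, rect);
	std::cout << "text on rect1: " << hallado << std::endl<< std::endl;

	// rect2: the same rectangle with its y coordinates flipped against the page height.
	// izq receives the page width and arriba the page height.
	double izq, arriba;
	poppler_page_get_size(pag1, &izq, &arriba);
	PopplerRectangle *rect2 = poppler_rectangle_copy(rect);
	rect2->y1 = arriba - rect2->y1;
	rect2->y2 = arriba - rect2->y2;

	std::cout << "rect2 : \tx1: " << rect2->x1 << " \ty1: " <<rect2->y1 << " \tx2: " << rect2->x2 << " \ty2: " <<rect2->y2 << std::endl;
	std::string hallado2 = poppler_page_get_text_for_area(pag1, rect2);
	std::cout << "text on rect2: " << hallado2 << std::endl<< std::endl;

	return 0;
}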

terminal output

$ ./pruebas file:///home/hernan/Documentos/pruebas/bin/Debug/Leiva.pdf
rect1 : 	x1: 84.128 	y1: 549.304 	x2: 162.36 	y2: 556.704
text on rect1: ante Autorizado
1,00
U. Medida
unidades
Pág. 1/1
Esta Administración Fe 

rect2 : 	x1: 84.128 	y1: 292.696 	x2: 162.36 	y2: 285.296
text on rect2: WUNMUN-2018-0039

The text returned for rect1 is actually in another area of the document, much lower on the page than WUNMUN-2018-0039. poppler_page_get_text_for_area seems to assume that the zero point of the y axis is at the top of the page, while poppler_page_find_text returns coordinates whose zero point is at the bottom (as it should be). Flipping the y coordinates against the page height, as done for rect2, returns the expected text.

Reference: poppler/poppler#763