poppler issue #600

Created Feb 21, 2017 by Bugzilla Migration User@bugzilla-migration

pdfimages extracts many identical images that share the same object number.

Submitted by 石印

Assigned to poppler-bugs

Link to original bug (#99883)

Description

Created attachment 129787: problem file

I have a PDF file for which pdfimages lists many images with the same object number, and these images are identical. The file contains only about a thousand images with distinct object numbers, yet pdfimages lists more than 256,000 entries. It then extracts every listed image, most of them duplicates, so the total size of the extracted images is huge. I have uploaded the PDF, along with my simple patch below (it may not be ideal, but it works :D).

```diff
From 237f4e0887eff2f22d5542dfed33fa94a8c7b0ff Mon Sep 17 00:00:00 2001
From: Ryan <ryanorz@126.com>
Date: Tue, 21 Feb 2017 16:11:53 +0800
Subject: [PATCH] Fix(poppler-utils): pdfimages extract too many same pictures with the same object number.

---
 utils/ImageOutputDev.cc | 8 ++++++++
 utils/ImageOutputDev.h  | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/utils/ImageOutputDev.cc b/utils/ImageOutputDev.cc
index 5de51ad..26bf95b 100644
--- a/utils/ImageOutputDev.cc
+++ b/utils/ImageOutputDev.cc
@@ -442,6 +442,14 @@ void ImageOutputDev::writeImageFile(ImgWriter *writer, ImageFormat format, const
 void ImageOutputDev::writeImage(GfxState *state, Object *ref, Stream *str,
                                 int width, int height,
                                 GfxImageColorMap *colorMap, GBool inlineImg) {
+  if (ref->isRef()) {
+    const Ref imageRef = ref->getRef();
+    if (refNums.find(imageRef.num) != refNums.end())
+      return;
+    else
+      refNums.insert(imageRef.num);
+  }
+
   ImageFormat format;
 
   if (dumpJPEG && str->getKind() == strDCT &&
diff --git a/utils/ImageOutputDev.h b/utils/ImageOutputDev.h
index a694bbc..89c67ac 100644
--- a/utils/ImageOutputDev.h
+++ b/utils/ImageOutputDev.h
@@ -35,6 +35,7 @@
 #endif
 
 #include <stdio.h>
+#include <set>
 #include "goo/gtypes.h"
 #include "goo/ImgWriter.h"
 #include "OutputDev.h"
@@ -173,6 +174,7 @@ private:
   int pageNum;      // current page number
   int imgNum;       // current image number
   GBool ok;         // set up ok?
+  std::set<int> refNums;
 };
 
 #endif
-- 
2.10.2
```
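The patch skips any image whose object number has already been written. A minimal, self-contained sketch of the same deduplication idea follows. It assumes a hypothetical `ImageRef` struct standing in for poppler's `Ref` and a hypothetical `SeenImages` helper and `countWritten` function; none of these are poppler API. The sketch keys the set on the full (object number, generation) pair, which is how PDF identifies an indirect object, and uses the `bool` returned by `std::set::insert` so that one call both tests and records a reference.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for poppler's Ref: a PDF indirect object is
// identified by its object number and its generation number.
struct ImageRef {
    int num;  // object number
    int gen;  // generation number
};

// Hypothetical helper (not part of poppler) that remembers which image
// objects have already been written out.
class SeenImages {
public:
    // Returns true the first time a reference is seen, false on repeats.
    bool firstTime(const ImageRef &ref) {
        // std::set::insert returns {iterator, bool}; the bool is false when
        // an equal key is already present, so a single lookup suffices.
        return seen_.insert(std::make_pair(ref.num, ref.gen)).second;
    }

    // Number of distinct references recorded so far.
    std::size_t distinctCount() const { return seen_.size(); }

    // Forget all recorded references (e.g. to deduplicate per page).
    void clear() { seen_.clear(); }

private:
    std::set<std::pair<int, int>> seen_;
};

// Simulates walking a sequence of image draws and returns how many image
// files would be written when repeated references are skipped.
std::size_t countWritten(const std::vector<ImageRef> &draws, SeenImages &seen) {
    std::size_t written = 0;
    for (const ImageRef &r : draws) {
        if (seen.firstTime(r)) {
            ++written;  // first occurrence: the image file would be written here
        }
    }
    return written;
}
```

For example, the draw sequence `12 0`, `15 0`, `12 0`, `12 0`, `20 0` yields 3 written images. Like `refNums` in the patch, the set is never cleared automatically, so deduplication spans the whole document; calling `clear()` at the start of each page would instead deduplicate only within a page.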

Attachment 129787, "problem file":
Linuxå__æ__å__å__æ__é___ä__æ__ç__v3.0_.pdf
