Emit more font information when pdftohtml is run with -xml
Submitted by ulatekh
Assigned to poppler-bugs
Link to original bug (#107318)
Description
Created attachment 140750 Patch to add functionality
I'm about to use pdftohtml to extract information from PDFs and organize the results into a database, so I had a chance to dig through the code.
The patch merely emits more information in the <fontspec> elements when pdftohtml is run with -xml. The PDFs I'm trying to analyze appear to be pretty consistent in their font usage, to the point where I can use the fonts to infer the text's meaning. But I needed more information in the <fontspec> elements to do that, and this patch provides it.
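For context, here is a rough sketch of the kind of downstream processing described above: reading pdftohtml -xml output and grouping text by the font it was set in. It is an illustration only, not part of the patch. The sample XML is hand-written and assumes the usual <fontspec id/size/family/color> and <text font="..."> layout; the function name group_text_by_font is hypothetical. Because the code copies every <fontspec> attribute generically, any extra attributes the patch emits would be picked up without naming them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample resembling pdftohtml -xml output (an assumption for
# illustration; real output also carries an XML declaration and a DOCTYPE).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="22" family="Times" color="#000000"/>
<fontspec id="1" size="12" family="Helvetica" color="#000000"/>
<text top="100" left="80" width="300" height="24" font="0">Chapter One</text>
<text top="140" left="80" width="500" height="14" font="1">Body text <b>here</b>.</text>
</page>
</pdf2xml>
"""


def group_text_by_font(xml_string):
    """Return (fonts, grouped): fontspec attributes keyed by font id, and
    the text runs that reference each font id, in document order."""
    root = ET.fromstring(xml_string)

    # Copy every attribute of each <fontspec>, whatever they happen to be,
    # so additional attributes need no code changes here.
    fonts = {fs.get("id"): dict(fs.attrib) for fs in root.iter("fontspec")}

    grouped = defaultdict(list)
    for text in root.iter("text"):
        # itertext() flattens inline markup such as <b> or <i>.
        grouped[text.get("font")].append("".join(text.itertext()))

    return fonts, dict(grouped)


if __name__ == "__main__":
    fonts, grouped = group_text_by_font(SAMPLE)
    for font_id, runs in grouped.items():
        print(font_id, fonts[font_id], runs)
```

With a mapping like this, each font id can be associated with a role (heading, body text, and so on) before the rows are loaded into a database.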
Patch 140750, "Patch to add functionality":
0003-Emit-more-font-information-when-pdftohtml-is-run-wit.patch