Emit more font information when pdftohtml is run with -xml
Submitted by ulatekh
Assigned to poppler-bugs
Created attachment 140750 Patch to add functionality
I'm about to use pdftohtml to extract information from PDFs and organize the results into a database, so I had a chance to dig through the code.
The patch merely emits more information in the <fontspec> elements when pdftohtml is run with -xml. The PDFs I'm trying to analyze appear to be pretty consistent in their font usage, to the point where I can use the fonts to infer the text's meaning. But I needed more information in the <fontspec> elements to do that, and this patch provides it.
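For anyone with a similar use case, here is a rough sketch of joining each <text> element to its <fontspec> so that font attributes can drive classification. It is an illustration, not part of the patch. The sample XML is made up, the function name texts_with_fonts is hypothetical, and the attribute names (id, size, family, color, font) are my assumption about what pdftohtml -xml emits without the patch. Every attribute on a <fontspec> is copied as-is, so any extra ones would come along without code changes.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like `pdftohtml -xml` output (assumed layout,
# not taken from a real run).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="18" family="Times" color="#000000"/>
<fontspec id="1" size="10" family="Helvetica" color="#333333"/>
<text top="100" left="50" width="300" height="20" font="0">Chapter One</text>
<text top="130" left="50" width="400" height="12" font="1">Body text here.</text>
</page>
</pdf2xml>"""


def texts_with_fonts(xml_string):
    """Return a list of (text, font_attributes) pairs.

    font_attributes is a dict holding every attribute found on the
    matching <fontspec>, or an empty dict if the id is unknown.
    """
    root = ET.fromstring(xml_string)

    # Index every <fontspec> by its id, keeping all of its attributes.
    fonts = {fs.get("id"): dict(fs.attrib) for fs in root.iter("fontspec")}

    rows = []
    for text in root.iter("text"):
        font = fonts.get(text.get("font"), {})
        # itertext() also picks up text nested inside child elements
        # such as <b> or <i>.
        rows.append(("".join(text.itertext()), font))
    return rows


if __name__ == "__main__":
    for content, font in texts_with_fonts(SAMPLE):
        print(font.get("family"), font.get("size"), repr(content))
```

From there, each row can go straight into a database table, with the font attributes as the columns used to classify the text.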
Patch 140750, "Patch to add functionality":