pdftohtml should include charset encoding in head section of *s.html files
@sthibaul
Submitted by Samuel Thibault Assigned to poppler-bugs
Description
Created attachment 85950 test file
Hello,
After converting a PDF file to HTML, all the non-ASCII UTF-8 characters
such as ● are displayed incorrectly in the web browser, because the HTML
file does not declare its character set encoding. pdftohtml should add
a charset declaration inside its <head>.
Pino Toscano noted at http://bugs.debian.org/722281 that “This is added already in some occasions, but apparently not in frames when doing the "complex HTML output".”
For instance, after converting http://brl.thefreecat.org/ghm13.pdf (also attached here), ghm13s.html does not contain any encoding declaration.
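To check whether a generated file declares its encoding, something like the following Python sketch could be used. It is not part of pdftohtml; the names CharsetFinder and declared_charset are made up for illustration, and it only looks at the two common <meta> forms (the HTML5 charset attribute and the older http-equiv Content-Type form).

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first charset declared by a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("charset"):
            # HTML5 form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Older form: <meta http-equiv="Content-Type"
            #                   content="text/html; charset=utf-8">
            content = attr.get("content", "").lower()
            if "charset=" in content:
                value = content.split("charset=", 1)[1]
                self.charset = value.split(";")[0].strip()


def declared_charset(html_text):
    """Return the declared charset in lowercase, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

For the file from this report, one could read it with open("ghm13s.html", encoding="utf-8", errors="replace") and pass the text to declared_charset; going by the description above, the result would be expected to be None until the declaration is added.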
Samuel
Attachment 85950, "test file":
ghm13.pdf