pdftohtml should include charset encoding in head section of *s.html files

Submitted by Samuel Thibault `@sthibaul`

Assigned to poppler-bugs

Description

Created attachment 85950 test file

Hello,

After converting a PDF file to HTML, all the non-ASCII UTF-8 characters such as ● are displayed incorrectly in the web browser, because the HTML file does not declare its character set encoding. pdftohtml should add a charset declaration inside the file's <head>.

Pino Toscano added on http://bugs.debian.org/722281 that “This is added already in some occasions, but apparently not in frames when doing the "complex HTML output".”

For instance, after converting http://brl.thefreecat.org/ghm13.pdf (also attached here), ghm13s.html does not contain any encoding.
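
Until pdftohtml emits the declaration itself, the generated file can be checked and patched after the fact. The following is only a minimal sketch: the helper names `declares_charset` and `add_charset` are hypothetical, the `http-equiv` form of the meta tag is an assumption rather than what pdftohtml emits elsewhere, and the output is assumed to be UTF-8 with a plain `<head>` tag.

```python
import re

# Hypothetical helpers for illustration; not part of pdftohtml or poppler.
# Matches any <meta ...> tag that carries a charset, in either the
# <meta charset="..."> or the http-equiv/content form.
CHARSET_RE = re.compile(r'<meta\b[^>]*\bcharset\s*=', re.IGNORECASE)
HEAD_RE = re.compile(r'<head\b[^>]*>', re.IGNORECASE)


def declares_charset(html: str) -> bool:
    """Return True if the document already declares a charset in a meta tag."""
    return CHARSET_RE.search(html) is not None


def add_charset(html: str, charset: str = "UTF-8") -> str:
    """Insert a charset declaration right after the opening <head> tag.

    The input is returned unchanged if it already declares a charset
    or has no <head> tag to attach the declaration to.
    """
    if declares_charset(html):
        return html
    match = HEAD_RE.search(html)
    if match is None:
        return html
    # Assumed form of the declaration; the tag pdftohtml uses may differ.
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}"/>'
    return html[:match.end()] + "\n" + meta + html[match.end():]
```

To apply it to a file such as ghm13s.html, read the file with `encoding="utf-8"`, pass the text through `add_charset`, and write the result back with the same encoding.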

Samuel

Attachment 85950, "test file":
ghm13.pdf
