pdftohtml: Cannot parse text from OCRed document (tesseract 4.0.0)
I am using poppler-utils 0.75.0 on Ubuntu 18.04.
Images that tesseract has converted to searchable PDFs cannot be converted to HTML: only the images are rendered and the text is ignored. I have attached an example, which was produced from http://www.orimi.com/pdf-test.pdf with the following sequence of commands:
convert pdf-test.pdf test.jpg
tesseract test.jpg test pdf
pdftohtml test.pdf
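
To make the symptom easier to confirm, here is a minimal sketch of a check that the generated HTML really contains no text. The helper name html_has_text is hypothetical (it is not part of poppler), and it is a crude line-based filter rather than a real HTML parser. The commented usage assumes test.pdf from the steps above and that pdftotext and pdftohtml from poppler-utils are installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of poppler): reads HTML on stdin and
# succeeds if the <body> contains any letters or digits once tags and
# character entities have been removed. Crude, line-based check.
html_has_text() {
    sed -n '/<body/,/<\/body>/p' \
        | sed -e 's/<[^>]*>//g' -e 's/&[#A-Za-z0-9]*;//g' \
        | grep -q '[[:alnum:]]'
}

# Usage sketch, assuming test.pdf exists in the current directory:
#
#   # 1. Does the PDF itself expose text? (pdftotext writes to stdout with "-")
#   pdftotext test.pdf - | grep -q '[[:alnum:]]' && echo "PDF has extractable text"
#
#   # 2. Does the HTML produced by pdftohtml contain any of it?
#   pdftohtml -stdout -noframes -i test.pdf | html_has_text \
#       || echo "HTML contains no text"
#
#   # 3. Assumption, untested here: tesseract is generally understood to
#   #    store recognized text as an invisible layer, so the -hidden
#   #    option (force hidden text extraction) may be worth comparing.
#   pdftohtml -hidden -stdout -noframes -i test.pdf | html_has_text \
#       && echo "HTML contains text with -hidden"
```

If step 1 reports text while step 2 does not, the text is present in the PDF and is being dropped during the HTML conversion, which matches the behaviour described above.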