When extracting as XML all new lines are stripped
Submitted by cla..@..eat.dk
Assigned to poppler-bugs
Link to original bug (#104230)
Description
Created attachment 136123 test pdf
pdftohtml -s -i -xml test.pdf out.xml
VS
pdftohtml -s -i test.pdf out.html
When you extract the text as HTML, all line breaks are kept. When you extract it as XML, however, the line breaks are stripped and each line is put in a separate tag instead.
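A possible workaround, until this is fixed, is to rebuild the line breaks from the XML output yourself. This is a minimal sketch that assumes pdftohtml's pdf2xml layout: a `<pdf2xml>` root with `<page>` elements, each holding one `<text>` element per line. The function name `xml_to_text` is hypothetical and not part of poppler.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_data: str) -> str:
    """Hypothetical helper: rejoin pdftohtml -xml <text> elements with newlines.

    Assumes the pdf2xml layout <pdf2xml><page><text ...>...</text></page></pdf2xml>,
    where each <text> element holds one line of text.
    """
    root = ET.fromstring(xml_data)
    lines = []
    for page in root.iter("page"):
        for text in page.iter("text"):
            # itertext() also picks up text nested in <b>, <i> or <a> tags
            lines.append("".join(text.itertext()))
    # Put back the line breaks that the XML output encodes as separate tags
    return "\n".join(lines)
```

For example, run it on the contents of `out.xml` produced by the command above. Lines are emitted in document order, so any reordering pdftohtml applies is preserved rather than re-sorted by position.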
Attachment 136123, "test pdf":
001.pdf