pdftotext inserts newlines where there are none
snapshot from archive: https://web.archive.org/web/20200408054553if_/https://www.ne.ch/autorites/DFS/SCSP/medecin-cantonal/maladies-vaccinations/Documents/Covid-19-Statistiques/COVID19_PublicationInternet.pdf
COVID19_PublicationInternet.pdf
I am using pdftotext -layout for this.
This happens both with version 0.71 (Debian testing) and 0.85 (Debian experimental).
Examples of problematic conversions:
Start of the document:
Output:
Servicedel
asantépubli
que
Donnéesbaséessurlesdéc
l ar
ati
onsdelabo
Neuc
hât
el-CasCOVI
D-19posi
tif
s
Tableauact
uali
End of the document (table):
Output:
8avri
l2020 5 518 53 3 7 63 3 7 4 14 1 37
9avri
l2020 18 536 48 3 7 58 3 7 4 14
10avri
l2020 52 3 8 63 3 8 4 15
Notice the newline after "avri".
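
To check for the problem without reading the output by eye, here is a minimal sketch in Python 3. It assumes pdftotext is on the PATH. The helper names find_split_dates and run_pdftotext are hypothetical and not part of poppler. The pattern only covers the "avri" / "l2020" split seen in the table above, so it is a rough detector for this one document rather than a general test.

```python
import re
import subprocess


def find_split_dates(text):
    """Return (line_number, rejoined_date) for each date broken across two lines.

    Only matches the specific break reported here: a line consisting of
    "<day>avri" followed by a line that starts with "l2020".
    """
    lines = text.splitlines()
    hits = []
    for i in range(len(lines) - 1):
        head = lines[i].strip()
        tail = lines[i + 1].lstrip()
        if re.fullmatch(r"\d{1,2}avri", head) and tail.startswith("l2020"):
            # Rejoin the two halves to show what the date should have been.
            hits.append((i + 1, head + tail.split()[0]))
    return hits


def run_pdftotext(pdf_path):
    """Run pdftotext -layout and return its output (hypothetical helper).

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Table output copied from this report.
sample = """8avri
l2020 5 518 53 3 7 63 3 7 4 14 1 37
9avri
l2020 18 536 48 3 7 58 3 7 4 14
10avri
l2020 52 3 8 63 3 8 4 15
"""

if __name__ == "__main__":
    for line_number, date in find_split_dates(sample):
        print(line_number, date)
```

To run it against the real file, replace the sample with run_pdftotext("COVID19_PublicationInternet.pdf"). An empty result would mean this particular split no longer occurs.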