UTF-16BE/UTF-16LE without BOM not supported (?)
I took a random UTF-8 encoded paragraph of Chinese text from https://zh.wikipedia.org and converted it to UTF-16, UTF-16BE, and UTF-16LE with iconv (GNU libiconv 1.11 on macOS 10.14). The UTF-16BE and UTF-16LE versions have no BOM; in particular, the 2-byte BOM is the only difference between the UTF-16 and UTF-16BE versions. Rather surprisingly, uchardet failed on both the UTF-16BE and UTF-16LE versions:
$ ./src/tools/uchardet zh.utf-8.txt zh.utf-16.txt zh.utf-16be.txt zh.utf-16le.txt
zh.utf-8.txt: UTF-8
zh.utf-16.txt: UTF-16
zh.utf-16be.txt: unknown
zh.utf-16le.txt: WINDOWS-1252
I have attached the text files.
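For readers without the attachments, the relationship between the files can be illustrated with a short Python sketch. It uses a made-up Chinese sentence as a hypothetical stand-in for the Wikipedia paragraph, and it assumes, as reported above, that libiconv's UTF-16 output is a big-endian BOM (FE FF) followed by big-endian code units. Python's own utf-16 codec uses the platform's native byte order, so the sketch prepends the BOM explicitly.

```python
import codecs

# Hypothetical stand-in for the attached paragraph (not the real file contents).
SAMPLE = "维基百科是一个自由内容的网络百科全书。"

def make_variants(text: str) -> dict[str, bytes]:
    """Encode text the four ways compared in the report."""
    be = text.encode("utf-16-be")  # big-endian, no BOM
    le = text.encode("utf-16-le")  # little-endian, no BOM
    return {
        "utf-8": text.encode("utf-8"),
        # Assumption: mirrors the reporter's libiconv output, i.e. a big-endian
        # BOM (FE FF) followed by UTF-16BE. Python's own "utf-16" codec would
        # use the platform's native byte order instead.
        "utf-16": codecs.BOM_UTF16_BE + be,
        "utf-16be": be,
        "utf-16le": le,
    }

variants = make_variants(SAMPLE)

# The 2-byte BOM is the only difference between the UTF-16 and UTF-16BE files.
assert variants["utf-16"][:2] == codecs.BOM_UTF16_BE
assert variants["utf-16"][2:] == variants["utf-16be"]
# UTF-16LE is UTF-16BE with the two bytes of every code unit swapped.
assert variants["utf-16le"][0::2] == variants["utf-16be"][1::2]
assert variants["utf-16le"][1::2] == variants["utf-16be"][0::2]
```

With no BOM, nothing in the UTF-16BE and UTF-16LE bytes marks the byte order, so a detector has to infer it from the content alone.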
Is there any chance detection of BOM-less UTF-16BE/UTF-16LE could be improved?
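One possible direction, sketched in Python under loudly stated assumptions: decode the BOM-less input in both byte orders and keep the one whose characters look more plausible. The helper names (plausibility, guess_bomless_utf16), the character ranges, and the 0.8 threshold are hypothetical choices for Chinese/English text, not uchardet's actual algorithm or API.

```python
from typing import Optional

# Hypothetical "plausible character" ranges for Chinese/English text.
# Illustrative assumptions only, not uchardet's tables.
PLAUSIBLE_RANGES = [
    (0x0009, 0x000D),  # tab, line feed, carriage return and other ASCII whitespace
    (0x0020, 0x007E),  # printable ASCII
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]

def plausibility(data: bytes, codec: str) -> float:
    """Fraction of decoded characters inside PLAUSIBLE_RANGES; 0.0 if decoding fails."""
    try:
        text = data.decode(codec)  # strict mode: unpaired surrogates raise
    except UnicodeDecodeError:
        return 0.0
    if not text:
        return 0.0
    hits = sum(any(lo <= ord(ch) <= hi for lo, hi in PLAUSIBLE_RANGES) for ch in text)
    return hits / len(text)

def guess_bomless_utf16(data: bytes, threshold: float = 0.8) -> Optional[str]:
    """Hypothetical helper: guess UTF-16BE vs UTF-16LE for input without a BOM."""
    if len(data) < 2 or len(data) % 2:
        return None  # UTF-16 needs whole 2-byte code units
    be = plausibility(data, "utf-16-be")
    le = plausibility(data, "utf-16-le")
    best, label = (be, "UTF-16BE") if be >= le else (le, "UTF-16LE")
    if best < threshold or be == le:
        return None  # too little evidence, or the byte orders are indistinguishable
    return label

if __name__ == "__main__":
    sample = "维基百科是一个自由内容的网络百科全书。"
    print(guess_bomless_utf16(sample.encode("utf-16-be")))
    print(guess_bomless_utf16(sample.encode("utf-16-le")))
```

This check is weak on its own: byte-swapped ASCII letters (for example 0x0061 read as 0x6100) land in the CJK block, so mostly-Latin input can score well in both byte orders. A real detector would likely need more evidence, such as where zero bytes fall (even vs. odd offsets).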