UTF-16BE/UTF-16LE without BOM not supported (?)
I took a random UTF-8 encoded paragraph of Chinese text from https://zh.wikipedia.org and converted it to
UTF-16, UTF-16BE, and UTF-16LE using
iconv (GNU libiconv 1.11 on macOS 10.14). The
UTF-16BE and UTF-16LE versions have no BOM; in particular, the 2-byte BOM is the only difference between the
UTF-16 and the
UTF-16BE versions. Rather surprisingly,
uchardet failed on both the UTF-16BE and UTF-16LE versions:
    $ ./src/tools/uchardet zh.utf-8.txt zh.utf-16.txt zh.utf-16be.txt zh.utf-16le.txt
    zh.utf-8.txt: UTF-8
    zh.utf-16.txt: UTF-16
    zh.utf-16be.txt: unknown
    zh.utf-16le.txt: WINDOWS-1252
I have attached the text files.
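In case the attachments are not accessible, the Python sketch below should produce an equivalent set of test files. The sample sentence and the helper name `make_variants` are placeholders of mine, not the original paragraph or anything from uchardet. The UTF-16 variant is built by hand as a big-endian BOM followed by big-endian data, on the assumption that this matches what iconv produced here (the BOM being the only difference from the UTF-16BE file).

```python
import codecs


def make_variants(text):
    """Return the four encodings of `text`, keyed by the file names used above."""
    utf16be = text.encode("utf-16-be")  # big-endian, no BOM
    return {
        "zh.utf-8.txt": text.encode("utf-8"),
        # Assumption: BOM (FE FF) followed by big-endian data, so that the
        # BOM is the only difference from the UTF-16BE file.
        "zh.utf-16.txt": codecs.BOM_UTF16_BE + utf16be,
        "zh.utf-16be.txt": utf16be,
        "zh.utf-16le.txt": text.encode("utf-16-le"),  # little-endian, no BOM
    }


if __name__ == "__main__":
    # Placeholder sample text, not the Wikipedia paragraph from the attachments.
    sample = "维基百科是一个自由的百科全书。\n"
    for name, data in make_variants(sample).items():
        with open(name, "wb") as f:
            f.write(data)
```

Running the script writes the four files into the current directory, after which the uchardet command above can be repeated on them. A single short sentence may give the detector less to work with than a full paragraph, so results on this sample could differ from the ones I reported.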
Is there any chance this could be improved?