"UTF-16" is not native byte order with iconv on OS X (re: ustring::to_utf8)
Submitted by Franz Brauße
Assigned to poppler-bugs
ustring::to_utf8() creates a
MiniIconv ic("UTF-8", "UTF-16");
assuming that iconv(3) uses the native byte order for "UTF-16". On OS X with Intel CPUs this assumption fails (I installed poppler through MacPorts, but the issue is unrelated to that, see below), as a quick
$ echo -n 7 | iconv -t utf-16 | hexdump -C
00000000  fe ff 00 37                                       |...7|
reveals: it's UTF-16BE.
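This breaks page labels for me: instead of "78" (UTF-8), they return the (hex) bytes
e3 9c 80 e3 a0 80
which is the UTF-8 encoding of U+3700 U+3800, i.e. the UTF-16 code units 0x0037 ("7") and 0x0038 ("8") with their bytes swapped.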
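The byte swap can be checked by hand. The following is a minimal sketch, independent of poppler and iconv; swap_bytes and encode_utf8_bmp are hypothetical helper names introduced only for this illustration. It swaps the two bytes of a UTF-16 code unit and encodes the result as UTF-8.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: swap the two bytes of one UTF-16 code unit.
inline std::uint16_t swap_bytes(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// Hypothetical helper: encode one BMP code point (surrogates not handled) as UTF-8.
inline std::string encode_utf8_bmp(std::uint16_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1-byte form: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2-byte form: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Feeding it the swapped units for "7" and "8", i.e. encode_utf8_bmp(swap_bytes(0x0037)) followed by encode_utf8_bmp(swap_bytes(0x0038)), yields exactly the bytes e3 9c 80 e3 a0 80 quoted above.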
A fix might be to not "decode" GooString's UTF-16BE to native byte order in
or use a source encoding based on the BYTE_ORDER macro instead of just "UTF-16", or to check the BOM character output by iconv(3) (which e.g.
ustring::from_utf8(const char *str, int len)
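The option of choosing an explicit-endianness source encoding could look roughly like the sketch below. It is not poppler's code: native_utf16_name and utf16_to_utf8 are hypothetical names, char16_t stands in for the 16-bit units of ustring, and a small runtime probe is used in place of the BYTE_ORDER macro so that the example does not depend on platform headers. It relies on the iconv implementation accepting the names "UTF-16LE" and "UTF-16BE".

```cpp
#include <iconv.h>

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: iconv name of the UTF-16 variant that matches the
// host's in-memory byte order (a runtime stand-in for a BYTE_ORDER check).
inline const char *native_utf16_name() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1 ? "UTF-16LE" : "UTF-16BE";
}

// Hypothetical helper: convert native-endian UTF-16 code units to UTF-8.
inline std::string utf16_to_utf8(const char16_t *units, std::size_t count) {
    iconv_t cd = iconv_open("UTF-8", native_utf16_name());
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed");
    }

    // Each UTF-16 code unit expands to at most 3 UTF-8 bytes.
    std::string out(count * 3, '\0');
    char *in_ptr = const_cast<char *>(reinterpret_cast<const char *>(units));
    std::size_t in_left = count * sizeof(char16_t);
    char *out_ptr = &out[0];
    std::size_t out_left = out.size();

    const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv conversion failed");
    }

    out.resize(out.size() - out_left);  // drop the unused tail of the buffer
    return out;
}
```

With an explicit "UTF-16LE" or "UTF-16BE" the result no longer depends on what a given iconv(3) assumes for plain "UTF-16", so the code units 0x0037 0x0038 should come back as "78". On OS X, linking may additionally need -liconv.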