Incorrect implementation of prependUnicodeMarker()

Georgiy Sgibnev requested to merge georgiy/poppler:fix-bom into master

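The main purpose of the BOM is to tell a reader which byte order the data uses. Always prepending the fixed bytes FE FF, regardless of the byte order of the string's contents, is therefore not correct: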

void GooString::prependUnicodeMarker()
{
    insert(0, "\xFE\xFF", 2);
}

Let me demonstrate the consequences:

std::string s8 = u8"test";
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
std::u16string s16 = converter.from_bytes(s8);

GooString gooStr = GooString((const char *) s16.c_str(), s16.length() * 2);
gooStr.prependUnicodeMarker(); // On a little-endian machine, gooStr's content is FE FF 74 00 65 00 73 00 74 00.
QString qStr = UnicodeParsedString(&gooStr);
printf("qStr=%s\n", qStr.toUtf8().constData()); // Prints "qStr=琀攀猀琀".

My implementation:

void GooString::prependUnicodeMarker()
{
    static const uint16_t BOM = 0xFEFF;
    insert(0, (const char *)&BOM, sizeof(BOM));
}

With this change, gooStr's content on a little-endian machine is FF FE 74 00 65 00 73 00 74 00, and printf() prints "qStr=test".
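
The byte-order mismatch can be reproduced without poppler or Qt. The following is a minimal, self-contained sketch, not poppler code: every helper name in it (hostIsLittleEndian, nativeBytes, withFixedBom, withNativeBom, decodeByBom, checkRoundTrip) is hypothetical, and decodeByBom only assumes a reader that picks the byte order from the BOM, which may differ from what UnicodeParsedString() actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// All helpers below are hypothetical stand-ins, not poppler functions.

// True when the least significant byte of a char16_t comes first in memory.
bool hostIsLittleEndian()
{
    const char16_t probe = 0x0001;
    return *reinterpret_cast<const unsigned char *>(&probe) == 0x01;
}

// Copy UTF-16 code units into a byte string exactly as laid out in memory.
std::string nativeBytes(const std::u16string &s)
{
    return std::string(reinterpret_cast<const char *>(s.data()), s.size() * sizeof(char16_t));
}

// Mirrors the current code: always prepend the bytes FE FF.
std::string withFixedBom(const std::string &payload)
{
    return std::string("\xFE\xFF", 2) + payload;
}

// Mirrors the proposed code: prepend U+FEFF in host byte order.
std::string withNativeBom(const std::string &payload)
{
    const char16_t bom = 0xFEFF;
    return std::string(reinterpret_cast<const char *>(&bom), sizeof(bom)) + payload;
}

// Assumed reader: FE FF selects big-endian, FF FE selects little-endian.
// Returns an empty string when no BOM is present.
std::u16string decodeByBom(const std::string &bytes)
{
    std::u16string out;
    if (bytes.size() < 2) {
        return out;
    }
    const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
    const bool bigEndian = (b0 == 0xFE && b1 == 0xFF);
    const bool littleEndian = (b0 == 0xFF && b1 == 0xFE);
    if (!bigEndian && !littleEndian) {
        return out;
    }
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const unsigned first = static_cast<unsigned char>(bytes[i]);
        const unsigned second = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = bigEndian ? (first << 8) | second : (second << 8) | first;
        out.push_back(static_cast<char16_t>(unit));
    }
    return out;
}

// Compares both ways of writing the BOM for the string "test".
void checkRoundTrip()
{
    const std::string payload = nativeBytes(u"test");
    // A BOM in host order always matches the payload's byte order.
    assert(decodeByBom(withNativeBom(payload)) == u"test");
    // A fixed FE FF matches the payload only when the host is big-endian.
    const bool fixedOk = decodeByBom(withFixedBom(payload)) == u"test";
    assert(fixedOk == !hostIsLittleEndian());
}
```

Calling checkRoundTrip() passes on either kind of host. On a little-endian machine, the fixed FE FF marker makes the assumed reader swap every byte pair, which is how 74 00 65 00 73 00 74 00 turns into U+7400 U+6500 U+7300 U+7400, the four characters shown in the printf() output above.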
