FontConfig uses the wrong encoding to decode NameRecord entries
Currently, FontConfig doesn't use the right encoding to decode NameRecord entries for the Microsoft platform: https://gitlab.freedesktop.org/fontconfig/fontconfig/-/blob/d863f6778915f7dd224c98c814247ec292904e30/src/fcfreetype.c#L88-103
Here is how GDI and DirectWrite do it: https://github.com/MicrosoftDocs/typography-issues/issues/956#issuecomment-1205678068
In brief, to emulate GDI, the logic in Python is:
if platformID == TT_PLATFORM_MICROSOFT:
    if platEncID == TT_MS_ID_PRC:
        return "cp936"
    elif platEncID == TT_MS_ID_BIG_5:
        if nameID == TT_NAME_ID_FONT_SUBFAMILY:
            return "utf_16_be"
        else:
            return "cp950"
    elif platEncID == TT_MS_ID_WANSUNG:
        if nameID == TT_NAME_ID_FONT_SUBFAMILY:
            return "utf_16_be"
        else:
            return "cp949"
    else:
        return "utf_16_be"
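To try the logic above outside of FreeType, here is a self-contained sketch. The numeric values of the `TT_*` constants are the platform, encoding and name IDs from the OpenType `name` table specification (FreeType exposes them under the same names). The function name `gdi_name_encoding` is hypothetical, and non-Microsoft platforms simply return `None` because the logic above does not cover them.

```python
from typing import Optional

# Platform, encoding and name IDs from the OpenType 'name' table spec;
# the names mirror FreeType's TT_* constants.
TT_PLATFORM_MICROSOFT = 3

TT_MS_ID_PRC = 3
TT_MS_ID_BIG_5 = 4
TT_MS_ID_WANSUNG = 5

TT_NAME_ID_FONT_SUBFAMILY = 2


def gdi_name_encoding(platform_id: int, encoding_id: int, name_id: int) -> Optional[str]:
    """Return the Python codec GDI is described as using for a name record.

    Hypothetical helper; returns None for non-Microsoft platforms.
    """
    if platform_id != TT_PLATFORM_MICROSOFT:
        return None
    if encoding_id == TT_MS_ID_PRC:
        return "cp936"
    if encoding_id == TT_MS_ID_BIG_5:
        # Subfamily names are read as UTF-16BE even in Big5 fonts.
        return "utf_16_be" if name_id == TT_NAME_ID_FONT_SUBFAMILY else "cp950"
    if encoding_id == TT_MS_ID_WANSUNG:
        # Same exception for Wansung (Korean) fonts.
        return "utf_16_be" if name_id == TT_NAME_ID_FONT_SUBFAMILY else "cp949"
    # Every other encoding ID (Symbol, Unicode, SJIS, Johab, ...) is UTF-16BE.
    return "utf_16_be"
```

For example, `gdi_name_encoding(3, 4, 1)` (a Big5 family name) returns `"cp950"`, while the same record with name ID 2 returns `"utf_16_be"`.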
Note that in GDI and DirectWrite, when the encoding is not "utf_16_be", the leading zero byte of each double-byte unit is removed before decoding.
Example in Python:
# These bytes are from the font 文鼎中特廣告體. Download it here: http://fonts.top/Arphic-Fonts/41459.html
string = b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf\x00S\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9"
if platformID == TT_PLATFORM_MICROSOFT and encoding != "utf_16_be":
    name_to_decode = string.replace(b"\x00", b"")
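The `replace(b"\x00", b"")` call above removes every zero byte in the string. A slightly stricter reading of "the leading zero byte of each double-byte unit" only drops the high byte of a 16-bit unit when it is 0x00, as in the sketch below. The helper name `strip_padding_zeros` is hypothetical, decoding with `cp950` assumes the record's encoding ID is `TT_MS_ID_BIG_5`, and keeping both bytes of a unit whose high byte is non-zero is an assumption about GDI, not something confirmed in the linked discussion.

```python
def strip_padding_zeros(raw: bytes) -> bytes:
    """Drop the 0x00 high byte of each 16-bit unit (hypothetical helper).

    Units whose high byte is non-zero are kept as two bytes (assumption).
    """
    if len(raw) % 2:
        raise ValueError("name string length must be even")
    out = bytearray()
    for i in range(0, len(raw), 2):
        hi, lo = raw[i], raw[i + 1]
        if hi != 0:
            out.append(hi)  # keep a genuine lead byte
        out.append(lo)
    return bytes(out)


# Bytes from the example above; assumes the record is Big5 (cp950).
raw = (b"\x00\xa4\x00\xe5\x00\xb9\x00\xa9\x00\xa4\x00\xa4\x00\xaf\x00S"
       b"\x00\xbc\x00s\x00\xa7\x00i\x00\xc5\x00\xe9")
name = strip_padding_zeros(raw).decode("cp950")
print(name)
```

For this font both approaches produce the same bytes; they differ only for units with a non-zero high byte or a zero low byte.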
For reference, the logic is slightly different for the cmap table:
if platformID == TT_PLATFORM_MICROSOFT:
    if platEncID == TT_MS_ID_SYMBOL_CS or platEncID == TT_MS_ID_UNICODE_CS or platEncID == TT_MS_ID_UCS_4:
        return "utf_16_be"
    elif platEncID == TT_MS_ID_SJIS:
        return "cp932"
    elif platEncID == TT_MS_ID_PRC:
        return "cp936"
    elif platEncID == TT_MS_ID_BIG_5:
        return "cp950"
    elif platEncID == TT_MS_ID_WANSUNG:
        return "cp949"
    elif platEncID == TT_MS_ID_JOHAB:
        return "cp1361"
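A similar sketch for the cmap side. `gdi_cmap_encoding` (hypothetical name) maps the encoding ID to a codec as listed above and returns `None` for IDs the list does not cover; `cmap_code_to_text` (also hypothetical) turns a character code into text. Splitting codes above 0xFF into a lead byte and a trail byte, and treating the Unicode-style subtables as plain code points, are assumptions of this sketch rather than documented GDI behavior. The constant values come from the OpenType specification.

```python
from typing import Optional

TT_PLATFORM_MICROSOFT = 3

# Encoding ID -> Python codec, following the cmap list above.
_CMAP_CODECS = {
    0: "utf_16_be",   # TT_MS_ID_SYMBOL_CS
    1: "utf_16_be",   # TT_MS_ID_UNICODE_CS
    10: "utf_16_be",  # TT_MS_ID_UCS_4
    2: "cp932",       # TT_MS_ID_SJIS
    3: "cp936",       # TT_MS_ID_PRC
    4: "cp950",       # TT_MS_ID_BIG_5
    5: "cp949",       # TT_MS_ID_WANSUNG
    6: "cp1361",      # TT_MS_ID_JOHAB
}


def gdi_cmap_encoding(platform_id: int, encoding_id: int) -> Optional[str]:
    """Codec for a Microsoft cmap subtable, or None if not covered."""
    if platform_id != TT_PLATFORM_MICROSOFT:
        return None
    return _CMAP_CODECS.get(encoding_id)


def cmap_code_to_text(code: int, encoding: str) -> str:
    """Convert a cmap character code to text with the chosen codec."""
    if encoding == "utf_16_be":
        # Unicode-style subtables store code points directly (assumption).
        return chr(code)
    # Legacy DBCS subtables: codes above 0xFF are lead byte + trail byte.
    raw = code.to_bytes(2 if code > 0xFF else 1, "big")
    return raw.decode(encoding)
```

For example, in a Big5 subtable, `cmap_code_to_text(0xA4A4, gdi_cmap_encoding(3, 4))` looks up the character with Big5 code 0xA4A4.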