font/encodings: Update GB18030 to 2005 version for non-BMP unicode support
Submitted by Mingye Wang (Arthur2e5)
Assigned to Xorg Project Team
Link to original bug (#101230)
Description
The Chinese GB 18030 standard defines a four-byte code to cover the parts of Unicode left unmapped by its one-byte and two-byte codes (roughly GBK, with the Euro sign moved from \x80 in cp936 to \xA2\xE3). In the 2000 version of GB 18030, this expansion is limited to the BMP minus surrogates; in the 2005 version, the entire Unicode range (minus surrogates) up to U+10FFFF is covered. With a spec update, people can do emojis in telnet with a legacy-ish encoding!
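For illustration, here is a minimal Python sketch of how the four-byte form reaches the non-BMP code points. It assumes the usual linear layout of the four-byte space (first and third bytes in 0x81..0xFE, second and fourth bytes in 0x30..0x39) with U+10000 starting at \x90\x30\x81\x30. The function name is made up for this example and is not part of font/encodings.

```python
def gb18030_supplementary(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as four GB 18030 bytes.

    Hypothetical helper for illustration only; assumes the linear
    four-byte layout described in the lead-in.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")

    # Linear index of \x90\x30\x81\x30 within the four-byte space:
    # (0x90 - 0x81) * 12600 = 189000.
    linear = 189000 + (cp - 0x10000)

    # Split the index into four "digits" with radices 126, 10, 126, 10.
    b1, rest = divmod(linear, 12600)  # 12600 = 10 * 126 * 10
    b2, rest = divmod(rest, 1260)     # 1260 = 126 * 10
    b3, b4 = divmod(rest, 10)

    return bytes([0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X} -> {gb18030_supplementary(cp).hex()}")
```

Under these assumptions the supplementary planes need no per-character table entries, only a range rule, which is why the mapping file change is mostly about extending ranges.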
The 2005 upgrade is largely backwards compatible with the 2000 spec. The one exception is a swap between \xA8\xBC (now U+1E3F ḿ) and \x81\x35\xF4\x37 (now the provisional PUA code point U+E7C7), which addresses a Unicode addition. Since the higher-range areas are still largely unpopulated, most of the change would simply involve expanding the 2000.1 file and renaming everything to 2005.
Keep in mind, though, that an upcoming GB 18030 update is likely to address a few extra Unicode updates, per https://github.com/whatwg/encoding/issues/27#issuecomment-287745429. The range part is still unlikely to change: it can't get any wider than Unicode itself for now.
Version: git