fontconfig should change to BCP 47 language tags

Submitted by Roozbeh Pournader

Assigned to fon..@..op.org

Link to original bug (#19869)

Description

Presently, the language model for fontconfig is based on the RFC 3066 model of language tags, which is not future-proof. The language model should switch to that of current BCP 47 (RFC 4646). Or an extended version that adds ISO 639-3 codes (RFC 4646bis): the ISO 639-3 codes are already in use in various localization communities.

The main difference would be supporting script subtags. An HTML author may have Kurdish in the Arabic script, and it wouldn't care/know if it's for Iran or Iraq. So it can tag that part of the page as ku-Arab. The browser can later request fonts for ku-Arab from fontconfig, instead of automagically finding it is probably ku-IR to fontconfig.

For existing glibc locales, and applications that use that data to render fonts (pango?), "@" would translate to "-x-", converting locales like sr_RS@latin to valid BCP 47 tags like "sr-RS-x-latin" which we can support as an alias/copy of sr-Latn. This way, we would be handling glibc tags as what they really are: private use subtags. Alternatively, we can ask the applications to convert sr_RS@latin to sr-Latn-RS themselves somehow, which we will match with our sr-Latn.

This would also make maintaining orth files somehow easier and cleaner, as the languages written in different scripts won't be separated by countries, but by scripts. We will have a ku-Arab, with ku-IR and ku-IQ aliasing/including it, a ku-Latn, with ku-TR aliasing it, and a ku-Cyrl, with no alias until we find more information on which Kurdish .

Over time, country code language codes are supposed to disappear, so if somebody really wants ku-IR, he SHOULD say "ku-Arab-IR", which makes matching and everything much easier too.

Version: 2.6

Blocking

Bug 17208

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information