Drawing strings with fontset in missing charsets
Submitted by wil..@..at.com
Assigned to Xorg Project Team
Description
I have a program using Xlib where I draw text using XmbDrawString(). I've found that certain characters I try to draw do not show as I expect.
Some information about my environment and setup:
- Locale: en_CA.UTF-8, with setlocale(LC_ALL, "") called in the program.
- I load a fontset with XCreateFontSet() with a base font name of only "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60"
- XCreateFontSet() reports 4 missing charsets: JISX0208.1983-0, KSC5601.1987-0, GB2312.1980-0, JISX0201.1976-0
- My X locale database XLC_FONTSET lists ISO8859-1:GL first and ISO10646-1 last, with KSC5601.1987-0 in between.
- I pass in UTF-8 text to XmbDrawString().
What I see is that most characters, ASCII and non-ASCII, render correctly, but some do not. One that does not work is U+2122, the trademark symbol (™). It shows as '"b'.
I've traced through what is happening, and this is my best understanding: the conversion code translates it to the KSC5601.1987-0 encoding, which my fontset lacks, and then tries to display it with the ISO8859-1 font.
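The bytes are consistent with this explanation. The sketch below assumes U+2122 sits at KS C 5601 row 2, cell 66 (EUC-KR 0xA2E2); that table entry is an assumption, and ksc5601_gr_for() is a hypothetical one-entry stand-in, not Xlib's converter. The program passes the UTF-8 bytes E2 84 A2. The GL form of the KS C 5601 code is 0x22 0x62, which read as two single-byte Latin-1 glyphs are '"' and 'b', matching the output described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one BMP code point as UTF-8 (1..3 bytes); this is what the
   program hands to XmbDrawString() in a UTF-8 locale. */
static size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical one-entry table: ASSUMED that U+2122 is KS C 5601
   row 2, cell 66, i.e. EUC-KR (GR) bytes 0xA2 0xE2. */
static int ksc5601_gr_for(uint32_t cp, unsigned char gr[2])
{
    if (cp == 0x2122) { gr[0] = 0xA2; gr[1] = 0xE2; return 1; }
    return 0;
}

/* The GL form of a 94x94 charset clears the high bit of each byte.
   Drawn as two Latin-1 glyphs, these bytes become two ASCII characters. */
static void gr_to_gl(const unsigned char gr[2], char gl[3])
{
    gl[0] = (char)(gr[0] & 0x7F);   /* 0x22 = '"' */
    gl[1] = (char)(gr[1] & 0x7F);   /* 0x62 = 'b' */
    gl[2] = '\0';
}
```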
Here is some information at the code level:
In modules/om/generic/omText.c, we convert the input (UTF-8 text) to a charset listed in the X locale database, trying several charsets in order. In src/xlibi18n/lcUTF8.c, we load an ordered list of preferred encodings matching the order in the X locale database. The U+2122 character gets converted to the KSC5601.1987-0 charset because it is apparently valid there, and this charset comes before ISO10646-1, where it is also valid. KSC5601.1987-0 is a charset my fontset does not have. We end up trying to draw it using the ISO8859-1 font, which appears to be the default because it is in position 0. This leads to the '"b'.
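The selection behaviour described above can be modelled in a few lines. This is a simplified, hypothetical model, not the lcUTF8.c/omText.c code: the three-entry list stands in for the real XLC_FONTSET order (ISO8859-1 first, KSC5601.1987-0 in between, ISO10646-1 last), and the can_encode predicates are toy versions, with the KSC5601 one covering only U+2122.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    int (*can_encode)(uint32_t cp);
} Charset;

static int enc_latin1(uint32_t cp)   { return cp <= 0xFF; }
static int enc_ksc5601(uint32_t cp)  { return cp == 0x2122; } /* toy */
static int enc_iso10646(uint32_t cp) { return cp <= 0xFFFF; }

/* Hypothetical, abbreviated locale order (the real list is longer). */
static const Charset locale_order[] = {
    { "ISO8859-1",      enc_latin1 },
    { "KSC5601.1987-0", enc_ksc5601 },
    { "ISO10646-1",     enc_iso10646 },
};
#define N_CHARSETS (sizeof locale_order / sizeof locale_order[0])

/* Current behaviour as described: the first charset in locale order
   that can encode cp wins, whether or not the fontset has it. */
static size_t pick_first_match(uint32_t cp)
{
    for (size_t i = 0; i < N_CHARSETS; i++)
        if (locale_order[i].can_encode(cp))
            return i;
    return 0;
}

/* Drawing side: a charset with no font in the fontset falls back to
   the font at position 0 (ISO8859-1 here). */
static size_t font_for_charset(size_t cs, const int *in_fontset)
{
    assert(in_fontset != NULL);
    return in_fontset[cs] ? cs : 0;
}
```

With KSC5601.1987-0 missing from the fontset, U+2122 is converted to KSC5601.1987-0 and then drawn with font 0, which is the mismatch behind the '"b'.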
I've confirmed that if I drop KSC5601.1987-0 from my X locale database, or skip over it during the conversion, the trademark symbol is converted to ISO10646-1. Converting to ISO10646-1 is what I expected.
The problem is more severe if I load a fontset whose base font name specifies a charset, such as "-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1". In that case ASCII characters are converted to ISO8859-1, but since no font in the fontset has that charset, they can't be drawn. They could be drawn if they were converted to ISO10646-1 instead.
For a solution, I am thinking that during the conversion (in lcUTF8.c, such as in charset_wctocs()) we could favour the charsets that are available in the fontset: skip the missing ones, or at worst try them last. In both of the problem cases I describe, the characters would then be converted to ISO10646-1 and displayed.
From looking at the code, I'm not sure of the best way to make this happen. It may be acceptable design-wise, since some of the lcUTF8.c code is already fontset-aware.
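The proposed ordering can be sketched as a two-pass search. This is a hypothetical illustration, not a patch to charset_wctocs(): it uses the same toy charset model as before (ISO8859-1, KSC5601.1987-0, ISO10646-1, in that assumed order), and in_fontset is an assumed per-charset flag saying whether the fontset has a font for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*CanEncode)(uint32_t cp);

static int enc_latin1(uint32_t cp)   { return cp <= 0xFF; }
static int enc_ksc5601(uint32_t cp)  { return cp == 0x2122; } /* toy */
static int enc_iso10646(uint32_t cp) { return cp <= 0xFFFF; }

/* Hypothetical, abbreviated locale order. */
static const CanEncode locale_order[] = { enc_latin1, enc_ksc5601, enc_iso10646 };
enum { N_CHARSETS = 3 };

/* Pass 0 tries only charsets present in the fontset, in locale order;
   pass 1 falls back to the missing ones. Returns -1 if none can encode cp. */
static int pick_preferring_fontset(uint32_t cp, const int in_fontset[N_CHARSETS])
{
    assert(in_fontset != NULL);
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < N_CHARSETS; i++) {
            int available = in_fontset[i] != 0;
            if ((pass == 0) == available && locale_order[i](cp))
                return i;
        }
    }
    return -1;
}
```

For a fontset with only ISO8859-1 and ISO10646-1 fonts, U+2122 now goes to ISO10646-1 (index 2) while ASCII still goes to ISO8859-1; for an ISO10646-1-only fontset, ASCII goes to ISO10646-1 as well. Keeping locale order within each pass preserves the database's preferences whenever they are drawable.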
I've already converted my program to use Xft for drawing text. I realize that is probably the recommended way to go these days. I wanted to try to figure out why the Xlib core font system was behaving like this though.
Please let me know if I can provide any more information or if you have any ideas about this.
Version: 7.7 (2012.06)