When sorting with Pinyin, names with Chinese characters need to be mixed with Western names using the Pinyin transliteration of the Chinese characters.
Supporting this might be as easy as selecting the Pinyin collation inside the zh_CN.UTF-* locale. This might be the default already. If not, we need an explicit additional setting (an env variable?) to select the collation, which may be useful anyway, and/or hard-coded default collations for certain locales.
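A minimal sketch of what such an explicit setting could look like, assuming the collation is passed to ICU via the "collation" keyword of an ICU locale ID (for example "zh_CN@collation=pinyin"). The environment variable name EXAMPLE_COLLATION and the helper localeWithCollation() are made up for illustration; neither exists in SyncEvolution or EDS.

```cpp
#include <cstdlib>
#include <string>

// Turn a POSIX locale name like "zh_CN.UTF-8" into an ICU-style locale ID,
// optionally with an explicit collation, e.g. "zh_CN@collation=pinyin".
// EXAMPLE_COLLATION is a hypothetical variable name used only in this sketch.
static std::string localeWithCollation(std::string locale)
{
    // Drop the POSIX codeset and modifier (".UTF-8", "@euro"); the ICU
    // locale ID built here only carries language and country.
    locale = locale.substr(0, locale.find_first_of(".@"));

    // Append the collation keyword only if the user asked for one;
    // otherwise the locale's default collation applies.
    const char *collation = std::getenv("EXAMPLE_COLLATION");
    if (collation && *collation) {
        locale += "@collation=";
        locale += collation;
    }
    return locale;
}
```

With EXAMPLE_COLLATION=pinyin set, "zh_CN.UTF-8" becomes "zh_CN@collation=pinyin"; without it the result is plain "zh_CN". The resulting string would then be handed to whatever code creates the collator.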
Here are four names, one per line:
Adams
Jeffries
江
Meadows
江 has Jiang as its Pinyin representation, so a collation based on Pinyin should sort as shown above (江 = Jiang, after Jeffries and before Meadows). At least that's my understanding.
A Chinese colleague confirmed that this is indeed what he expects.
People have different expectations for pinyin. Some possibilities are:
1. Sort Chinese characters in pinyin order, but separate from Latin.
2. Sort them interleaved with Latin, by the first character.
3. Sort them fully interleaved with Latin.
For #2, the easiest way to do it is with the Alphabetic index. For #3, the best is to use a Han-Latin transliterator to get a key, then sort by that key.
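To illustrate the key-based approach for #3, here is a minimal sketch. It does not use ICU: the helper toLatinKey() and its one-entry lookup table are hypothetical stand-ins for a real Han-Latin transliterator, and sortInterleaved() is an illustrative name, not an existing API.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a real Han-Latin transliterator: a tiny lookup
// table that only knows the one Chinese name used in this thread.
static std::string toLatinKey(const std::string &name)
{
    static const std::map<std::string, std::string> table = {
        { "江", "jiang" },
    };
    auto it = table.find(name);
    std::string key = it != table.end() ? it->second : name;

    // Fold ASCII case so that "Jeffries" and "jiang" compare alphabetically.
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}

// Option 3: sort fully interleaved by comparing the Latin keys
// instead of the original names.
static std::vector<std::string> sortInterleaved(std::vector<std::string> names)
{
    std::stable_sort(names.begin(), names.end(),
                     [](const std::string &a, const std::string &b) {
                         return toLatinKey(a) < toLatinKey(b);
                     });
    return names;
}
```

Calling sortInterleaved() on Meadows, 江, Adams, Jeffries yields Adams, Jeffries, 江, Meadows. A real implementation would compare the keys with a locale-aware collator rather than plain byte order, and would cache the keys instead of recomputing them for every comparison.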
We now know that ICU implements option 1, so implementing the expected outcome will be more work. We also need to determine whether #2 or #3 is expected.
It would be nice if we could base this on some standard that's written down somewhere, or more thoroughly documented as being de-facto common.
It seems a little odd that ICU doesn't do something that is apparently so common.
I suspect that there is no such document.
My understanding is that all three options are valid, so ICU simply picked one. Perhaps they didn't pick the most popular one.
In addition, fully interleaved Pinyin-based sorting is used for "zh". This requires an extra transliteration of Han->Latin, because ICU itself sorts Chinese characters after Latin ones when using the "Pinyin" collation.
EDS implements the same logic in the new ECollator utility class, scheduled for EDS 3.10 and included in the openismus-work-3-8 branch. SyncEvolution's PIM Manager should use these classes.
The current EDS APIs lead to a slight performance degradation: ICU uses std::string, EDS copies that into a string, and SyncEvolution then recreates a std::string. A C++ API in EDS using std::string would be more useful.
For performance reasons I kept the code which uses ICU directly.