
WIP: data: Fix the ordering of Cangjie 3 codes

Mathieu Bridon requested to merge bochecha/fix-cj3-ordering into master

Some characters have multiple Cangjie 3 codes. For any such character, the codes we have are currently ordered alphabetically. This comes from the original data we got when we started working on this with Wan Leung; the whole dataset was indexed by code, alphabetically:

https://github.com/wanleung/libcangjie/blob/master/tables/cj3-cjk.txt

However, we are about to split the multiple codes for any given character so that only the first one keeps a non-zero frequency, and all additional codes have a frequency of 0 (see #104).

A prerequisite to that is that the multiple codes are actually ordered correctly.
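To make the dependency on ordering concrete, here is a minimal sketch of the split described above. The function name, the list-of-tuples shape, and the frequency value are hypothetical and only for illustration; the actual implementation planned in #104 may look quite different.

```python
def split_codes(codes, frequency):
    """Return one (code, frequency) row per code.

    Only the first code keeps the character's frequency; every
    additional code gets 0. The row shape is an assumption made
    for this sketch, not the project's real data format.
    """
    return [
        (code, frequency if index == 0 else 0)
        for index, code in enumerate(codes)
    ]


# 100 is a placeholder frequency, not a value from the real data.
alphabetical = split_codes(["ebhn", "ebhu"], 100)
reordered = split_codes(["ebhu", "ebhn"], 100)
```

With the alphabetical ordering, the non-zero frequency would land on ebhn; with the corrected ordering it lands on ebhu, which is why the ordering has to be fixed first.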

This commit fixes the ordering of Cangjie 3 codes for many Chinese characters that have more than one code.

The changes to the data in this commit were made manually, painstakingly comparing our results with the ones from Windows, which we take as the reference implementation for Cangjie 3.

For example, on Windows, 沉 only has the code ebhu. This means that in Cangjie 3 we should have ebhu as the primary code for that character. We keep ebhn, but it should come second so that it doesn't interfere with the expected ordering when we actually implement #104.
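The reordering rule in this example can be sketched as follows. The changes in this commit were made by hand, so this is only an illustration of the rule, and the helper name is hypothetical.

```python
def promote_primary(codes, primary):
    """Move the reference (Windows) code to the front, keeping the rest in order.

    Hypothetical helper for illustration; it is not part of the project.
    """
    if primary not in codes:
        raise ValueError(f"{primary!r} is not one of {codes!r}")
    return [primary] + [code for code in codes if code != primary]


# 沉: alphabetical order puts ebhn first, but Windows only knows ebhu.
codes_for_chen = promote_primary(["ebhn", "ebhu"], "ebhu")
```

Here `codes_for_chen` ends up as `["ebhu", "ebhn"]`, which is the ordering described above for 沉.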

All other changes in this commit went through the same process of comparison.
