
data: Fix the ordering of Cangjie 5 codes

Mathieu Bridon requested to merge bochecha/fix-cj5-ordering into master

Some characters have multiple Cangjie 5 codes. For every such character, the codes in our table are currently ordered alphabetically. This comes from the original data we used when we started working on this with Wan Leung, which was indexed alphabetically by code:

https://github.com/wanleung/libcangjie/blob/master/tables/cj5-cjk.txt
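As a rough illustration of why the per-character order ends up alphabetical: if you walk a table sorted by code and collect codes per character, each character's codes come out in the same alphabetical order. The sketch below assumes a simplified `(code, character)` row format and uses placeholder characters and codes; it does not reflect the real layout of `cj5-cjk.txt`, and `codes_by_character` is a hypothetical helper.

```python
from collections import defaultdict


def codes_by_character(rows):
    """Group (code, character) rows into {character: [codes]}, keeping row order."""
    grouped = defaultdict(list)
    for code, char in rows:
        grouped[char].append(code)
    return dict(grouped)


# Hypothetical code-indexed rows; placeholder characters and codes only.
# sorted() orders the tuples by code first, mimicking a table indexed by code.
rows = sorted([
    ("xyz", "CHAR1"),
    ("abc", "CHAR1"),
    ("mno", "CHAR2"),
    ("def", "CHAR1"),
])

table = codes_by_character(rows)
print(table)  # {'CHAR1': ['abc', 'def', 'xyz'], 'CHAR2': ['mno']}
```

The alphabetical order per character is therefore an accident of how the source data was indexed, not a statement about which code is preferred.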

However, we are about to change how characters with multiple codes are stored, so that only the first code keeps the non-zero frequency and all additional codes get a frequency of 0 (see #104).

A prerequisite for this is that the codes are actually ordered correctly, since whichever code comes first is the one that will keep the frequency.
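A minimal sketch of the frequency split described in #104, assuming each character's codes are given as a list and its frequency as a single integer. The helper name `split_frequencies`, the placeholder codes, and the "corrected" order are all hypothetical; the point is only that the result depends entirely on which code comes first.

```python
def split_frequencies(codes, frequency):
    """Return (code, frequency) pairs where only the first code keeps the frequency.

    Hypothetical helper: `codes` must already be in the preferred order,
    because the first code is the one that receives the non-zero frequency.
    """
    return [(code, frequency if i == 0 else 0) for i, code in enumerate(codes)]


# Placeholder codes for a single character.
alphabetical = ["abc", "xyz"]  # order inherited from the code-indexed data
corrected = ["xyz", "abc"]     # hypothetical correct order

print(split_frequencies(alphabetical, 42))  # [('abc', 42), ('xyz', 0)]
print(split_frequencies(corrected, 42))     # [('xyz', 42), ('abc', 0)]
```

With the alphabetical order, the frequency would land on whichever code happens to sort first, which is why the ordering has to be fixed before the split.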

This commit fixes the ordering of Cangjie 5 codes for any Chinese character with more than one of them.

The changes to the data in this commit were made with a script, and should be entirely reproducible. From a libcangjie clone on the master branch (commit d06ecb33), get the script and run it:

$ git clone https://gitlab.freedesktop.org/cangjie/data-migration-tools.git
$ ./data-migration-tools/fix-cj5-code-ordering.py

The modifications done by the script to the table should be identical to the ones in this commit.

Fixes #111
