Cangjie / libcangjie / Merge requests / !118

WIP: data: Fix the ordering of Cangjie 3 codes

Status: Open. Mathieu Bridon requested to merge bochecha/fix-cj3-ordering into master on Jul 07, 2019.

Some characters have multiple Cangjie 3 codes. For any such character, the codes we have are currently ordered alphabetically. This is inherited from the original data we got when we started working on this with Wan Leung, in which the whole data set was indexed by code, alphabetically:

https://github.com/wanleung/libcangjie/blob/master/tables/cj3-cjk.txt
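To illustrate why code-indexed data yields alphabetical ordering per character, here is a minimal sketch. It assumes a hypothetical list of `(code, character)` rows; this is not the actual format of the table linked above, and `codes_by_character` is an illustrative helper, not part of libcangjie.

```python
from collections import defaultdict


def codes_by_character(rows):
    """Group (code, character) rows by character, keeping the row order."""
    grouped = defaultdict(list)
    for code, char in rows:
        grouped[char].append(code)
    return dict(grouped)


# Hypothetical excerpt: rows are indexed by code, so they are sorted by code.
rows = [("a", "日"), ("ebhn", "沉"), ("ebhu", "沉")]

grouped = codes_by_character(rows)

# Characters with more than one code inherit the alphabetical order of the rows.
multi = {char: codes for char, codes in grouped.items() if len(codes) > 1}
```

Here `multi` lists 沉 with `ebhn` before `ebhu`, simply because that is how the rows were sorted, not because `ebhn` is the preferred code.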

However, we are about to split the multiple codes for any given character so that only the first one has a non-zero frequency and all additional codes have a frequency of 0 (see #104).
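The planned split could be sketched roughly as follows. This is only an illustration of the rule described above: the `(character, code, frequency)` tuple layout, the function name `split_frequencies`, and the frequency value are all assumptions, not the real data format or the implementation of #104.

```python
def split_frequencies(entries):
    """Keep the frequency on the first code of each character, zero the rest.

    `entries` is a list of (character, code, frequency) tuples, in the order
    the codes are stored for each character.
    """
    seen = set()
    result = []
    for char, code, freq in entries:
        if char in seen:
            # Any additional code for an already-seen character gets 0.
            freq = 0
        seen.add(char)
        result.append((char, code, freq))
    return result


# Hypothetical input: both codes carry the same (made-up) frequency.
entries = [("沉", "ebhu", 500), ("沉", "ebhn", 500)]
split = split_frequencies(entries)
```

Because only the first code keeps its frequency, whichever code is stored first becomes the one that matters for ranking, which is why the stored order has to be right beforehand.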

A prerequisite to that is that the multiple codes are actually ordered correctly.

This commit fixes the ordering of Cangjie 3 codes for many Chinese characters that have more than one code.

The changes to the data in this commit were made manually, by painstakingly comparing our results with those from Windows, which we take as the reference implementation for Cangjie 3.

For example, on Windows, 沉 only has the code ebhu. This means that in Cangjie 3 we should have ebhu as the primary code for that character. We keep ebhn, but it should come second so that it doesn't interfere with the expected ordering when we actually implement #104.
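The reordering in this example amounts to moving the reference code to the front while keeping the others. A minimal sketch, assuming the codes for one character are held in a plain list (`promote_primary` is a hypothetical helper; the actual changes in this commit were made by hand):

```python
def promote_primary(codes, primary):
    """Return `codes` with `primary` first and the other codes in their original order."""
    if primary not in codes:
        raise ValueError(f"{primary!r} is not one of the known codes: {codes}")
    return [primary] + [code for code in codes if code != primary]


# Codes for 沉 as currently stored (alphabetical), with ebhu as the
# primary code according to the reference implementation.
reordered = promote_primary(["ebhn", "ebhu"], "ebhu")
```

The secondary code `ebhn` is kept, only its position changes.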

All other changes in this commit went through the same process of comparison.

Source branch: bochecha/fix-cj3-ordering