WIP: Set secondary codes frequencies to 0

Mathieu Bridon requested to merge secondary-codes-frequency into master

This is my attempt at fixing #104.

It is based on @yookoala's suggestion, as attempted in #105.

However, #105 implemented this in the dbbuilder tool, whereas this merge request modifies the source data instead, so that the data itself carries correct frequencies and the database builder tool stays "dumb".
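To illustrate the idea of zeroing secondary-code frequencies in the data, here is a minimal sketch. It is not the actual migration script: the `Row` type, the `zero_secondary_frequencies` function, and the rule that the first code listed for a character is its primary code are all assumptions made for the example. The real data layout is more involved, and this sketch ignores corner cases such as x-disambiguation.

```python
from typing import Iterable, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical, simplified data row: one input code for one character."""
    char: str
    code: str
    frequency: int


def zero_secondary_frequencies(rows: Iterable[Row]) -> List[Row]:
    """Return the rows with the frequency of every secondary code set to 0.

    Assumption: the first row seen for a character holds its primary code;
    any later row for the same character is treated as a secondary code.
    """
    seen = set()
    result = []
    for row in rows:
        if row.char in seen:
            # Secondary code: keep the row, but drop its frequency to 0.
            result.append(row._replace(frequency=0))
        else:
            # Primary code: keep the row unchanged.
            seen.add(row.char)
            result.append(row)
    return result
```

With this approach the builder can copy frequencies verbatim from the data, without needing any knowledge of which codes are primary.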

It produced the database I asked @dollars0427 to test over at https://github.com/Cangjians/ibus-cangjie/issues/77#issuecomment-386891651. Their feedback shows that this database is incorrect.

However, I still believe fixing the issue in the source data is the right approach.

The change was implemented with https://gitlab.freedesktop.org/cangjie/data-migration-tools/blob/d3b9a18e878cfab35af0fba39f8d23c77cffd5d5/fix-secondary-codes-frequency.py

I thought a script would be easier to review and audit than a huge data change. The script ended up being quite convoluted, though, because of how our data is structured and because it tries to deal with all the corner cases (x-disambiguation, etc.).

I guess it's still good to have the script handy.
