libcangjie issues

libcangjie issues https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues 2024-07-26T08:55:35Z https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/123 rpminspect report forbidden Unicode character found in data/table.txt 2024-07-26T08:55:35Z Koala Yeung

rpminspect report forbidden Unicode character found in data/table.txt

rpminspect reports (such as [this one](https://artifacts.dev.testing-farm.io/ac84345b-5cad-43b4-bede-91aa3137fdf5/)) complain about forbidden Unicode characters found in `data/table.txt` in the rpm file. The characters that rpminspect complains about are:

- 0x202A
- 0x202B
- 0x202C
- 0x202D
- 0x202E
- 0x2066
- 0x2067
- 0x2068
- 0x2069

None of these characters are supposed to be in `data/table.txt`. They are either artifacts of the build process or of previous maintenance operations, or this is a false positive of the rpminspect tool.

Filed as issue [1418](https://github.com/rpminspect/rpminspect/issues/1418) in [github.com/rpminspect/rpminspect](https://github.com/rpminspect/rpminspect) to follow up.

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/121 Redo the websites in Gitlab pages 2024-05-17T22:04:34Z Mathieu Bridon
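For the rpminspect report (issue 123), one way to tell a real occurrence from a false positive is to scan the file for the listed code points directly. The following is a minimal sketch, not part of libcangjie or rpminspect; the function name `find_forbidden` is hypothetical.

```python
# Code points listed in the rpminspect report (bidirectional control characters).
FORBIDDEN = {0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
             0x2066, 0x2067, 0x2068, 0x2069}


def find_forbidden(text):
    """Return (line, column, code point) for each forbidden character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in FORBIDDEN:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


# Small demonstration on an in-memory string (escapes keep the sample readable).
sample = "ab\u202Ec\nok\n\u2066"
for hit in find_forbidden(sample):
    print(*hit)
```

To check the real data, read `data/table.txt` with `encoding="utf-8"` and pass its contents to `find_forbidden()`; an empty result would point towards a false positive.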

Redo the websites in Gitlab pages

This is the continuation of the move away from GitHub to GitLab.

We need to redo [the website](http://cangjians.github.io/) with GitLab Pages:

* [ ] pages
* [ ] C documentation for libcangjie
* [ ] Python documentation for Pycangjie
* [ ] user documentation for ibus-cangjie

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/109 Some characters have surprising x-disambiguation codes 2024-06-18T07:19:23Z Mathieu Bridon

Some characters have surprising x-disambiguation codes

I've been staring a lot at our data lately, [doing some cleanups](https://github.com/Cangjians/libcangjie/pull/108) and thinking about #55, #91 and #104.

If I understood everything right, the x-disambiguation works as follows:

* characters A and B both have code `abc`
* since A is more frequent, B is given an additional x (`abcx` in Cangjie 3, `xabc` in Cangjie 5)
* as a result, B will have codes `abc` and `abcx` (or `xabc` in Cangjie 5)

The above holds true for pretty much all of our data, except for 8 characters, which have an x-disambiguated code without the corresponding non-x code:

* 亟 has CJ3 codes `mem` and `nemx`
* 妒 has CJ3 codes `vhs` and `visx`
* 扁 has CJ3 codes `hsbt` and `isbtx`
* 毋 has CJ3 codes `wj` and `wkx`
* 袍 has CJ3 codes `fprux` and `lpru`
* 鼎 has CJ3 codes `buux` and `buvml`
* 鼐 has CJ3 codes `nhbux` and `nsbul`
* 覇 has CJ5 codes `mwtjb` and `xmbtj`

I'm not sure what to do with these. Is something wrong with them? Or are they just fine and my assumptions were unfounded?

@yookoala do you have any idea?

Koala Yeung Koala Yeung https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/107 Something is weird with the 0 2024-06-18T07:14:05Z Mathieu Bridon
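The x-disambiguation check described in issue 109 can be sketched as follows. This is only an illustration under the assumptions stated in that issue (x as a suffix in Cangjie 3, as a prefix in Cangjie 5); the function name `orphan_x_codes` is hypothetical, and the sketch ignores any truncation of codes to five letters.

```python
def orphan_x_codes(codes, version):
    """Return x-disambiguated codes whose non-x counterpart is missing.

    Assumes the x is a suffix in Cangjie 3 and a prefix in Cangjie 5.
    """
    orphans = []
    for code in codes:
        if version == 3 and code.endswith("x"):
            base = code[:-1]
        elif version == 5 and code.startswith("x"):
            base = code[1:]
        else:
            continue
        # An x code is consistent only if the same character also has the base code.
        if base and base not in codes:
            orphans.append(code)
    return orphans


# Sample entries copied from the issue text.
cj3 = {"亟": ["mem", "nemx"], "毋": ["wj", "wkx"], "袍": ["fprux", "lpru"]}
cj5 = {"覇": ["mwtjb", "xmbtj"]}

for char, codes in cj3.items():
    print(char, orphan_x_codes(codes, 3))
for char, codes in cj5.items():
    print(char, orphan_x_codes(codes, 5))
```

Codes made only of x-like letters (such as `xxxxx`) would also be flagged by this naive check, so its output needs manual review.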

Something is weird with the 0

In Chinese, the number 0 can have two forms:

* 零 - U+96F6, CJK UNIFIED IDEOGRAPH-96F6
* 〇 - U+3007, IDEOGRAPHIC NUMBER ZERO

The former has code `mboii` in both CJ3 and CJ5.

The latter has code `xxxxx`… but only in CJ5, it has no code in CJ3. This means there is no way to type 〇 in CJ3.

However, in CJ3 there is another character with code `xxxxx`: ○ (U+25CB, WHITE CIRCLE). This character looks very similar to 〇, which makes me think it was incorrectly used instead of 〇 by whoever assembled the CJ3 code.

Should we remove `xxxxx` from ○ in CJ3 and instead give it to 〇? This would be:

```diff
diff --git a/data/table.txt b/data/table.txt
index ebf84c8..2895d0d 100644
--- a/data/table.txt
+++ b/data/table.txt
@@ -35,3 +35,3 @@
 ◇ NA 0 0 0 0 0 0 0 0 1 yyybe yyybe,za NA 0
-○ NA 0 0 0 0 0 0 0 0 1 xxr,xxxxx,yyybk yyybk,za NA 0
+○ NA 0 0 0 0 0 0 0 0 1 xxr,yyybk yyybk,za NA 0
 ◎ NA 0 0 0 0 0 0 0 0 1 yyybn yyybn,za NA 0
@@ -44,3 +44,3 @@
 。 NA 0 0 0 0 0 0 0 1 0 zxad zxad . 2
-〇 NA 0 0 0 0 0 0 0 0 1 NA xxxxx NA 0
+〇 NA 0 0 0 0 0 0 0 0 1 xxxxx xxxxx NA 0
 〈 NA 0 0 0 0 0 0 0 1 0 yyyae,zxby yyyae,za,zxby ' 6
```

---

In addition, since 零 and 〇 are alternative forms of each other, should we add the `mboii` code to 〇? This would be:

```diff
diff --git a/data/table.txt b/data/table.txt
index ebf84c8..8ee4e4c 100644
--- a/data/table.txt
+++ b/data/table.txt
@@ -44,3 +44,3 @@
 。 NA 0 0 0 0 0 0 0 1 0 zxad zxad . 2
-〇 NA 0 0 0 0 0 0 0 0 1 NA xxxxx NA 0
+〇 NA 1 1 0 0 1 0 0 0 0 mboii mboii,xxxxx NA 0
 〈 NA 0 0 0 0 0 0 0 1 0 yyyae,zxby yyyae,za,zxby ' 6
```

Koala Yeung Koala Yeung https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/91 Revamp our data format (source and DB) 2024-07-18T14:55:35Z Mathieu Bridon
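The `xxxxx` mismatch from issue 107 can be reproduced from the two affected rows of `data/table.txt`. This is a minimal sketch: the column positions (`CJ3_COL`, `CJ5_COL`) are assumptions inferred from the diffs in that issue, not taken from a format specification, and `owners` is a hypothetical helper.

```python
# The two current rows, copied from the "-" lines of the first diff.
ROWS = """\
○ NA 0 0 0 0 0 0 0 0 1 xxr,xxxxx,yyybk yyybk,za NA 0
〇 NA 0 0 0 0 0 0 0 0 1 NA xxxxx NA 0
"""

# Assumed positions of the CJ3 and CJ5 code columns (0-based), inferred from the diffs.
CJ3_COL, CJ5_COL = 11, 12


def owners(rows, code, column):
    """Return the characters that list `code` in the given column."""
    found = []
    for line in rows.splitlines():
        fields = line.split()
        codes = [] if fields[column] == "NA" else fields[column].split(",")
        if code in codes:
            found.append(fields[0])
    return found


print("CJ3:", owners(ROWS, "xxxxx", CJ3_COL))
print("CJ5:", owners(ROWS, "xxxxx", CJ5_COL))
```

With the current data, the CJ3 lookup returns the white circle and the CJ5 lookup returns the ideographic zero, which is the inconsistency the first diff removes.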

Revamp our data format (source and DB)

@yookoala and I have been talking about this for some time now: we're not very happy with our data format:

- the source format is hard to read for humans (and the sources are primarily **for** humans)
- the db is a bit clunky (need to join two tables, frequency of a **character** in the `codes` table,...)
- etc...

I've been trying to work with [gom](http://www.hadess.net/2014/04/what-is-gom.html), a GObject ORM, and our data format definitely makes it harder than it should be. (not impossible, of course, just annoying)

So let's fix this!

2.0 Mathieu Bridon Mathieu Bridon https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/65 Suggested change / option: Z instead of * as wildcard for Jian-Yi / Simplified Cangjie / 簡易 2019-03-09T01:49:52Z Mathieu Bridon

Suggested change / option: Z instead of * as wildcard for Jian-Yi / Simplified Cangjie / 簡易

*Created by: boyin*

Reason:

1. I think that **z** is easier to access than asterisk **\***. I understand that on French keyboards, the natural result of hitting the "8 \*" key is the asterisk \* and not "8", but on most other keyboards \* requires a shift.
2. z is currently unused as a **non-leading** character in an encoding anyway. There are encodings with z as the leading character for special symbols, but the wildcard _\*_ does not operate in the leading position.
3. It seems incongruous for the asterisk \* to have different behavior from all other punctuation marks,

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/64 Order the returned list of `CangjieChar` 2019-03-09T01:49:52Z Mathieu Bridon
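The suggestion in issue 65 (z as a wildcard) could be prototyped as an input normalization step. This sketch shows the proposed option only, not current libcangjie behavior; `normalize_wildcard` is a hypothetical name. It relies on the two points made in that issue: z appears only as a leading letter in existing codes, and the wildcard never applies in the leading position.

```python
def normalize_wildcard(code):
    """Rewrite every non-leading 'z' as the '*' wildcard; keep a leading 'z'."""
    if not code:
        return code
    return code[0] + code[1:].replace("z", "*")


print(normalize_wildcard("azb"))   # wildcard in the middle
print(normalize_wildcard("zxad"))  # leading z stays a regular letter
```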

Order the returned list of `CangjieChar`

Currently, the `cangjie_get_characters()` function returns a completely unordered list of `CangjieChar`, and it is up to the application using libcangjie to order them all afterwards.

This means that we iterate over the list of characters twice:

- once when creating the list of `CangjieChar` (we iterate over the results of the SQL query)
- once when ordering them (in the application using libcangjie)

If `cangjie_get_characters()` ordered the results itself, it could do so with an `ORDER BY` clause directly in the SQL query, and so we'd remove the need for the application to do it itself.

The new API should let the application specify on which column(s) to order, ascending or descending.

2.0 enhancement https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/55 Order of suggested characters (x disambiguation) 2019-03-09T01:49:52Z Mathieu Bridon
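The idea from issue 64, ordering in the SQL query instead of in the application, can be illustrated with Python's `sqlite3` module. This is a sketch only: the table `chars` and its columns are hypothetical and are not the actual libcangjie schema, and `get_characters` is not the real C API.

```python
import sqlite3

# Hypothetical sortable columns; a whitelist is needed because column names
# cannot be passed as bound parameters.
SORTABLE = {"chchar", "code", "frequency"}


def get_characters(conn, code, order_by="frequency", descending=True):
    """Return the characters for `code`, ordered by the database itself."""
    if order_by not in SORTABLE:
        raise ValueError(f"cannot order by {order_by!r}")
    direction = "DESC" if descending else "ASC"
    query = (
        "SELECT chchar FROM chars WHERE code = ? "
        f"ORDER BY {order_by} {direction}"
    )
    return [row[0] for row in conn.execute(query, (code,))]


# Illustrative in-memory data, not real Cangjie entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chars (chchar TEXT, code TEXT, frequency INTEGER)")
conn.executemany(
    "INSERT INTO chars VALUES (?, ?, ?)",
    [("A", "abc", 10), ("B", "abc", 30), ("C", "abc", 20), ("D", "xyz", 5)],
)

print(get_characters(conn, "abc"))
print(get_characters(conn, "abc", order_by="chchar", descending=False))
```

The caller chooses the column and direction, and the rows arrive already sorted, so the second pass over the list in the application is no longer needed.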

Order of suggested characters (x disambiguation)

*Created by: boyin*

Both Cangjie3 and Cangjie5 have officially sanctioned "canonical characters" and "duplicate characters". For each encoding that covers multiple characters, one character is selected to be the "canonical" character and is the default character selected by that sequence. The other characters are selected by adding the letter X, once or more, as a prefix or suffix. (Currently ibus-cangjie uses suffix X; ibus-table-chinese-cangjie, like MacOSX, uses prefix).

The following pairs of characters seem to be listed the wrong way around in the default setup of ibus-cangjie (and ibus-table-chinese-cangjie) because they were arranged in what used to be known as "big-5 code order".

- ABJJ\* 暈 XABJJ 暉
- AFMBC\* 顯 XAFMB\* 顥
- ANAU\* 晚 XANAU 冕
- AYK 旻 XAYK 旼
- BHN\* 肌 XBHN\* 冗
- BT\* 皿 XBT\* 冊
- BUOG\* 瞿 XBUOG 睢
- DWD\* 棵 XDWD\* 梱
- DYTJ\* 樟 XDYTJ 梓
- EA 汨 XEA 沓 XXEA 汩
- HMNL\* 郵 XHMNL 邸
- MRNO\* 歌 XMRNO 砍
- NL\* 引 XNL\* 弔
- NO\* 欠 XNO\* 久
- OFHAF 鷦 XOFHA 鷡
- OGE\* 雙 XOGE\* 隻
- ORMBC\* 頷 XORMB\* 頜
- QYBB 揥 XQYBB 撾
- RMMR 跖 XRMMR 唔
- RSHAF 鶚 XRSHA 鴞
- SHOE\* 履 XSHOE 屐
- SRNL\* 郡 XSRNL 邵
- TMD\* 某 XTMD\* 芋
- TKN 荑 XTKN 艽
- TMNL 邯 XTMNL 鄞
- TW\* 苗 XTW\* 曲
- TWK\* 奠 XTWK\* 茵
- TSP 懃 XTSP 苨
- VFHAF\* 鸞 XVFHA\* 鷥
- VFJMC\* 繽 XVFJM\* 縯
- VFQ 攣 XVFQ 姅
- WD\* 果 XWD\* 困
- YPD 柴 XYPD 迆
- YRPA\* 詢 XYRPA 詣
- YRU 訕 XYRU 乩
- YTHAF 鸕 XYTHA 鴗

bug
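The prefixed codes in the pairs listed for issue 55 all follow one visible pattern: the canonical code gets one leading X per duplicate rank and the result is cut to five letters (for example OFHAF and XOFHA). The sketch below encodes that observation only; it is inferred from the listed pairs rather than from an official rule, and `x_prefixed` is a hypothetical helper.

```python
MAX_CODE_LENGTH = 5  # longest code seen in the listed pairs


def x_prefixed(code, rank=1):
    """Code of the rank-th duplicate: `rank` leading x, cut to five letters."""
    return ("x" * rank + code)[:MAX_CODE_LENGTH]


# Examples taken from the listed pairs.
for canonical, rank in [("abjj", 1), ("ofhaf", 1), ("ea", 1), ("ea", 2)]:
    print(canonical, "->", x_prefixed(canonical, rank))
```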