libcangjie issues
https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues
2019-03-09T01:49:52Z

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/109
Some characters have surprising x-disambiguation codes
2019-03-09T01:49:52Z
Mathieu Bridon

I've been staring a lot at our data lately, [doing some cleanups](https://github.com/Cangjians/libcangjie/pull/108) and thinking about #55, #91 and #104.
If I understood everything right, the x-disambiguation works as follows:
* characters A and B both have code `abc`
* since A is more frequent, B is given an additional x (`abcx` in Cangjie 3, `xabc` in Cangjie 5)
* as a result, B will have codes `abc` and `abcx` (or `xabc` in Cangjie 5)
The above holds true for pretty much all of our data, except for 8 characters, which have an x-disambiguated code without the corresponding non-x code:
* 亟 has CJ3 codes `mem` and `nemx`
* 妒 has CJ3 codes `vhs` and `visx`
* 扁 has CJ3 codes `hsbt` and `isbtx`
* 毋 has CJ3 codes `wj` and `wkx`
* 袍 has CJ3 codes `fprux` and `lpru`
* 鼎 has CJ3 codes `buux` and `buvml`
* 鼐 has CJ3 codes `nhbux` and `nsbul`
* 覇 has CJ5 codes `mwtjb` and `xmbtj`
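A check for this kind of orphaned code could be sketched as follows. This is only an illustration under the assumptions stated above (suffix x in Cangjie 3, prefix x in Cangjie 5); the function name `orphaned_x_codes` is hypothetical and not part of libcangjie. It ignores any truncation of long codes, and it would also flag codes in which x is a genuine part of the decomposition.

```python
def orphaned_x_codes(codes, version):
    """Return the x-disambiguated codes in `codes` that lack their base code.

    `codes` holds every code one character has in one Cangjie version.
    Assumption: the extra x is a suffix in Cangjie 3 and a prefix in Cangjie 5.
    """
    codes = set(codes)
    orphans = []
    for code in codes:
        if len(code) < 2:
            continue
        if version == 3 and code.endswith("x"):
            base = code[:-1]   # strip the trailing x
        elif version == 5 and code.startswith("x"):
            base = code[1:]    # strip the leading x
        else:
            continue
        if base not in codes:
            orphans.append(code)
    return sorted(orphans)


print(orphaned_x_codes({"mem", "nemx"}, 3))   # 亟 in CJ3: ['nemx']
print(orphaned_x_codes({"abc", "abcx"}, 3))   # regular case: []
```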
I'm not sure what to do with these. Is something wrong with them? Or are they just fine and my assumptions were unfounded?
@yookoala do you have any idea?

Koala Yeung

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/107
Something is weird with the 0
2019-03-09T01:49:52Z
Mathieu Bridon

In Chinese, the number 0 can have two forms:
* 零 - U+96F6, CJK UNIFIED IDEOGRAPH-96F6
* 〇 - U+3007, IDEOGRAPHIC NUMBER ZERO
The former has code `mboii` in both CJ3 and CJ5. The latter has code `xxxxx`… but only in CJ5; it has no code in CJ3.
This means there is no way to type 〇 in CJ3.
However, in CJ3 there is another character with code `xxxxx`: ○ (U+25CB, WHITE CIRCLE). This character looks very similar to 〇, which makes me think it was incorrectly used instead of 〇 by whoever assembled the CJ3 code.
Should we remove `xxxxx` from ○ in CJ3 and instead give it to 〇?
This would be:
```diff
diff --git a/data/table.txt b/data/table.txt
index ebf84c8..2895d0d 100644
--- a/data/table.txt
+++ b/data/table.txt
@@ -35,3 +35,3 @@
◇ NA 0 0 0 0 0 0 0 0 1 yyybe yyybe,za NA 0
-○ NA 0 0 0 0 0 0 0 0 1 xxr,xxxxx,yyybk yyybk,za NA 0
+○ NA 0 0 0 0 0 0 0 0 1 xxr,yyybk yyybk,za NA 0
◎ NA 0 0 0 0 0 0 0 0 1 yyybn yyybn,za NA 0
@@ -44,3 +44,3 @@
。 NA 0 0 0 0 0 0 0 1 0 zxad zxad . 2
-〇 NA 0 0 0 0 0 0 0 0 1 NA xxxxx NA 0
+〇 NA 0 0 0 0 0 0 0 0 1 xxxxx xxxxx NA 0
〈 NA 0 0 0 0 0 0 0 1 0 yyyae,zxby yyyae,za,zxby ' 6
```
---
In addition, since 零 and 〇 are alternative forms of each other, should we add the `mboii` code to 〇?
This would be:
```diff
diff --git a/data/table.txt b/data/table.txt
index ebf84c8..8ee4e4c 100644
--- a/data/table.txt
+++ b/data/table.txt
@@ -44,3 +44,3 @@
。 NA 0 0 0 0 0 0 0 1 0 zxad zxad . 2
-〇 NA 0 0 0 0 0 0 0 0 1 NA xxxxx NA 0
+〇 NA 1 1 0 0 1 0 0 0 0 mboii mboii,xxxxx NA 0
〈 NA 0 0 0 0 0 0 0 1 0 yyyae,zxby yyyae,za,zxby ' 6
```

Koala Yeung

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/104
Use 0 for all secondary mapping of the table
2019-07-07T22:20:26Z
Koala Yeung

Need to update dbbuilder to implement the mechanism suggested in https://github.com/Cangjians/ibus-cangjie/issues/77#issuecomment-269938845 (solution B). The aim is to stop non-standard mappings from disturbing the candidate order in the Quick input method, and to resolve https://github.com/Cangjians/ibus-cangjie/issues/77.

Koala Yeung

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/91
Revamp our data format (source and DB)
2019-03-09T01:49:52Z
Mathieu Bridon

@yookoala and I have been talking about this for some time now: we're not very happy with our data format:
- the source format is hard to read for humans (and the sources are primarily **for** humans)
- the db is a bit clunky (need to join two tables, frequency of a **character** in the `codes` table,...)
- etc...
I've been trying to work with [gom](http://www.hadess.net/2014/04/what-is-gom.html), a GObject ORM, and our data format definitely makes it harder than it should be. (not impossible, of course, just annoying)
So let's fix this!
2.0
Mathieu Bridon

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/90
Confusion around half-/full-width and short codes for punctuations
2019-03-09T01:49:52Z
Mathieu Bridon

We talked on IRC with @iravan the other day.
He has a problem with IBus Cangjie: he wants to input the halfwidth space.
That's possible at the moment with IBus Cangjie: just enable the "_Halfwidth Characters_" option, and you get all of those characters (like space) in their halfwidth version.
However, you also lose the short codes, because the code is wired up to only try short codes when inputting fullwidth characters.
That's not good.
One way to fix this is to completely separate the half-/full-width mapping from the short-code thing in libcangjie and its data. I think it makes sense, because they are fundamentally different concepts.
In the database, we can continue to use version 0 for the short-code, and keep the punctuation short code mappings (like `.` to `。`).
In addition, we could introduce a new mapping of halfwidth to fullwidth characters.
IBus Cangjie (and other libcangjie users) would need two queries to retrieve these two pieces of information, but they'd really get what they expect, not a mix of two different concepts.
Here's the user experience we could have in IBus Cangjie with this separation, in a few examples.
#### User presses ` ` (space)
- With the "_Halfwidth Characters_" option OFF
  1. We query the list of characters for which the short code is ` ` (there would be none)
  2. We get the fullwidth version of ` `: it is `　`
  3. We found only one character, so we commit `　` (fullwidth space)
- With the "_Halfwidth Characters_" option ON
  1. We query the list of characters for which the short code is ` ` (there would be none)
  2. We keep the character itself (as it is halfwidth)
  3. We found only one character, so we commit ` ` (halfwidth space)
#### User presses `.` (dot, full stop)
- With the "_Halfwidth Characters_" option OFF
  1. We query the list of characters for which the short code is `.` (there is one: `。`)
  2. We get the fullwidth version of `.`: it is `．`
  3. We found 2 characters, so we show the list of candidates: `。`, `．`
     (the last one is the fullwidth full stop)
- With the "_Halfwidth Characters_" option ON
  1. We query the list of characters for which the short code is `.` (there is one: `。`)
  2. We keep the character itself (as it is halfwidth)
  3. We found 2 characters, so we show the list of candidates: `。`, `.`
     (the last one is the halfwidth full stop)
#### User presses `"` (double quote)
- With the "_Halfwidth Characters_" option OFF
  1. We query the list of characters for which the short code is `"` (they are: `《`, `》`, `『` and `』`)
  2. We get the fullwidth version of `"`: it is `＂`
  3. We found 5 characters, so we show the list of candidates: `《`, `》`, `『`, `』`, `＂`
     (the last one is the fullwidth quote)
- With the "_Halfwidth Characters_" option ON
  1. We query the list of characters for which the short code is `"` (they are: `《`, `》`, `『` and `』`)
  2. We keep the character itself (as it is halfwidth)
  3. We found 5 characters, so we show the list of candidates: `《`, `》`, `『`, `』`, `"`
     (the last one is the halfwidth quote)
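
The three walk-throughs above follow the same two-step lookup, which could be sketched roughly like this. It is an illustration only: `SHORT_CODES`, `to_fullwidth()` and `candidates()` are hypothetical names, not libcangjie API, and the short-code table contains just the entries mentioned in the examples. The fullwidth conversion relies on the fixed offset between ASCII and the Unicode fullwidth forms block, with the space handled separately.

```python
# Hypothetical short-code table, limited to the examples discussed above.
SHORT_CODES = {
    ".": ["。"],
    '"': ["《", "》", "『", "』"],
}


def to_fullwidth(char):
    """Map a printable ASCII character to its fullwidth form."""
    if char == " ":
        return "\u3000"  # IDEOGRAPHIC SPACE
    codepoint = ord(char)
    if 0x21 <= codepoint <= 0x7E:
        return chr(codepoint + 0xFEE0)  # fullwidth forms start at U+FF01
    return char


def candidates(key, halfwidth):
    """Query 1: short codes. Query 2: the key itself, in the requested width."""
    result = list(SHORT_CODES.get(key, []))
    result.append(key if halfwidth else to_fullwidth(key))
    return result
```

With this split, a single candidate (as for the space) can be committed directly, while several candidates are shown as a list, regardless of the "_Halfwidth Characters_" setting.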
---
That would fix @iravan's issue, which I expect is a common complaint. It would also make the data/API clearer (by not confusing two different concepts).
How does that sound?
2.0

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/89
Improve the order of the shortcode suggestions
2019-03-09T01:49:52Z
Koala Yeung

At the moment, we order `、` before `，`, which is bad as the latter is more commonly used than the former.
There might be other such cases where we're ordering wrong as well.
@wanleung was suggesting to order by Cangjie code, so that `，` would be before `、`, because `zxab` is before `zxac`.
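
As a rough illustration of that suggestion, ordering candidates by their Cangjie code is a plain sort on the code string. The `CODES` mapping and the `order_by_code()` name are hypothetical; the two codes are the ones quoted above.

```python
# Codes quoted in the discussion above; the mapping itself is illustrative.
CODES = {"、": "zxac", "，": "zxab"}


def order_by_code(candidates, codes=CODES):
    """Sort candidate characters by their Cangjie code, ascending."""
    return sorted(candidates, key=lambda char: codes[char])


print(order_by_code(["、", "，"]))  # ['，', '、']
```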

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/65
Suggested change / option: Z instead of * as wildcard for Jian-Yi / Simplified Cangjie / 簡易
2019-03-09T01:49:52Z
Mathieu Bridon

*Created by: boyin*
Reason:
1. I think that **z** is easier to access than the asterisk **\***. I understand that on French keyboards, the natural result of hitting the "8 \* " key is the asterisk \* and not "8", but on most other keyboards \* requires Shift.
2. z is currently unused as a **non-leading** character in an encoding anyway. There are encodings with z as the leading character for special symbols, but the wildcard _\*_ does not operate in the leading position.
3. It seems incongruous for the asterisk \* to have different behavior from all other punctuation marks.
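
To make point 2 concrete, here is a minimal sketch of how a non-leading `z` could act as the wildcard while a leading `z` stays literal. The names `wildcard_regex()` and `matches()` are hypothetical, and the sketch assumes the wildcard stands for any run of letters, which may not be exactly how the existing `*` wildcard behaves.

```python
import re


def wildcard_regex(pattern, wildcard="z"):
    """Build a regex where `wildcard` matches any run of letters,
    except in the leading position, where it is taken literally."""
    head, tail = pattern[:1], pattern[1:]
    parts = [re.escape(head)]
    for letter in tail:
        parts.append(".*" if letter == wildcard else re.escape(letter))
    return re.compile("".join(parts))


def matches(pattern, code, wildcard="z"):
    """Return True if `code` matches the whole `pattern`."""
    return wildcard_regex(pattern, wildcard).fullmatch(code) is not None


print(matches("azc", "abc"))    # True: z stands in for b
print(matches("zxad", "axad"))  # False: a leading z is literal
```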

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/64
Order the returned list of `CangjieChar`
2019-03-09T01:49:52Z
Mathieu Bridon

Currently, the `cangjie_get_characters()` function returns a completely unordered list of `CangjieChar`, and it is up to the application using libcangjie to order them all afterwards.
This means that we iterate over the list of characters twice:
- once when creating the list of `CangjieChar` (we iterate over the results of the SQL query)
- once when ordering them (in the application using libcangjie)
If `cangjie_get_characters()` ordered the results itself, it could do so with an `ORDER BY` clause directly in the SQL query, and so we'd remove the need for the application to do it itself.
The new API should let the application specify on which column(s) to order, ascending or descending.
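
A minimal sketch of the idea, using Python's `sqlite3` module rather than the C API: the table name `chars`, its columns, the sample rows and the `get_characters()` signature are all assumptions made for illustration, not the actual libcangjie schema or API.

```python
import sqlite3

# Hypothetical schema: one row per (character, code) pair with a frequency.
SORTABLE_COLUMNS = {"chchar", "code", "frequency"}


def get_characters(conn, code, order_by=(("frequency", "DESC"),)):
    """Return rows for `code`, ordered by the database rather than the caller."""
    clauses = []
    for column, direction in order_by:
        # Column names cannot be bound as SQL parameters, so whitelist them.
        if column not in SORTABLE_COLUMNS or direction not in ("ASC", "DESC"):
            raise ValueError(f"cannot order by {column!r} {direction!r}")
        clauses.append(f"{column} {direction}")
    query = "SELECT chchar, code, frequency FROM chars WHERE code = ?"
    if clauses:
        query += " ORDER BY " + ", ".join(clauses)
    return conn.execute(query, (code,)).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chars (chchar TEXT, code TEXT, frequency INTEGER)")
    conn.executemany(
        "INSERT INTO chars VALUES (?, ?, ?)",
        [("A", "abc", 10), ("B", "abc", 30), ("C", "abd", 20)],
    )
    return conn
```

The results are then walked only once, when the list is built, and the caller picks the columns and direction.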
2.0

https://gitlab.freedesktop.org/cangjie/libcangjie/-/issues/55
Order of suggested characters (x disambiguation)
2019-03-09T01:49:52Z
Mathieu Bridon

*Created by: boyin*
Both Cangjie3 and Cangjie5 have officially sanctioned "canonical characters" and "duplicate characters". For each encoding that covers multiple characters, one character is selected to be the "canonical" character and is the default character selected by that sequence. The other character(s) are selected by adding one or more X letters as a prefix or suffix. (Currently ibus-cangjie uses a suffix X; ibus-table-chinese-cangjie, like Mac OS X, uses a prefix.)
The following pairs of characters seem to be listed the wrong way around in the default setup of ibus-cangjie (and ibus-table-chinese-cangjie) because they were arranged in what used to be known as "big-5 code order".
ABJJ\* 暈 XABJJ 暉
AFMBC\* 顯 XAFMB\* 顥
ANAU\* 晚 XANAU 冕
AYK 旻 XAYK 旼
BHN\* 肌 XBHN\* 冗
BT\* 皿 XBT\* 冊
BUOG\* 瞿 XBUOG 睢
DWD\* 棵 XDWD\* 梱
DYTJ\* 樟 XDYTJ 梓
EA 汨 XEA 沓 XXEA 汩
HMNL\* 郵 XHMNL 邸
MRNO\* 歌 XMRNO 砍
NL\* 引 XNL\* 弔
NO\* 欠 XNO\* 久
OFHAF 鷦 XOFHA 鷡
OGE\* 雙 XOGE\* 隻
ORMBC\* 頷 XORMB\* 頜
QYBB 揥 XQYBB 撾
RMMR 跖 XRMMR 唔
RSHAF 鶚 XRSHA 鴞
SHOE\* 履 XSHOE 屐
SRNL\* 郡 XSRNL 邵
TMD\* 某 XTMD\* 芋
TKN 荑 XTKN 艽
TMNL 邯 XTMNL 鄞
TW\* 苗 XTW\* 曲
TWK\* 奠 XTWK\* 茵
TSP 懃 XTSP 苨
VFHAF\* 鸞 XVFHA\* 鷥
VFJMC\* 繽 XVFJM\* 縯
VFQ 攣 XVFQ 姅
WD\* 果 XWD\* 困
YPD 柴 XYPD 迆
YRPA\* 詢 XYRPA 詣
YRU 訕 XYRU 乩
YTHAF 鸕 XYTHA 鴗
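
The shape of the prefixed codes in this list can be reproduced with a short sketch. It assumes, based only on entries such as `AFMBC` / `XAFMB`, that a prefixed code is truncated to five letters; `disambiguate()` and `MAX_CODE_LENGTH` are hypothetical names. The order of `characters` is whatever the caller decides is canonical first.

```python
MAX_CODE_LENGTH = 5  # assumption inferred from pairs like AFMBC / XAFMB


def disambiguate(code, characters):
    """Assign codes to characters sharing `code`: the first keeps the code,
    each following one gets one more X prefix, truncated to five letters."""
    assigned = {}
    for rank, char in enumerate(characters):
        assigned[char] = ("X" * rank + code)[:MAX_CODE_LENGTH]
    return assigned


print(disambiguate("EA", ["汨", "沓", "汩"]))
print(disambiguate("AFMBC", ["顯", "顥"]))
```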