When designing the text-input protocol, the initial use case was to mirror what virtual keyboards on Android do, but also to make it usable as a standalone assistive technology.
Currently, the text-input protocol cannot, even in the best case, be used without a keyboard or mouse for common things that other OSes can do. Android keyboards have a "submit" button, which can receive hints from the application. Text-input compositors must emulate that with a "return" keycode, dropping some of its semantics: "submit" is not the same as "move to the next line", but "return" can mean either.
Another common action is "undo"/"redo", which is usually supported in text fields but not indicated in any way by applications. The key presses needed to emulate it change meaning between applications, making emulation unreliable.
While I'm not sure whether something of that scale is needed, a mechanism to the same effect could solve some problems:

- using keyboard emulation just for submitting is overkill, especially for input-method protocol clients, which would have to support something like virtual-keyboard, a protocol with problems that are irrelevant to this use case
- the interface could ask clients whether the relevant features are allowed (undo is not universal, submit isn't always valid)

While this could be a standalone protocol, it would have to add new enter/leave events or attach to the existing ones, so I would suggest making it part of a larger text-input protocol update, starting with basic actions only.
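To make this concrete, here is a rough sketch of what such an interface could look like, written as hypothetical C bindings. Every name in it is invented for illustration; nothing like this exists in any protocol today.

```c
#include <stdint.h>

/* Hypothetical action set, deliberately small to start with. */
enum text_action {
    TEXT_ACTION_SUBMIT = 1 << 0,  /* Android-style "send"/"done" button */
    TEXT_ACTION_UNDO   = 1 << 1,
    TEXT_ACTION_REDO   = 1 << 2,
};

/* Client -> compositor: declare which actions are valid in the focused
 * field. A streaming chat box might omit SUBMIT; a field without edit
 * history might omit UNDO/REDO. */
void text_input_set_allowed_actions(struct text_input *ti, uint32_t actions);

/* Compositor -> client: the input method (on-screen keyboard, speech
 * recognizer, sip-and-puff switch, ...) triggered an allowed action. */
void text_input_on_action(void *data, struct text_input *ti, uint32_t action);
```

An input method could then render a real "Submit" button only when the focused field allows it, instead of guessing with a "return" keycode.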
Does this make sense as a protocol, or should text-input compositors rather stick to keyboards explicitly?
When we talk about this, we should review what's already available on different platforms, especially in cross-platform toolkits, and see how they expose it via their APIs. Without toolkit adoption, the protocol change would simply be useless, marked TODO/unimplemented.
Whether or not to force the use of key events is probably not the key point here: the APIs toolkits already expose for these features are often built on key events anyway.
So unfortunately, even though we might want to move away from key events at the protocol level, the toolkit itself might still need to simulate keys internally to implement the feature.
But I'm not saying that adding custom actions other than keys is not preferable; it's just that the standard keysym set (plus the XF86 additions) already contains tons of actions and covers a large set of standard ones: Undo and Redo are keysyms 0xff65 and 0xff66, for example.
And there is still a large unused enum space available. So extending the keysym set would probably be a better option than putting another burden on toolkits to expose a new API solely for actions.
That said, an action key hint is probably a good idea to have. As we already have CONTENT_HINT and CONTENT_PURPOSE, adding an ACTION_KEY_HINT seems useful.
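As a rough illustration (hypothetical names; only CONTENT_HINT and CONTENT_PURPOSE exist in text-input today), such a hint could mirror Android's EditorInfo.imeOptions values like IME_ACTION_GO, IME_ACTION_SEARCH, and IME_ACTION_SEND:

```c
/* Hypothetical ACTION_KEY_HINT enum, following the pattern of the
 * existing content_hint/content_purpose enums in text-input. */
enum action_key_hint {
    ACTION_KEY_HINT_DEFAULT = 0,  /* plain "enter"/"return" */
    ACTION_KEY_HINT_GO,
    ACTION_KEY_HINT_SEARCH,
    ACTION_KEY_HINT_SEND,
    ACTION_KEY_HINT_NEXT,         /* move focus to the next field */
    ACTION_KEY_HINT_DONE,         /* dismiss the keyboard */
};
```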
One of the main reasons I proposed this is that keysym simulation between the input method and the application is controversial, and it's not certain whether it will be adopted. See the discussion under virtual-keyboard.
Without virtual-keyboard, we have no choice but to support a separate protocol (I'm ignoring whether the toolkit translates this into key events internally; that's not relevant to the protocols). This automatically eliminates XFree86 codes, except as inspiration.
My question would be: which codes do we want to support? And what to do to have the protocol widely adopted?
Android's action button hints are cool, but I think there's much more utility in supporting many different buttons, as opposed to different shapes for one button. Imagine text fields offering no undo/redo support and advertising that fact, or fields which are not submittable (streaming?).
Customizing the keyboard is one thing; in the mobile world it is possible to attach native UI directly to the virtual keyboard instead of using some "protocol" for it. But here we might need to stick to a protocol for virtual keyboard customization.
Sending special events is another thing. Personally, I don't understand why the text-input keysym event was removed, so that we now need another newly invented protocol to simulate it again. Pretty much all applications care more about keysyms than keycodes. Even if we need to keep the same interface, I'd rather make it totally valid for the input method to send a (keysym=something, keycode=0, state=something) event, so we don't need to care about any "reverse" conversion.
Here's the reason why keysyms have been removed: keycodes require no special support on the receiving side. On the other hand, keysyms can't be sent via the wl_keyboard interface.
Having both keycodes and keysyms is unnecessary since you can easily get keysyms from a keycode.
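For reference, the forward direction really is that easy with xkbcommon, assuming the client keeps an xkb_state built from the keymap the compositor sent over wl_keyboard:

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Translate a keycode received from wl_keyboard.key into a keysym.
 * Wayland carries evdev keycodes; xkb keycodes are offset by 8. */
xkb_keysym_t keycode_to_keysym(struct xkb_state *state, uint32_t keycode)
{
    return xkb_state_key_get_one_sym(state, keycode + 8);
}
```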
This is not the case for a virtual keyboard. For a virtual keyboard, the "keymap" is meaningless. Why would a virtual keyboard care about the physical keyboard's keymap? Suppose the system uses a qwerty keymap and the virtual keyboard displays azerty. How does the virtual keyboard send the key "A"?
As things stand, it has to learn that physical keymap, do a reverse lookup from "A" to its keycode under qwerty, and send a different code.
No, currently the virtual keyboard has a separate keymap which is sent to clients. There's no keycode conversion going on; keycodes flow directly from the input method to the clients.
> Customizing the keyboard is one thing; in the mobile world it is possible to attach native UI directly to the virtual keyboard instead of using some "protocol" for it.
What do you mean by this? If the keyboard receives UI from another program, then they must have a protocol they use for communicating.
Also, I didn't mean for this protocol to be used for keyboard customization; it's not even necessarily related to a keyboard at all. It could be driven by mouse gestures or an assistive device.
> why the text-input keysym event was removed
I think that's actually a good argument (but not yet a convincing one), and I'm glad that you brought it up.
I removed the keysym event because applications don't care about it. The keyboard protocol doesn't send keysyms, so including them would mean opening a new code path in the application. It's effectively the same as adding a new protocol.
I think keysyms would make sense, but they would also introduce ambiguity and possibly encourage bad practices. We do want this interface to be used for ctrl-Z, but not for "a", because there is commit_string for text already. A way out would be to restrict keysyms to those that perform actions, but then I think it's better to just create a new call anyway.
We already need to deal with the ambiguity of the "a" key case. Almost all the applications I have seen handle only the keysym, not the commit of a single character. I think that is a totally valid thing for an application to do.
I don't think forcing the virtual keyboard to always use commit_string is the right way to go.
Committing the string "a" and sending the keysym "a" actually have two different meanings.
When people use prediction, it is certainly valid for a prediction to consist of only one character. For example, the input method may find that "a", "an", and "I" are all suitable predictions in the current context. When that happens, commit_string carries the semantics of "commit this string", not "treat it as a key".
But when the user clicks the virtual key "a" on a virtual keyboard, I think we should assume that the user wants to simulate the key "a", and it's up to the application to handle it as a shortcut key or commit the corresponding text.
Here's the problem. Commonly, the application needs to use the keymap to translate the keycode into a keysym anyway; otherwise what it gets is purely a physical key on the keyboard. The keymap bound to the wl_keyboard has nothing to do with any "virtual keyboard" input method. What the input method wants is to generate "press A", but right now what it has to do is "send a keycode that may generate A". A keymap was never a data structure designed for reverse key lookup.
Do you see the problem here? When the input method is forced to use keycodes, there has to be a keymap (totally meaningless to the input method) to do the reverse conversion, just to send the thing we actually want (a keysym).
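To illustrate, here is roughly what that reverse conversion looks like with xkbcommon (the helper name is invented): a linear scan over the whole keymap, with no guarantee that the keysym is reachable at all.

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Reverse lookup from keysym to keycode: scan every key in the keymap
 * and check what it produces at the first shift level of the first
 * layout. Returns 0 when the keysym has no keycode in this keymap at
 * all; in that case the input method simply cannot express the key. */
uint32_t keysym_to_keycode(struct xkb_keymap *keymap, xkb_keysym_t wanted)
{
    for (xkb_keycode_t kc = xkb_keymap_min_keycode(keymap);
         kc <= xkb_keymap_max_keycode(keymap); kc++) {
        const xkb_keysym_t *syms;
        int n = xkb_keymap_key_get_syms_by_level(keymap, kc, 0, 0, &syms);
        for (int i = 0; i < n; i++)
            if (syms[i] == wanted)
                return kc - 8;  /* back to an evdev/wl_keyboard keycode */
    }
    return 0;
}
```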
If you know the existing IM module interfaces on Linux, the communication between application and input method is almost always keysym-based, not keycode-based. We can even forward a key that does not exist in the current keymap (meaning there is no keycode -> keysym mapping for it) to the application, and the application will be totally fine with it.
Keycodes and keymaps are a fairly meaningless concept here, especially for a virtual keyboard (for a regular input method too, though less significantly).
In this case, I'd rather the application could have that old keysym interface back (or whatever it ends up being called in this proposal; I do hope this new "action" protocol allows sending a keysym without requiring a keycode, as well as other application-specific actions like "Cut", "Copy", "Paste").
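A minimal sketch of what I mean, with invented names (only xkb_keysym_t is real, from xkbcommon):

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Hypothetical: deliver a bare keysym with no keycode attached, i.e.
 * "the user pressed the key labelled A", independent of whatever
 * physical keymap happens to be active. */
void input_method_send_keysym(struct input_method *im,
                              xkb_keysym_t sym, uint32_t state);

/* Hypothetical: deliver an application-level action with no keyboard
 * heritage at all, e.g. "cut", "copy", "paste". */
void input_method_send_action(struct input_method *im, const char *name);
```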
I understand your point better now. This is the most important part of what you wrote:
> I think we should assume that the user wants to simulate the key "a", and it's up to the application to handle it as a shortcut key or commit the corresponding text.
Digging into it a bit more, we get two intentions: the user presses the key "a" either because they want to "enter the text a", or because they want to "trigger the action Enhance".
When the user wants the text "a", then it's hopefully clear that pressing "a" will send text, and that a text_input.commit_string() will happen at some point. I won't elaborate.
When the user wants to trigger "Enhance", then they will look for a button ... called "Enhance". We get a contradiction: we assumed they would look for "a". But if we have a virtual keyboard on the screen, we can give it any shape. Why not name the button properly? Or why not let the app handle it completely?
The only case where this matters is when the app is created for physical keyboards, which cannot be reconfigured. If we don't have those, we don't need keysyms either.
The input-method family of protocols is not meant to emulate physical keyboards, nor the keycode/keysym system that follows from them (except for the virtual-keyboard crutch). At the moment it's directed at entering text, while trying not to carry over keyboard-specificity (so you can use speech or sip-and-puff devices, or whatever comes next).
I'm seeing a clear distinction between text and actions, and those protocols are not addressing the "actions" part, except for virtual-keyboard, which is stuck with physical keyboards, as you correctly observed. So this RFC is here because I don't know how to answer the question on my own: how do we trigger actions once we no longer have to stick to the history of physical keyboards?
To expand on the contradiction above: if the user presses "Enhance" when they intend to trigger "Enhance", this frees "a" from its overloaded action, so "a" becomes unambiguously assigned to text. If the application deals with its actions itself, the input method's only responsibility becomes providing text. No keysyms required.
There's a snag, though, in context-sensitive actions. I started with the text-field context here because things like "select all", "one char right", "next text field", "submit", "undo" are not universally presented by applications in this context. Those are the actions that inspired me to write this RFC, even if I didn't realize their importance before this exchange :)
> There's a snag, though, in context-sensitive actions. I started with the text-field context here because things like "select all", "one char right", "next text field", "submit", "undo" are not universally presented by applications in this context.
Can you explain this thought some more? I'm not following how the actions "are not universally presented [...] in this context" and what universal presentation in general means here.
It's been a long time, so I might not follow the same line of thought as before. Nevertheless, this is how I see it now:
Applications don't universally expose the functionality of context-sensitive actions in a discoverable way. Testing with Firefox, if I right-click this input field, I get "select all", "undo", but not "redo" (ctrl+shift+z). When a keyboard is unavailable, "redo" is still implemented but not reachable. That's why this protocol should take it over and provide uniformity across applications.
I'm not sure what I meant by the importance of context-sensitive actions, except that my current position is that those are the only ones that should initially be in such a protocol. Global actions are usually specific to the application ("Enhance"), or not possible to apply to all applications ("Print"), while context-related actions are much more uniform.
Yes, that makes sense. When I first heard about "actions", I worried that this would become an endless list, or that many intentions with subtle nuances would not be represented correctly.
But if we restrict ourselves to a specific domain/context, here text input, then the list of actions can be agreed upon more easily.
Since needing to update the protocol version for the compositor every time new actions are created is not great, maybe the protocol could specify that unknown actions are to be passed through transparently.
The text-input client would communicate all the actions that it supports through the enum values, and the compositor would pass those through to the input method, which would use only the actions that it knows about and the client supports.
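A sketch of how that could work, with invented names. Because the values travel as plain integers, a compositor built against an older protocol version can forward newer values it has never heard of:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action values; future protocol versions append entries,
 * and compositors relay the raw uint32_t without interpreting it. */
enum text_action_v1 {
    TEXT_ACTION_V1_UNDO       = 1,
    TEXT_ACTION_V1_REDO       = 2,
    TEXT_ACTION_V1_SELECT_ALL = 3,
    TEXT_ACTION_V1_SUBMIT     = 4,
};

/* Hypothetical request: the client lists every action the focused
 * field supports; the input method then offers only the intersection
 * of what it knows and what was advertised here. */
void text_input_set_supported_actions(struct text_input *ti,
                                      const uint32_t *actions, size_t count);
```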
As a somewhat related topic, general "actions" accessed via keyboard shortcuts are problematic. Perhaps a similar principle could prevent carrying this baggage on to touch-first devices.