When designing the text-input protocol, the initial use case was to mirror what virtual keyboards on Android do, but also to make it usable as a standalone assistive technology.
Currently, the text-input protocol cannot, even in the best case, be used without a keyboard or mouse for common things that other OSes can do. Android keyboards have a "submit" button, which can receive hints from the application. Text-input compositors must emulate that with a "return" keycode, dropping some of its semantics: "submit" is not the same as "move to the next line", but "return" can mean either.
Another common action is "undo"/"redo", which is usually supported in text fields but not indicated in any way by applications. The key presses needed to emulate it change meaning between applications, making emulation unreliable.
While I'm not sure whether something of that scale is needed, a mechanism to the same effect could solve some problems:

- using keyboard emulation just for submitting is overkill, especially for input-method protocol clients, which would have to support something like virtual-keyboard, a protocol with problems that are irrelevant to this use case
- the interface could ask clients whether the relevant features are allowed (undo is not universal, submit isn't always valid)

While this could be a standalone protocol, it would have to add new enter/leave events or attach to the existing ones, so I would suggest making it part of a larger text-input protocol update, starting with basic actions only.
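To make this concrete, here is a rough sketch of what such an interface could look like, written as hypothetical C bindings. Every name in it is invented for illustration; nothing like this exists in any protocol today.

```c
#include <stdint.h>

/* Hypothetical action set, deliberately small to start with. */
enum text_action {
    TEXT_ACTION_SUBMIT = 1 << 0,  /* Android-style "send"/"done" button */
    TEXT_ACTION_UNDO   = 1 << 1,
    TEXT_ACTION_REDO   = 1 << 2,
};

/* Client -> compositor: declare which actions are valid in the focused
 * field. A streaming chat box might omit SUBMIT; a field without edit
 * history might omit UNDO/REDO. */
void text_input_set_allowed_actions(struct text_input *ti, uint32_t actions);

/* Compositor -> client: the input method (on-screen keyboard, speech
 * recognizer, sip-and-puff switch, ...) triggered an allowed action. */
void text_input_on_action(void *data, struct text_input *ti, uint32_t action);
```

An input method could then render a real "Submit" button only when the focused field allows it, instead of guessing with a "return" keycode.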
Does this make sense as a protocol, or should text-input compositors rather stick to keyboards explicitly?
When we talk about this, we should review what's already available on different platforms, especially in cross-platform toolkits, and see how they expose it via their APIs. Without toolkit adoption, the protocol change would simply be useless, marked TODO/unimplemented.
Whether or not to force the use of key events is probably not the key point here: the APIs toolkits already expose for these features are often built on key events anyway.
So unfortunately, even though we might want to move away from key events at the protocol level, the toolkit itself might still need to simulate keys internally to implement the feature.
But I'm not saying that adding custom actions other than keys is not preferable; it's just that the standard keysym set (plus the XF86 additions) already contains tons of actions and covers a large set of standard ones: Undo and Redo are keysyms 0xff65 and 0xff66, for example.
And there is still a large unused enum space available. So extending the keysym set would probably be a better option than putting another burden on toolkits to expose a new API solely for actions.
That said, an action key hint is probably a good idea to have. As we already have CONTENT_HINT and CONTENT_PURPOSE, adding an ACTION_KEY_HINT seems useful.
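As a rough illustration (hypothetical names; only CONTENT_HINT and CONTENT_PURPOSE exist in text-input today), such a hint could mirror Android's EditorInfo.imeOptions values like IME_ACTION_GO, IME_ACTION_SEARCH, and IME_ACTION_SEND:

```c
/* Hypothetical ACTION_KEY_HINT enum, following the pattern of the
 * existing content_hint/content_purpose enums in text-input. */
enum action_key_hint {
    ACTION_KEY_HINT_DEFAULT = 0,  /* plain "enter"/"return" */
    ACTION_KEY_HINT_GO,
    ACTION_KEY_HINT_SEARCH,
    ACTION_KEY_HINT_SEND,
    ACTION_KEY_HINT_NEXT,         /* move focus to the next field */
    ACTION_KEY_HINT_DONE,         /* dismiss the keyboard */
};
```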
One of the main reasons I proposed this is that keysym simulation between the input method and the application is controversial, and it's not certain whether it will be adopted. See the discussion under virtual-keyboard.
Without virtual-keyboard, we have no choice but to support a separate protocol (I'm ignoring whether the toolkit translates this into key events internally; that's not relevant to the protocols). This automatically eliminates XFree86 codes, except as inspiration.
My question would be: which codes do we want to support? And what to do to have the protocol widely adopted?
Android's action button hints are cool, but I think there's much more utility in supporting many different buttons, as opposed to different shapes for one button. Imagine text fields offering no undo/redo support and advertising that fact, or fields which are not submittable (streaming?).
Customizing the keyboard is one thing; in the mobile world it is possible to attach native UI directly to the virtual keyboard instead of using some "protocol" for it. But here we might need to stick to a protocol for virtual keyboard customization.
Sending special events is another thing. Personally, I don't understand why the text-input keysym event was removed, so that we now need another newly invented protocol to simulate it again. Pretty much all applications care more about keysyms than keycodes. Even if we need to keep the same interface, I'd rather make it totally valid for the input method to send a (keysym=something, keycode=0, state=something) event, so we don't need to care about any "reverse" conversion.
Here's the reason why keysyms have been removed: keycodes require no special support on the receiving side. On the other hand, keysyms can't be sent via the wl_keyboard interface.
Having both keycodes and keysyms is unnecessary since you can easily get keysyms from a keycode.
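For reference, the forward direction really is that easy with xkbcommon, assuming the client keeps an xkb_state built from the keymap the compositor sent over wl_keyboard:

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Translate a keycode received from wl_keyboard.key into a keysym.
 * Wayland carries evdev keycodes; xkb keycodes are offset by 8. */
xkb_keysym_t keycode_to_keysym(struct xkb_state *state, uint32_t keycode)
{
    return xkb_state_key_get_one_sym(state, keycode + 8);
}
```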
This is not the case for a virtual keyboard. For a virtual keyboard, the "keymap" is meaningless. Why would a virtual keyboard care about the physical keyboard's keymap? Suppose the system uses a qwerty keymap and the virtual keyboard displays azerty. How does the virtual keyboard send the key "A"?
As things stand, it has to learn that physical keymap, do a reverse lookup from "A" to its keycode under qwerty, and send a different code.
No, currently the virtual keyboard has a separate keymap which is sent to clients. There's no keycode conversion going on; keycodes flow directly from the input method to the clients.
> Customizing the keyboard is one thing; in the mobile world it is possible to attach native UI directly to the virtual keyboard instead of using some "protocol" for it.
What do you mean by this? If the keyboard receives UI from another program, then they must have a protocol they use for communicating.
Also, I didn't mean for this protocol to be used for keyboard customization; it's not even necessarily related to a keyboard at all. It could be driven by mouse gestures or an assistive device.
> why the text-input keysym event was removed
I think that's actually a good argument (but not yet a convincing one), and I'm glad that you brought it up.
I removed the keysym event because applications don't care about it. The keyboard protocol doesn't send keysyms, so including them would mean opening a new code path in the application. It's effectively the same as adding a new protocol.
I think keysyms would make sense, but they would also introduce ambiguity and possibly encourage bad practices. We do want this interface to be used for ctrl-Z, but not for "a", because there is commit_string for text already. A way out would be to restrict keysyms to those that perform actions, but then I think it's better to just create a new call anyway.
We already need to deal with the ambiguity of the "a" key case. Almost all the applications I have seen handle only the keysym, not the commit of a single character. I think that is a totally valid thing for an application to do.
I don't think forcing the virtual keyboard to always use commit_string is the right way to go.
Committing the string "a" and sending the keysym "a" actually have two different meanings.
When people use prediction, it is certainly valid for a prediction to consist of only one character. For example, the input method may find that "a", "an", and "I" are all suitable predictions in the current context. When that happens, commit_string carries the semantics of "commit this string", not "treat it as a key".
But when the user clicks the virtual key "a" on a virtual keyboard, I think we should assume that the user wants to simulate the key "a", and it's up to the application to handle it as a shortcut key or commit the corresponding text.
Here's the problem. Commonly, the application needs to use the keymap to translate the keycode into a keysym anyway; otherwise what it gets is purely a physical key on the keyboard. The keymap bound to the wl_keyboard has nothing to do with any "virtual keyboard" input method. What the input method wants is to generate "press A", but right now what it has to do is "send a keycode that may generate A". A keymap was never a data structure designed for reverse key lookup.
Do you see the problem here? When the input method is forced to use keycodes, there has to be a keymap (totally meaningless to the input method) to do the reverse conversion, just to send the thing we actually want (a keysym).
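To illustrate, here is roughly what that reverse conversion looks like with xkbcommon (the helper name is invented): a linear scan over the whole keymap, with no guarantee that the keysym is reachable at all.

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Reverse lookup from keysym to keycode: scan every key in the keymap
 * and check what it produces at the first shift level of the first
 * layout. Returns 0 when the keysym has no keycode in this keymap at
 * all; in that case the input method simply cannot express the key. */
uint32_t keysym_to_keycode(struct xkb_keymap *keymap, xkb_keysym_t wanted)
{
    for (xkb_keycode_t kc = xkb_keymap_min_keycode(keymap);
         kc <= xkb_keymap_max_keycode(keymap); kc++) {
        const xkb_keysym_t *syms;
        int n = xkb_keymap_key_get_syms_by_level(keymap, kc, 0, 0, &syms);
        for (int i = 0; i < n; i++)
            if (syms[i] == wanted)
                return kc - 8;  /* back to an evdev/wl_keyboard keycode */
    }
    return 0;
}
```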
If you know the existing IM module interfaces on Linux, the communication between application and input method is almost always keysym-based, not keycode-based. We can even forward a key that does not exist in the current keymap (meaning there is no keycode -> keysym mapping for it) to the application, and the application will be totally fine with it.
Keycodes and keymaps are a fairly meaningless concept here, especially for a virtual keyboard (for a regular input method too, though less significantly).
In this case, I'd rather the application could have that old keysym interface back (or whatever it ends up being called in this proposal; I do hope this new "action" protocol allows sending a keysym without requiring a keycode, as well as other application-specific actions like "Cut", "Copy", "Paste").
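A minimal sketch of what I mean, with invented names (only xkb_keysym_t is real, from xkbcommon):

```c
#include <stdint.h>
#include <xkbcommon/xkbcommon.h>

/* Hypothetical: deliver a bare keysym with no keycode attached, i.e.
 * "the user pressed the key labelled A", independent of whatever
 * physical keymap happens to be active. */
void input_method_send_keysym(struct input_method *im,
                              xkb_keysym_t sym, uint32_t state);

/* Hypothetical: deliver an application-level action with no keyboard
 * heritage at all, e.g. "cut", "copy", "paste". */
void input_method_send_action(struct input_method *im, const char *name);
```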
I understand your point better now. This is the most important part of what you wrote:
> I think we should assume that the user wants to simulate the key "a", and it's up to the application to handle it as a shortcut key or commit the corresponding text.
Digging into it a bit more, we get two intentions: the user presses the key "a" either because they want to "enter the text a", or because they want to "trigger the action Enhance".
When the user wants the text "a", then it's hopefully clear that pressing "a" will send text, and that a text_input.commit_string() will happen at some point. I won't elaborate.
When the user wants to trigger "Enhance", then they will look for a button ... called "Enhance". We get a contradiction: we assumed they would look for "a". But if we have a virtual keyboard on the screen, we can give it any shape. Why not name the button properly? Or why not let the app handle it completely?
The only case where this matters is when the app is created for physical keyboards, which cannot be reconfigured. If we don't have those, we don't need keysyms either.
The input-method family of protocols is not meant to emulate physical keyboards, nor the keycode/keysym system that follows from them (except for the virtual-keyboard crutch). At the moment it's directed at entering text, while trying not to carry over keyboard-specificity (so you can use speech or sip-and-puff devices, or whatever comes next).
I'm seeing a clear distinction between text and actions, and those protocols are not addressing the "actions" part, except for virtual-keyboard, which is stuck with physical keyboards, as you correctly observed. So this RFC is here because I don't know how to answer the question on my own: how do we trigger actions once we no longer have to stick to the history of physical keyboards?
To expand on the contradiction above: if the user presses "Enhance" when they intend to trigger "Enhance", this frees "a" from its overloaded action, so "a" becomes unambiguously assigned to text. If the application deals with its actions itself, the input method's only responsibility becomes providing text. No keysyms required.
There's a snag, though, in context-sensitive actions. I started with the text-field context here because things like "select all", "one char right", "next text field", "submit", "undo" are not universally presented by applications in this context. Those are the actions that inspired me to write this RFC, even if I didn't realize their importance before this exchange :)
> There's a snag, though, in context-sensitive actions. I started with the text-field context here because things like "select all", "one char right", "next text field", "submit", "undo" are not universally presented by applications in this context.
Can you explain this thought some more? I'm not following how the actions "are not universally presented [...] in this context" and what universal presentation in general means here.
It's been a long time, so I might not follow the same line of thought as before. Nevertheless, this is how I see it now:
Applications don't universally expose the functionality of context-sensitive actions in a discoverable way. Testing with Firefox, if I right-click this input field, I get "select all", "undo", but not "redo" (ctrl+shift+z). When a keyboard is unavailable, "redo" is still implemented but not reachable. That's why this protocol should take it over and provide uniformity across applications.
I'm not sure what I meant by the importance of context-sensitive actions, except that my current position is that those are the only ones that should initially be in such a protocol. Global actions are usually specific to the application ("Enhance"), or not possible to apply to all applications ("Print"), while context-related actions are much more uniform.
Yes, that makes sense. When I first heard about "actions", I worried that this would become an endless list, or that many intentions with subtle nuances would not be represented correctly.
But if we restrict ourselves to a specific domain/context, here text input, then the list of actions can be agreed upon more easily.
Since needing to update the protocol version for the compositor every time new actions are created is not great, maybe the protocol could specify that unknown actions are to be passed through transparently.
The text-input client would communicate all the actions that it supports through the enum values, and the compositor would pass those through to the input method, which would use only the actions that it knows about and the client supports.
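A sketch of how that could work, with invented names. Because the values travel as plain integers, a compositor built against an older protocol version can forward newer values it has never heard of:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action values; future protocol versions append entries,
 * and compositors relay the raw uint32_t without interpreting it. */
enum text_action_v1 {
    TEXT_ACTION_V1_UNDO       = 1,
    TEXT_ACTION_V1_REDO       = 2,
    TEXT_ACTION_V1_SELECT_ALL = 3,
    TEXT_ACTION_V1_SUBMIT     = 4,
};

/* Hypothetical request: the client lists every action the focused
 * field supports; the input method then offers only the intersection
 * of what it knows and what was advertised here. */
void text_input_set_supported_actions(struct text_input *ti,
                                      const uint32_t *actions, size_t count);
```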
As a somewhat related topic, general "actions" accessed via keyboard shortcuts are problematic. Perhaps a similar principle could prevent carrying this baggage on to touch-first devices.