UI toolkits do not have a way to query environment for the preferred text and UI size, aside from a blunt integer "scale factor" in wl_output.
This makes it hard to achieve an aesthetically pleasing desktop look and leads to horrible hacks like fractional scaling that ruin any chance of crisp text rendering.
This issue does not have answers (yet), but is intended to work as a Schelling point for scaling discussions:
<pq> dottedmag, so if you can't find an open issue or MR in wayland-protocols repo for it, it might be a good idea to open one, so things stop relying on hearsay and vague memories of irc discussions.
Qt used to automatically compute the DPI to decide which font size to use; however, they stopped doing that because it wasn't working properly on some compositors/setups.
Clients computing their own DPI from physical dimensions, resolution etc. is not a reasonable way forward for this; it should be something configured by the compositor. Reasons for this include that EDID tends not to provide reliable information about physical dimensions, and that for some kinds of outputs, e.g. remote screens, projectors, (VR?), the information may not be available at all, or may even change depending on the environment.
I would suggest that this setting needs to be a global and not a per-output thing. If it were a per-output thing, windows moving between different monitors would probably have problems (text size changes while the window size does not?). It is hard for me to imagine how it would work as a per-output setting.
Another thing I would propose is that the compositor should not compensate when the client's UI scale differs from the compositor setting. If a compositor compensated for it, as it does with output/buffer integer scale, it would just lead to the same image-quality problems as fractional scaling has today.
Fractional scaling in clients
Fractional scaling is a whole other topic, separate from UI scale. I want to mention it to avoid confusing the two.
If the goal were to let clients render directly at fractional scales, we already have wp_viewport, which can do the client->compositor communication, but the compositor->client part is missing. This is different from UI scale, because it is effectively a fractional buffer scale that does not change the UI size.
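As a rough sketch of the arithmetic involved (the fractional scale value here is hypothetical, since no compositor->client event for it exists; wp_viewport only covers the reverse direction):

```python
# Hypothetical fractional scale, as would be sent by some future
# compositor->client event (an assumption for illustration only).
SCALE = 1.5

def buffer_size(logical_w, logical_h, scale=SCALE):
    """Buffer size a client would render so the compositor can avoid resampling."""
    bw, bh = logical_w * scale, logical_h * scale
    # Only sizes that land on whole pixels can be presented without filtering.
    assert bw.is_integer() and bh.is_integer(), "logical size must align with scale"
    return int(bw), int(bh)

# An 800x600 logical surface at 1.5x: render a 1200x900 buffer, then set the
# wp_viewport destination to 800x600, without touching the integer buffer scale.
print(buffer_size(800, 600))  # (1200, 900)
```

The UI size stays 800x600 logical; only the backing raster changes, which is what distinguishes this from the UI-scale discussion above.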
I would suggest that this setting needs to be a global and not a per-output thing
I tend to disagree with this point. For example, it's super convenient to connect a hiDPI laptop to an old monitor: the laptop's screen has 150% zoom and the monitor 100%. Dragging windows between them (Win10) resizes them, which is supported by Chrome, Win32, UWP and other toolkits.
This sounds a lot like text-scaling-factor in GNOME, used for implementing the "Large text" accessibility feature. The only way this involves the compositor is that it affects how it itself renders text.
So, we'd separate user size preference (e.g. increased size to make things more readable, decreased size to save space) from hardware adjustments (e.g. matching multi-dpi setups).
A given window would then draw based on ui_scale * output_scale, with the compositor compensating for output scale when in between two monitors of different DPI. If support for fractional scaling was present, the same result could be obtained by pre-multiplying the UI scale into the existing output scales.
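A minimal sketch of that composition of scales (the names `ui_scale` and `output_scale` are illustrative, not protocol terms):

```python
def effective_scale(ui_scale, output_scale):
    # The window draws at the product; per the suggestion above, the
    # compositor compensates only for the output_scale part when a window
    # spans monitors of different DPI.
    return ui_scale * output_scale

# A "large text" preference of 1.25x on a 2x hidpi output vs. a 1x monitor:
print(effective_scale(1.25, 2))  # 2.5
print(effective_scale(1.25, 1))  # 1.25
```

With fractional scaling support, the same result falls out of pre-multiplying the UI scale into the existing output scales, as the comment suggests.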
This does slightly redefine the purpose of output scale. I wonder if this would be of use to those interested in physical size representation (i.e. rendering according to the true monitor DPI).
Another thing I would propose is that the compositor should not compensate
Hmm, we would probably have some problems if we cannot compensate for lack of support for a UI scale protocol. E.g., if UI size is increased to assist with visibility, unsupported applications might be too small to be legible.
Fractional scaling
The things we need here are:
A way to communicate scale/native window size to the client
Restrictions in the compositor to ensure that physical pixel alignment can be achieved (this will require changes to compositors that just scale their output buffers)
A wp_viewport special case that will enable 1:1 pixel copy
Solutions to the many headaches I imagine we will run into (e.g., aligning scaled and unscaled subsurfaces side-by-side).
The first step as previously discussed with @jadahl would be a demo implementation that handles this for a single fullscreen surface, as this largely sidesteps the problems of item 2. This is merely waiting on the necessary spare time being available for experimentation.
This makes it hard to achieve an aesthetically pleasing desktop look and leads to horrible hacks like fractional scaling that ruin any chance of crisp text rendering.
I wouldn't say fractional scaling is what ruins crisp text rendering. Font renderers are sophisticated engines that have lots of tricks to accommodate text into grids of pixels of more or less arbitrary sizes, and have been doing that for years. A great number of the physical grids available these days aren't high-resolution enough to avoid requiring some of those tricks, e.g. subpixel rendering and hinting. In any case, the problem is an implementation of fractional scaling that first commands the renderer to produce output tailored to a certain raster and then scales it to a smaller one. At sufficient pixel density this is a matter of indifference, but there are way too many FHD laptops out there.
Solutions to the many headaches I imagine we will run into (e.g., aligning scaled and unscaled subsurfaces side-by-side).
Could you explain why this may happen? If the client is told a fractional scale but is constrained to produce integer-sized surfaces, what will be the problem? I remember some arguments for constraining the scaling factors in GNOME in order to produce neatly aligning output buffers, but as I understand it this was because the compositor was doing the downscaling from an integer-sized surface to another integer-sized surface. When the client is in charge of producing a grid of NxM points at whatever scale, why would that be a problem for the compositor? The matter is of interest to me because it was discussed recently regarding the possibility of allowing the Plasma compositor to get fractionally scaled surfaces from Qt apps.
I wouldn't say fractional scaling is what ruins crisp text rendering. Font renderers are sophisticated engines that have lots of tricks to accommodate text into grids of pixels of more or less arbitrary sizes, and have been doing that for years.
The problem is compositors lying to applications because there is no way in protocol to communicate actual pixel density except the crude "scale factor" and then rescaling the resulting bitmaps.
I wouldn't say fractional scaling is what ruins crisp text rendering.
The problem is when rendered bitmaps need to be scaled, as is currently necessary. Any vector graphic rasterization, including fonts, would of course be "crisp" if it can operate on native pixels.
Could you explain why this may happen? .. When the client is in charge of producing a grid of NxM points at whatever scale, why would that be a problem for the compositor?
Scale is defined per surface, and a logical "window" can be composed of multiple surfaces, as is commonly the case (e.g. CSD features being subsurfaces positioned adjacent to a main surface).
Let's make an example: Imagine a case with 3 subsurfaces, each 3px wide, placed at (0,0), (3,0) and (6,0), so they are perfectly adjacent with no gap.
If you apply a 1.5x scale, these surfaces become 4.5px wide. While logical pixel positions are unchanged, the native pixel positions are now (0,0), (4.5,0), (9,0). The middle surface is now half a pixel out of alignment, and both of its sides require blending with the adjacent surface.
Things are now rather bad, with nothing matching native pixels. Let's try to fix that for the middle one using fractional scaling awareness. But, this requires native pixel alignment, so now we must both shift and grow the surface.
If we shift it half a pixel to the right, we're breaking the adjacency with the left subsurface, while overlapping a full pixel with the right subsurface. If we move it half a pixel to the left, we're overlapping with the left subsurface.
Neither option will render correctly.
It should be noted that this already assumes that general window position, (0, 0), is aligned to native pixel boundaries, which is not currently the case for compositors supporting fractional scaling. None of this is an issue under integer scaling.
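For concreteness, the example above worked out numerically (just a sketch of the coordinate arithmetic):

```python
SCALE = 1.5
surfaces = [(0, 3), (3, 3), (6, 3)]  # (logical x, logical width), adjacent

for x, w in surfaces:
    left, right = x * SCALE, (x + w) * SCALE
    print(f"logical [{x}, {x + w}] -> physical [{left}, {right}]")
# logical [0, 3] -> physical [0.0, 4.5]
# logical [3, 6] -> physical [4.5, 9.0]
# logical [6, 9] -> physical [9.0, 13.5]

# The middle surface's edges sit on half pixels: no shift can make both of
# its edges integral while it remains 4.5 physical pixels wide.
```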
I think the whole point here was that you can communicate physical pixels and how many (possibly fractional) logical pixels are in the physical pixels. That way the compositor always has non-fractional pixel coordinates. How to fit/align the logical and physical pixels becomes a client problem, and the client should be better equipped to handle it in the first place.
The problem is compositors lying to applications because there is no way in protocol to communicate actual pixel density
I think the whole point here was that you can communicate physical pixels and how many (possibly fractional) logical pixels are in the physical pixels.
The problem is when rendered bitmaps need to be scaled, as is currently necessary.
Yes, yes and yes. That's what I was trying to say; sorry if I wasn't able to convey it. I just wanted to differentiate the general idea of "fractional scaling" from the particular upscaling-downscaling implementation, because there are different kinds of critiques and they sometimes get conflated.

For anything designed for a fixed grid of pixels at a certain resolution, say the mythical 96 DPI, it's true that there is no general "pixel perfect" way to transform the output so that it fits another grid of arbitrary size and resolution. Therefore some people argue for screens with a natural integer scale as the solution and see fractional scaling as sort of a lie sold by unscrupulous vendors to naive users; that is, they mainly blame the current batch of screens that require scaling factors of 1.5, 2.5, etc. (See for example this article and the System 76 links posted there.) I don't agree, because I don't think we should ditch an entire generation of hardware that is rather capable after all, but I see the point.

I'm not that worried about "pixel perfect" though, but about crisp fonts, because I mainly work with text, and font renderers are designed to work reasonably well at arbitrary scales (and so are SVG renderers and entire engines like the ones in browsers, some toolkits, etc.). So then we have this other, more roundabout client-int-upscale/server-float-downscale technique to get a fractionally scaled version of a buffer, currently kind of forced by the Wayland standard, which bluntly filters the output of the font renderer and others. The resulting distortions are noticeable at typical resolutions of ~150 DPI; for example, the same character in the same paragraph might get very different renderings at different positions because of bilinear interpolation, which is apparent in frequent sequences like "ll" or "rr". And the more the font renderer tries to fit the original grid (by doing hinting, subpixel rendering, antialiasing, etc.), the worse the distortion.
This is another kind of critique, with which I do agree. To this critique it should be added that the increased computational requirements are not minor. For a screen of HxW pixels at a fractional scale q, you need to produce (H x 2/q) x (W x 2/q) pixels, i.e. a factor of (2/q)^2 extra pixels. Considering the scaling factors of 150% or 125% that are natural for current FHD laptops, that means about twice or thrice the original load (~1.8 and ~2.6), without taking into account the downscaling step. The "works for Macs" argument works for Macs because they are designed for scales of 200% (which was the default for many years) or maybe 175% (which is the default now, probably in order to increase "real estate") and have > 200 DPI. It's not that OK for an average non-Apple laptop, because the intermediate raster is way larger than the physical raster (because of the lower q) and the output resolution is low enough to reveal the artifacts of downscaling.
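The overhead numbers above can be checked directly (a quick sketch of the arithmetic, nothing more):

```python
def upscale_overhead(q):
    # The client renders at 2x the logical size, which is 2/q times the
    # physical size at fractional scale q: (2/q)**2 times the pixel count.
    return (2 / q) ** 2

print(round(upscale_overhead(1.5), 1))   # 1.8
print(round(upscale_overhead(1.25), 1))  # 2.6
print(round(upscale_overhead(2.0), 1))   # 1.0 -- integer 2x has no overhead
```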
The above is just an elaboration as requested on my point 4 from "things we need": "Solutions to the many headaches I imagine we will run into (e.g., aligning scaled and unscaled subsurfaces side-by-side).".
I think the whole point here was that you can communicate physical pixels and how many (possibly fractional) logical pixels are in the physical pixels.
The point is to allow a surface:
To be pixel-aligned, which the compositor must guarantee
To know the final physical size in pixels (which includes a few issues of its own - the client normally dictates its own size using a scaling factor, but we might want to avoid using a factor here)
To be able to provide a buffer that will be copied 1:1 to the display
It does not affect non-scaled surfaces, and scaling is not a client-wide property. Scaled and unscaled surfaces will need to behave in a well-defined manner, side-by-side in the same application. This leads to the above issue. And the above was just one example of a positioning problem violating alignment with otherwise valid constructs.
Long story short, it's much more complicated than just making the scaling factor a fraction.
Maybe I just don't get what you're trying to say, but in my mind this whole thing is pretty simple.
There are two important facts:
there will be cases in which the grid of a surface will not align with the grid of the compositor
we must be able to align (sub)surfaces
One solution to this is to map all surfaces to a rational space instead of the natural space/a grid in the compositor (practically just floats), and in the rendering step map that to the grid: for every pixel you sample only from the surface with the highest contribution. Whenever a surface matches (a multiple of) the grid, you use the nearest filter and it just snaps to the grid. Everything else gets scaled anyway with a filter where you sample from multiple pixels.
In other words: the trick is to map things to the real pixel grid in the very last step.
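A sketch of that sampling rule (purely illustrative; no compositor is claimed to implement exactly this):

```python
def source_coord(surface_scale, out_x):
    """Map an output pixel center back into surface coordinates."""
    src = (out_x + 0.5) / surface_scale - 0.5
    if float(surface_scale).is_integer():
        # Grid matches (a multiple of) the output grid: nearest filter,
        # which snaps to the grid and yields a pixel-exact copy.
        return round(src)
    return src  # mismatched grid: fractional coordinate, needs filtering

print(source_coord(1, 5))    # 5   -- 1:1 copy
print(source_coord(2, 3))    # 1   -- exact 2x, still snaps cleanly
print(source_coord(1.5, 1))  # 0.5 -- blended from two surface pixels
```

The grid mapping happens only in this last step; everything before it stays in float coordinates.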
There are cases where logical alignment conflicts with physical alignment requirements. The case I presented does not have a valid result - any final pixel value will violate some requirement from the application, regardless of how it was obtained.
I still don't understand. As soon as you allow fractional scaling things will not always align and you won't achieve a "pixel perfect" copy. Is that already an invalid result for you? What requirements are you talking about?
The work we've been discussing for fractional scale is only with respect to obtaining pixel-perfect copy as supported on other platforms, which is non-trivial.
If you do not care for pixel perfect copy, just use or improve the various compositors' existing fractional scaling support. This requires no additional protocol support - it's just a scaling operation done on the final output buffer.
What I described gives you pixel-perfect copies for all cases where it is possible. When the surface grid is not a multiple of the output grid, pixel-perfect copies are not possible. Not on Wayland, not anywhere else.
Of course not, but our 1:1 content needs to coexist with scaled content and needs to be well defined, following all guarantees made by relevant specifications around final presentation when doing so.
The above outlines how a 1:1 surface would result in destructive interaction with other surfaces in a way that breaks our guarantees during currently well-defined client requests.
It is not a question of implementation, it is a question of defining behavior in a way that does not break existing protocols, existing usage or developer/user expectation. We cannot just do whatever in the name of pixel-imperfect rendering.
I feel like we have spent far too much effort at this point discussing what is just an example problem for the last, post-implementation task.
The above outlines how a 1:1 surface would result in destructive interaction with other surfaces in a way that breaks our guarantees during currently well-defined client requests.
You're assuming compositor behavior that is not defined in any protocol and I've shown compositor behavior which would not result in the problem and doesn't break existing protocols.
I feel like we have spent far too much effort at this point discussing what is just an example problem for the last, post-implementation task.
If communicating a fractional scaling factor is not enough for the compositor and clients to do the "right thing" then it is important to know what exactly the problem is so we can design the protocol in a way which allows them to do the "right thing". You argue that we need more things than the fractional scaling factor but I have yet to see a convincing argument to why that is.
You're assuming compositor behavior that is not defined in any protocol
Subsurface positioning is defined, otherwise it would be useless as application developers would have no idea where and how subsurfaces would render.
and I've shown compositor behavior which would not result in the problem and doesn't break existing protocols.
I'm sorry, but your presented solution does not solve the presented problems. For the example problem, it will arrive at one of the two possible results I presented (shifting the middle surface either left or right to "snap to grid"), with the choice depending on its behavior in the edge case where a physical pixel has equal contribution from two logical pixels.
Whether the mapping to physical pixels occurs early, with manual logic for alignment, or late, with alignment based on pixel-value sampling, will not affect the final rendering. It's just an implementation detail. I find the former simpler to fit into existing compositors, but other compositors are free to pick your presented model.
If communicating a fractional scaling factor is not enough for the compositor and clients to do the "right thing" then it is important to know what exactly the problem is so we can design the protocol in a way which allows them to do the "right thing".
The prototype implementation to showcase the concept is before any protocol details are finalized. The last task, where this is included, is a matter of analyzing, enumerating, and designing a solution for these quirks that will be discovered when real-world applications are made subject to this new behavior.
What I listed is only an example of something we will have to observe and design a solution for, and does not affect prototype work. How a prototype deals with this problem is inconsequential, and it therefore does not need to be dealt with at this stage.
You argue that we need more things than the fractional scaling factor but I have yet to see a convincing argument to why that is.
This thread is only discussing part four, which is the catch-all for finding quirks and problems we need to deal with. Not only that, it's only discussing a single example from this point.
For the example problem, it will arrive at one of the two possible results I presented (shifting the middle surface either left or right to "snap to grid"), with the choice depending on its behavior in the edge case where a physical pixel has equal contribution from two logical pixels.
I'm running out of ways to say it: there is no problem. What you describe only ever happens when you have non-integer scaling. There is literally no right way to scale things. Every single pixel will be an approximation, so why do you care that the pixel at the border is also an approximation? It also doesn't affect (sub)surface alignments where you can do pixel-perfect copies.
This thread is only discussing part four, which is the catch-all for finding quirks and problems we need to deal with. Not only that, it's only discussing a single example from this point.
Sure, but if you follow a simple implementation idea then your example is not a problem at all.
Every single pixel will be an approximation, so why do you care that the pixel at the border is also an approximation?
For a very good reason: It's currently always an accurate representation of the input, even when the grids are mismatched due to e.g. 1.5x scaling applied to the output. This will no longer be the case with fractional scaling aware client rendering.
Why alignment is a problem
Up until now, there is no situation where window geometry or positioning is inconsistent with the request made by the application. This is regardless of integer scaling, native hidpi rendering, fractional scaling, subsurface interactions, or any other such construct.
Fractional scaling will of course lead to a blurry image from the grid mismatch, but it is at all times an accurate (insert disclaimer about yet-to-be-completed color-space work) rendition, consistent with the input data (even if the process is lossy), and no relationships between surfaces are broken. All logical pixels contribute to all physical pixels, in a way proportional to their overlap.
This falls apart with fractional scaling awareness due to the previously mentioned problem with alignment shifting and surfaces at different scales, which includes the "grid snap" that is implicitly done by the suggested renderer implementation. The rendition is not accurate and breaks relationships between surfaces. Aligning over a sibling surface will lead to logical pixels that do not contribute to any physical pixel, and aligning away from it will lead to a physical pixel that does not get the correct number of logical contributions.
This can lead to annoying visual artifacts. If you align away from a solid color surface in the 1.5x scaling example, you will end up with a 1px wide line that has half the intensity as it lacks logical pixel contributions. Aligning towards would be a problem for gradients, where dropping the last pixel might lead to a jarring transition.
... Therefore, it is a problem. How big or common the problem is is secondary.
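The stripe artifact can be put in numbers (assuming simple linear blending; the helper function is just an illustration):

```python
def covered_pixel(coverage, fg=1.0, bg=0.0):
    # Value of a physical pixel only partially covered by a solid surface,
    # linearly blended against the background.
    return coverage * fg + (1 - coverage) * bg

# Aligning away from an opaque neighbour in the 1.5x example opens a
# physical pixel column with only half a logical pixel of contribution:
print(covered_pixel(0.5))  # 0.5 -- a 1px stripe at half intensity
```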
The problem is not a matter of finding a magical implementation solution that somehow does the impossible, but about specification and guarantees, some of which we are currently conflicting with. While there are ways to deal with it, none are very nice.
Sure, but if you follow a simple implementation idea then your example is not a problem at all.
The suggestion for a renderer implementation, while a fine suggestion for dealing with mixed scaled and unscaled content, does nothing in the way of solving nor avoiding this issue.
It is merely one of many ways of obtaining the discussed output, with the final rendition from the renderer implementation already being covered by my initial description of the issue, which will contain the described misrepresentation of content.
And of course, pretending the issue does not exist does not make it go away either.
Sorry, I was genuinely struggling to find a way to express myself.
You keep repeating that scaling is an "accurate representation" or "accurate rendition". It's not. Nobody computes the contribution of overlapping pixels accurately; instead, they sample specific points depending on the filter being used. (Technically you could really calculate the actual overlap, but nobody does that because it's expensive.) The whole premise that you can scale and have an accurate representation is simply wrong.
This falls apart with fractional scaling awareness due to the previously mentioned problem with alignment shifting and surfaces at different scales, which includes the "grid snap" that is implicitly done by the suggested renderer implementation. The rendition is not accurate and breaks relationships between surfaces.
What relationship between surfaces? I've always assumed you mean aligned surfaces here, is that correct?
As long as the rule for snapping is consistent there won't be gaps: all surfaces matching the grid will get their pixel-perfect copy (and thus their actual size), and the size will be off by at most 0.5 pixels on surfaces which already don't have an accurate representation.
Are you really arguing that the 0.5 pixels inaccuracy of an already inaccurate representation is somehow a problem?
This can lead to annoying visual artifacts. If you align away from a solid color surface in the 1.5x scaling example, you will end up with a 1px wide line that has half the intensity as it lacks logical pixel contributions.
Sampling from outside of a texture can happen when scaling, even if the border happens to be aligned. You basically extend the border pixels infinitely and sample from that. So as long as you don't have a pattern with a frequency of 1/2px, this won't be noticeable. Similarly, there are patterns which get completely destroyed by specific scaling methods. All of those things are inherent signal-theory limitations.
The whole premise that you can scale and have an accurate representation is simply wrong.
I think this is stemming from miscommunication between us. Specifically, I am trying to convey something different when I say "accurate representation" - position and size. I am not talking about a lossless, reversible operation, moiré, or other fun signal theory things. There's a big difference between filters that cause interference patterns, and just flat out shifting and resizing individual surfaces differently from everything else.
Instead of ending up discussing definitions, let's instead just define what I am referring to: If you place two 1px logical pixels beside each other, they should be equally wide and adjacent. At 1.5x, they should be 3px wide and be blended in the middle. From a geometry perspective, that's as accurate as it's going to get. A very inaccurate representation would be 1px + 2px or vice-versa, which is a pretty large deviation from the original input, with positions and sizes having changed.
Next section has an example of why I think 0.5px could be a problem, but note that 1.25x is a very common scaling factor these days as well, in which case the error is 0.75 - for all intents and purposes a whole pixel. I only picked 1.5x in the example for simplicity.
What relationship between surfaces? I've always assumed you mean aligned surfaces here, is that correct?
Yes. My concern primarily lies around surfaces meant to be entirely adjacent where some surfaces then end up shifted for alignment, and the visual effects this will cause.
It would be rather annoying if two adjacent, opaque black surfaces, with one aligned away from the other to allow native rendering, ended up growing a grey stripe down the middle from a scenario that is otherwise well-defined. The prototype would make it easier to discover these kinds of annoyances - the thing we are discussing now was just a random one I thought up as an example of things we could find!
And yes, I know that your suggested renderer implementation would handle this particular scenario in an acceptable manner - but the implementation is up to the compositors. The goal is to have a specification that makes sense to application developers and compositor writers with no surprises, and without making too much of a mess of things.
Are you really arguing that the 0.5 pixels inaccuracy .. is somehow a problem?
It's a problem, yes, but I never made it out to be a world-ending show-stopper.
I only even mentioned it in passing, in a parenthesis, as an example of something we might bump into. I didn't even intend to put more thought into it yet. My goal was merely to enumerate these problems so we could consider them and pick the best course of action from a specification point of view, and not have annoying surprises later on for client or compositor authors.
There's a big difference between filters that cause interference patterns, and just flat out shifting and resizing individual surfaces differently from everything else.
Is there? I guess this is the fundamental disagreement here.
Next section has an example of why I think 0.5px could be a problem, but note that 1.25x is a very common scaling factor these days as well, in which case the error is 0.75 - for all intents and purposes a whole pixel. I only picked 1.5x in the example for simplicity.
The shifting and resizing happens for never more than 0.5 pixels. The error would be 0.25 in this case.
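For the 3px-wide subsurface example at 1.25x, the snap distances come out as follows (a sketch; Python's round() stands in for whatever consistent snapping rule a compositor would use):

```python
SCALE = 1.25
for logical_x in (0, 3, 6):
    physical = logical_x * SCALE
    error = abs(physical - round(physical))
    print(f"logical {logical_x} -> physical {physical}, snap error {error}")
# logical 0 -> physical 0.0, snap error 0.0
# logical 3 -> physical 3.75, snap error 0.25
# logical 6 -> physical 7.5, snap error 0.5
```

Edges land on multiples of 0.25 physical pixels, so no edge ever moves by more than 0.5; the middle surface in the example moves by only 0.25.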
The shifting and resizing happens for never more than 0.5 pixels. The error would be 0.25 in this case.
Hmm, yeah on second thought I think you're right. I forgot what the setup was in my head for needing to cause 0.75px error, but maybe I had just gotten myself confused.
Scale is defined per surface, and a logical "window" can be composed of multiple surfaces, as is commonly the case (e.g. CSD features being subsurfaces positioned adjacent to a main surface).
Ok, thanks! I think I understand the problem. So for the compositor it's not all just mainly independent "black boxes" representing clients, but subsurfaces inside surfaces whose relative positions have to be preserved somehow after scaling. So the problem of rendering at a fractional scale is not entirely encapsulated on the client side but leaks through to the compositor, even if top-level windows were all integer-sized. Am I at least approximately right?
Was any progress made here? AFAICT it seems some options for compositors here are:
Do nothing, and just add a protocol to tell clients to use whatever scaling/rounding algorithms the compositor is using for surface-local coordinates
Same as above but allow the client to specify the rounding it wants
Enforce a certain rounding mode in the spec, always send the final real screen coordinates to subsurfaces, tell clients to deal with the resulting inaccuracy
When using unscaled surfaces, require subsurfaces (and input events, regions, damage rects...) to always use screen coordinates instead of logical coordinates
Add another protocol for subsurfaces (and input events, regions, damage rects...) to use float or rational coordinates, then snap it to pixels when doing the final composite
Don't support subsurfaces within unscaled surfaces at all
Before scaling a surface, check its edge positions and see if any edges are going to be subpixel-located.
If there are, add transparent pixel padding to those surface edge(s) until the new coordinate is pixel-aligned with the physical device.
Do the scaling.
The scaled surface shall now have an integer position coordinate in device scale and also an integer width and height in device scale.
Do the composition normally with the others. If two surfaces touch each other before scaling and their shared edge lands at a subpixel location on the physical device, the result will be two semi-transparent edges that overlap and blend nicely with one another.
For mouse interaction with any surface at a non-native scale, compute the mouse coordinate in that client's scale. A mouse cursor that visually looks like it is positioned at that overlapped semi-transparent edge will only trigger hover detection on one of the two surfaces, not both.
In such a scheme, there will be no rounding and no problems brought by rounding. There will be no non-integer coordinates either, only multiple mouse coordinates, one for each surface scale to interact with.
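A sketch of the padding computation in the scheme above (the helper name is illustrative, not from any compositor):

```python
import math

def transparent_padding(x, w, scale):
    """Transparent padding (in device px) needed so the scaled surface
    starts and ends on integer device-pixel coordinates."""
    left_dev = x * scale
    right_dev = (x + w) * scale
    left_pad = left_dev - math.floor(left_dev)    # pad added on the left
    right_pad = math.ceil(right_dev) - right_dev  # pad added on the right
    return left_pad, right_pad

# Middle surface of the earlier 1.5x example: device edges [4.5, 9.0].
# Half a device pixel of transparent padding on the left aligns it to 4.
print(transparent_padding(3, 3, 1.5))  # (0.5, 0.0)
```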
I figured out that a 50%-opacity pixel overlapped with another 50%-opacity pixel doesn't yield a 100% opaque pixel but only a 75% one... So here is a refinement of the above proposal.
Delay the scaling. Group surfaces that are adjacent in z-order and share the same scale. Compose them at their native scale first, then scale the result.
It is still not perfect, but I think it is already miles ahead of what other OSes currently do in terms of accuracy. When legacy programs are few and far between, it will be good enough. When all programs are legacy, it is also good enough.
I don't want to spam the issue thread, but I just want to say this issue needs more attention and needs to be resolved soon.
It just makes Wayland unusable for lots of people. It fuels the debate about whether Wayland is the future, and makes users stick to X11.
Is anyone actively working on a solution?