Commit a19c35a8 authored by Pekka Paalanen

A Pixel's Color



Add a story on what one would need to know to process and display a
pixel.

This is an introduction for people who are already familiar with computer
graphics in general, images in memory, and maybe window systems, but who
have never really thought about what the values in a pixel actually mean
or what they might be doing wrong with them.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
parent 1ce9c034
@@ -14,8 +14,8 @@ specifications. All content must follow the [license](LICENSE).
- [Wayland Color Management and HDR Design Goals](doc/design_goals.rst)
describes the expectations and use cases that Wayland should meet.
- [Color Pipeline Overview](doc/winsys_color_pipeline.rst) compares the
  X11 and Wayland color pipelines, and explains how a Wayland
compositor relates to display calibration.
- [Well-Known EOTFs, chromaticities and whitepoints](doc/well_known.rst)
@@ -27,6 +27,9 @@ specifications. All content must follow the [license](LICENSE).
- [Todo](doc/todo.rst) contains random notes.
- [A Pixel's Color](doc/pixels_color.md) is an introduction to
understanding pixel values and color.
## History
Originally this documentation was started to better explain how
% SPDX-FileCopyrightText: 2021 Collabora, Ltd.
% SPDX-License-Identifier: MIT
% This is an Octave script: https://www.gnu.org/software/octave/
s = 0.04045;          % threshold between the linear and power-law segments
e = [0 : 0.01 : 1];   % electrical (non-linear) sample values
o = zeros(size(e));   % optical (linear) values to be computed
mask = e <= s;
o(mask) = e(mask) ./ 12.92;                             % linear segment
o(~mask) = realpow((e(~mask) + 0.055) ./ 1.055, 2.4);   % power-law segment
f = figure();
plot(e, o);
xticks([0:0.1:1])
yticks([0:0.1:1])
grid on
title('sRGB EOTF')
xlabel('electrical / non-linear value')
ylabel('optical / linear value')
axis square tight
ppi = get(0, 'ScreenPixelsPerInch');
set(f, 'PaperPosition', [0 0 400 400] ./ ppi)
print(f, 'sRGB_EOTF.png', '-dpng')
---
SPDX-FileCopyrightText: 2021 Collabora, Ltd.
SPDX-License-Identifier: MIT
---
Contents
[[_TOC_]]
[Front page](../README.md)
# A Pixel's Color
Let us assume that there is a pixel that is rendered with the intention
to be displayed. What does one need to know to be able to tell what
color the pixel has, or what perception that pixel should ideally evoke
in a human?
Scratching the surface of that question is a journey through a
pipeline: from interpreting bits as numbers, decoding numbers into
quantities, mapping those quantities into how the light sensitive cells
in the human eye are excited, and finally to understanding that it is
all an illusion, a perception created by the brain that depends on much
more than just the pixel.
The pixel is likely a part of an image in computer memory. You need to
know quite a lot of information, metadata, about the image to be able
to use it: width, height, stride/pitch, pixel format, tiling layout,
and maybe more. All that information allows you to access the data of
an individual pixel. Assuming you know how to do that, the following
sections dive into decoding the meaning of the pixel's data.
## Pixel Format and Quantization Range
Pixel format is the [key to interpreting][pixel-format-guide] pixel
data (arbitrary bits in memory) as color channel values. A pixel format
tells you what color channels there are, which is also a strong hint of
what the color model is. Color model is discussed in the next section.
A pixel format also tells the precision of the color channel values and
whether they are integer or floating-point.
The most commonly found channels are listed in the following table.
| mnemonic | meaning |
|:-:|:---|
| R | red |
| G | green |
| B | blue |
| X | ignored |
| A | alpha |
| Y | luma |
| U | (first) chroma |
| V | (second) chroma |
R, G and B imply the RGB color model, and Y, U and V imply the YCbCr
color model or some variation of it.
*Quantization range* is another property of color channel values. The
concept applies mostly to integer RGB and YUV values. For example, an
8-bit unsigned integer data type can hold values between 0 and 255,
inclusive. If the nominal range of color channel values is the whole
integer range, it is called *full range*.
There is also *limited range* where only a sub-set of the integer range
is used for the nominal range of color channel values. The exact
sub-set depends on the data type and the color channel in question, and
there may be different standards. Channel values outside of the nominal
range are still allowed and sometimes useful. Some specifications also
reserve some values for other in-band signaling than actual image
content. This is closer to analog video transmission than storing
images in computer memory.
From the quantization range, unsigned integer values are usually mapped
linearly to the range [0.0, 1.0], signed integers to the range [-1.0,
1.0] such that 0.0 can be expressed exactly, and floating point is
usually taken as is, allowing arbitrary values.
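
As a small illustration, here is a minimal Octave sketch of decoding 8-bit
code values into nominal [0.0, 1.0] values. The limited-range constants
(16–235 for luma, 16–240 for chroma) are an assumption here, following a
common convention; the exact sub-set depends on the standard in use.

```
% Decode full-range 8-bit code values: the whole 0..255 range is nominal.
full_range_8bit = @(code) double(code) ./ 255;

% Decode limited-range 8-bit luma, assuming the common convention where
% the nominal range is 16..235 (chroma would use 16..240 and is usually
% centered on 128; not shown here).
limited_luma_8bit = @(code) (double(code) - 16) ./ (235 - 16);

% Values outside the nominal range simply map outside [0.0, 1.0];
% they are not clamped here.
full_range_8bit(128)     % ~0.502
limited_luma_8bit(128)   % ~0.511
limited_luma_8bit(240)   % ~1.023, a valid "whiter than white" value
```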
[pixel-format-guide]: https://afrantzis.com/pixel-format-guide/
## Color Model
[Color models][wikipedia-color-model] are mathematical constructs where
a tuple of numbers is used to represent humanly observable colors.
Examples of color models are RGB, YCbCr, HSV, HSL, RYB, CMY, and CMYK.
Color models are not limited to three channels, but in computer
graphics and display three channels are standard.
The choice of color model depends on the use case. RGB is an additive
color model which suits driving displays at the light emitter level.
YCbCr separates brightness (luma) information from the color (chroma)
information, which allows sub-sampling the chroma information without
noticeable loss of image quality, resulting in storage space and
transmission bandwidth savings. CMYK is used in print, matching the
inks used. RYB works with paints and dyes as it is a subtractive color
model. HSV and HSL may help artists pick their colors more easily.
Interpolating colors, including drawing gradients, is highly dependent
on the choice of color model. The color model affects the intermediate
colors in a gradient when you use a mathematical interpolation formula
between the two tuples representing the end point colors. For example,
if you use simple linear interpolation, the resulting gradient looks
completely different depending on whether you do the interpolation in
the RGB or the HSV color model. However, the color model is not the
only thing that affects how a gradient will look.
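
As a small illustration (a sketch in Octave, assuming the core functions
`rgb2hsv()` and `hsv2rgb()` are available), here is the mid-point of a
gradient from red to green, computed with linear interpolation in both
color models:

```
% End point colors as RGB triplets.
a = [1 0 0];   % red
b = [0 1 0];   % green

% Mid-point by linear interpolation in the RGB color model.
mid_rgb = 0.5 .* a + 0.5 .* b   % [0.5 0.5 0], a dim olive/dark yellow

% Mid-point by linear interpolation in the HSV color model.
mid_hsv = hsv2rgb(0.5 .* rgb2hsv(a) + 0.5 .* rgb2hsv(b))   % [1 1 0], full yellow
```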
[wikipedia-color-model]: https://en.wikipedia.org/wiki/Color_model
## Encoding
Pixel values are almost always stored and transmitted with a non-linear
encoding applied. This saves memory and bandwidth, but the non-linear
values are not suitable for operations that are meant to have a
physical meaning, like blending. If you want to filter, blend or
interpolate pixels or colors, for most purposes that needs to happen
with linear color values.
Originally the non-linear encoding in RGB was due to the non-linear
luminance response of cathode ray tubes (CRT) versus their input
voltages. That is an inherent feature of CRT monitors using analog
video signals. When flat panel monitors and digital signals appeared,
they were made to mimic the CRT behavior, so they too have the
non-linear response (artificially) built in.
The non-linear response of CRTs was an accidental blessing. That
response has a shape roughly similar to human visual sensitivity. When
digital signals use the same non-linearity as CRTs, the number of bits
needed to encode each pixel is considerably lower than what would be
needed if the digital signal had a linear relationship to physical
light intensity. In other words, the non-linearity is a signal
compression method that reduces the needed bandwidth while keeping the
visual image quality the same.
The human visual system is more sensitive to light intensity changes in
dark shades than in bright ones. If you used a linear integer encoding,
you would either use too few code points for dark shades, leading to
loss of detail, or too many code points for bright shades, wasting
bandwidth and memory.
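
A tiny numeric sketch of the problem: with an 8-bit linear encoding, the
relative step between adjacent code values is huge near black and tiny
near white.

```
% Relative luminance step between adjacent 8-bit code values,
% assuming the code values encode light intensity linearly.
step_near_black = (2 - 1) / 1        % a 100% jump from code 1 to code 2
step_near_white = (255 - 254) / 254  % a ~0.4% jump from code 254 to code 255
```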
### Transfer Functions
Perhaps due to the history of analog video signals, the non-linear
values are called *electrical values*, and the values that are linear
with physical light intensity are called *optical values*. A function
that converts electrical values into optical values is called an
electro-optical [transfer function][khronos-transfer-function] (EOTF),
and a function in the opposite direction is an opto-electronic transfer
function (OETF).
An EOTF is usually associated with a display, because it describes the
conversion from electrical values into light intensity. Likewise, an
OETF is usually associated with a camera, because it describes the
conversion of light intensity into electrical values. Furthermore, in
camera-transmission-display systems there is something called an
opto-optical transfer function (OOTF). The OOTF is what you get when
you combine the OETF and the EOTF, and usually it is not the identity
mapping, in order to make the picture look better to humans. Therefore,
do not assume that the inverse of an EOTF is the OETF. There are
also other OOTFs than just the end-to-end camera-transmission-display
system OOTF, so be careful there as well. An OOTF could be just a
color enhancement or luminance mapping operation.
Both OETF and the inverse of EOTF can be used for compressing linear
(optical) color channel values into non-linear (electrical) values.
Using the inverse of the compression function you can recover the
linear color channel values. When you are decoding a pixel, you need to
apply the right (inverse) function to get the linear color values.
One well-known example of an encoding function is the sRGB EOTF. It
operates on each of the R, G and B channels independently and is
defined as
```math
R = \begin{cases}
\frac{R'}{12.92} &\text{if } R' \leq 0.04045\\
\left(\frac{R' + 0.055}{1.055}\right)^{2.4} &\text{if } R' > 0.04045\\
\end{cases}
```
and similarly for G and B. $`R' \in [0.0, 1.0]`$ is the electrical
value and $`R \in [0.0, 1.0]`$ is the optical value. This is close to,
but not quite, a pure power law because of the linear segment in the
function.
![A plot of the sRGB EOTF](images/sRGB_EOTF.png "sRGB EOTF")
[khronos-transfer-function]: https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#TRANSFER_CONVERSION
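
As an example of why the distinction matters, here is an Octave sketch of
blending two sRGB-encoded gray values 50/50, once naively on the electrical
values and once in linear light. The EOTF is the piece-wise function given
above; its inverse is derived here by inverting that definition.

```
% sRGB EOTF: electrical -> optical, per channel, as defined above.
srgb_eotf = @(e) (e <= 0.04045) .* (e ./ 12.92) + ...
                 (e >  0.04045) .* realpow((e + 0.055) ./ 1.055, 2.4);

% Inverse of the sRGB EOTF: optical -> electrical.
srgb_inv_eotf = @(o) (o <= 0.04045 / 12.92) .* (o .* 12.92) + ...
                     (o >  0.04045 / 12.92) .* ...
                     (1.055 .* realpow(o, 1 / 2.4) - 0.055);

a = 0.2;   % electrical (sRGB-encoded) values of two pixels
b = 0.9;

naive  = (a + b) / 2                                       % 0.55
linear = srgb_inv_eotf((srgb_eotf(a) + srgb_eotf(b)) / 2)  % ~0.67
```

The two results differ noticeably; the naive blend of electrical values
produces a darker mix than a blend of the physical light intensities would.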
### Gamma
Gamma in the context of displays refers to the power function
```math
y = x^\gamma \quad\text{and its inverse}\quad x = y^{1/\gamma}
```
where $`x`$ and $`y`$ are the input and output, and $`\gamma > 0`$ is
the parameter. The input and output values are relative,
$`x, y \in [0, 1]`$.
This power law can approximate the CRT and human visual system
non-linearities pretty well in the standard dynamic range.
Talking about gamma as a mapping can be confusing. The values $`x`$ and
$`y`$ do not have a clear meaning without knowing the full context. If
$`\gamma > 1`$ then it is possible that $`x`$ is an electrical value
and $`y`$ is an optical value, meaning that we have an EOTF. If
$`\gamma < 1`$ then we possibly have the inverse of an EOTF. A third
possibility is that we have neither EOTF nor its inverse but a gamma
correction function which usually would be mapping electrical values to
other electrical values, or in other words, a conversion from one
compression parameter value to another.
Another problem with the term gamma is that modern EOTFs are not pure
power functions. Even the sRGB EOTF is not a pure power function but a
piece-wise function with a linear part and a power-law part. The sRGB
EOTF is sometimes approximated with a pure power function, but to add
to the confusion, the exponent in the true sRGB EOTF is different from
the $`\gamma`$ in the approximation.
Therefore it would be good to avoid using the term gamma.
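
To make the difference concrete, here is a small Octave sketch comparing
the true sRGB EOTF with a pure power law using $`\gamma = 2.2`$, a commonly
quoted approximation (the exact exponent used in such approximations is an
assumption here and varies in practice):

```
e = 0.5;                                    % an electrical value above the
                                            % linear-segment threshold
srgb   = realpow((e + 0.055) / 1.055, 2.4)  % ~0.214, true sRGB EOTF
approx = realpow(e, 2.2)                    % ~0.218, pure power law, gamma 2.2
```

The values are close, yet the exponents 2.4 and 2.2 are clearly not the
same thing.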
## Color Space
The goal of a pixel is to provoke a specific perception of color, and
the context here is window systems and light emitting displays.
An important part of that is predicting how the human eye responds to
the light emitted according to the pixel's RGB values. Colorimetry
studies the human eye response to light. The response can be
approximated with CIE 1931 XYZ color values which are defined through a
so called *Standard Colorimetric Observer*. This observer was formed
from the average of a few people with normal color vision through a
series of tests and mathematical modeling. In other words, each XYZ
value triplet should look the same to any person with normal color
vision (as long as the surroundings and lighting in the room are kept
the same). If an RGB triplet could be converted into XYZ, one would
know more about what color it is.
The R, G and B values in a linearly encoded RGB triplet are abstract on
their own. They give ratios of red, green and blue light components.
The problem is, which red? Which green, and which blue? Different
displays can have different phosphors, LEDs or color filters. The same
RGB values could mean different XYZ values. An important part of what
an RGB triplet means is the *color space* which connects the RGB values
to the XYZ values. In other words, a color space tells us what kind of
response an RGB triplet is intended to trigger in the human eye.
Choosing the right words is difficult, and the term color space is
particularly ambiguous in casual talk. Here, the crucial part of a
color space is its connection to the trichromatic response in the human
eye (analogous to XYZ) with luminance factored away. Sometimes talking
about a color space also includes the color model and encoding. The
sRGB specification defines both the EOTF (encoding) and the human eye
response properties (color space), and it obviously uses the RGB color
model. YUV pixel data could still use the sRGB color space, for
instance, which means that once you convert YUV to RGB, the sRGB
specification explains how to decode the values and what color they
represent. Sometimes the term color space is used purely for the color
model, or for the encoding, without explicitly defining the
trichromatic response.
The human eye response properties of an RGB color space are defined by
its color *primaries* and *white point*. These are usually described
with CIE 1931 xy chromaticity coordinates which can be easily derived
from the XYZ coordinates. In more intuitive words, the chromaticity
describes the color without its brightness, and it does that in the
context of the human eye.
### Primaries
Primaries are the fundamental colors that "prime" a color space. Colors
in a color space are expressed as a linear combination of its
primaries. R, G and B values in an RGB color space are the weights or
intensities of the red, green and blue primary colors, and the mixtures
of the primaries produce all the colors the color space can represent.
Displays ideally work the same way. Primaries are the "pure" colors
emitted individually by the red, green and blue component light sources
in a display. Driving these component light sources with different
weights (color channel values) produces all the displayable colors. As
negative light does not physically exist, the primaries span and limit
the displayable color volume. This color volume with the luminance
dimension flattened is the *color gamut* of the display. You might want
to watch Captain Disillusion's video on
[Color][captain-disillusio-color] (7 mins), which touches on this topic
while talking about human color vision.
Since RGB values are used for driving the component light sources, the
eye response for a certain RGB triplet depends on what those component
light sources are. The same applies to image content prepared for
display: the RGB values in a stored image have been determined with
respect to certain primaries. If the primaries used for the image are
different from those used by a display, then something has to happen to
the RGB values to make the image look as intended on the display.
Wide Color Gamut (WCG) displays have a considerably larger color gamut
than traditional displays, which roughly cover the sRGB color space.
This means that, ignoring luminance, WCG displays can show more colors
and the colors can be more saturated. The primaries of such displays
are further apart in the CIE 1931 xy chromaticity plane, covering a
bigger area of human vision. While there may be little difference in
the colors between two traditional displays, a WCG display makes a
noticeable difference.
[captain-disillusio-color]: https://www.youtube.com/watch?v=FTKP0Y9MVus
### White point
The primaries define what color R, G and B produce individually. The
white point defines the relative intensities between the primaries,
telling us what color it is when each of R, G and B has the same linear
color value. The reason we need to know the white point is that linear
color values do not have units; they are just arbitrary quantities that
are probably different for each primary. If one is going to produce
colors by mixing the primaries, one has to know their strengths
relative to each other.
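
As a sketch of how the primaries and white point together pin down the
meaning of linear RGB values, here is one common way to compute an
RGB-to-XYZ matrix from CIE 1931 xy chromaticities, using the sRGB (BT.709)
primaries and the D65 white point as example input:

```
% CIE 1931 xy chromaticities of the sRGB (BT.709) primaries and D65 white.
xy_r = [0.640 0.330];
xy_g = [0.300 0.600];
xy_b = [0.150 0.060];
xy_w = [0.3127 0.3290];

% Convert a chromaticity to XYZ with unit luminance (Y = 1).
xy2xyz = @(xy) [xy(1) / xy(2); 1; (1 - xy(1) - xy(2)) / xy(2)];

% Columns are the XYZ coordinates of the primaries at unit luminance.
M = [xy2xyz(xy_r), xy2xyz(xy_g), xy2xyz(xy_b)];

% Scale each primary so that R = G = B = 1 lands exactly on the white point.
S = M \ xy2xyz(xy_w);
rgb_to_xyz = M * diag(S)

% rgb_to_xyz is now approximately
%   0.4124  0.3576  0.1805
%   0.2126  0.7152  0.0722
%   0.0193  0.1192  0.9505
% and maps linear (optical) sRGB values to CIE 1931 XYZ.
```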
White point can casually be referred to as white balance, although
white balance is usually related to cameras rather than displays or
color spaces. White point is expressed as CIE 1931 xy chromaticity
coordinates, just like primaries. Depending on the definition of the
color space, the white point may or may not have a defined maximum
absolute luminance. Luminance is discussed more in the section Dynamic
Range below.
Display white is the color produced when R, G and B are all at their
maximum value on that specific display. Display white has the display
white point chromaticity by definition, and it also has some luminance
which may or may not be known. The display white point is a physical
(or firmware) property of the display, similar to the chromaticities of
its primaries. The white point chromaticity of a display is usually
controlled through the display's color temperature setting.
Again, the white point of image content may differ from the white
point of a display, which then requires some compensation.
In more general terms, the definition of white comes from observing a
perfect diffuse reflector under a specified illumination. Essentially
the color of white is the color of the illumination. This is not the
same as display white. Depending on various things like the environment
where the display is in and what image is being shown, what would be
perceived as white may or may not match the display white.
## Dynamic Range
Everything said above has been very vague about the dynamic range or
the available brightness range. The dynamic range has been assumed to
be both relative and unknown. Relative means that we only deal with
normalized luminance or intensity values in the range $`[0.0, 1.0]`$ or
from 0% to 100%. Unknown means that we do not know (or care) about what
absolute luminance in cd/m² that 100% value means. Usually talking
about relative luminance implies it is also unknown but reasonable for
viewing. This is all good enough for standard dynamic range (SDR).
This is not good enough for high dynamic range (HDR). While the
absolute luminance of maximum white on SDR monitors is in practice
around 100 to 250 cd/m², HDR monitors go much higher, up to 600 or
1000 cd/m² or even more. You do not want to show "graphics white" at a
full 1000 cd/m² blast. The displayable dynamic range must be known. As
a side note, HDR displays tend to be WCG as well.
High dynamic range is not only about going for brighter and brighter
highlights with detail; it is also about going darker and with more
precision. The absolute luminance of the black level in HDR content can
be significantly darker than in SDR content. This is particularly
useful in dark room viewing environments where an SDR signal would just
lose detail in the dark shades.
HDR monitors usually advertise their absolute luminance limits via EDID
or DisplayID. If a monitor was driven with a traditional relative video
signal, this would give the 0% to 100% range. Let us call this the
*passive HDR mode*. The monitor input signal range maps exactly to the
displayable monitor dynamic range and color gamut in a static way that
can be measured and modeled. This requires the signal source to adapt
the content to the monitor capabilities, but the result is predictable.
However, consumer HDR monitors are usually driven with some
standardized signal system. That makes it easy for the signal sources
as they do not need to adapt to the monitor capabilities, but then the
monitor itself will be doing the image adaptation. Let us call this the
*adaptive HDR mode*. These proprietary adaptation algorithms can be
based on HDR metadata and image content, and they are often dynamic.
That makes these monitors practically impossible to measure and model.
The result on screen is unpredictable but probably good enough for
entertainment purposes.
There are two prevalent HDR video signal systems aside from the closed
[Dolby Vision][dolby-vision] system. The two can be found in
[Recommendation ITU-R BT.2100][bt.2100].
[The PQ system][pq-system] defines the Perceptual Quantizer (PQ) EOTF
that maps pixel color values to absolute luminance in cd/m². The 100%
level of a PQ signal is 10,000 cd/m², which is practically beyond any
monitor's capabilities. When you use the PQ EOTF to decode color values
into linear values, you get literal cd/m² values. These absolute
luminance values are good to display as-is only on a so-called
mastering display where the color control of the production has been
performed. For any other display, like any display one might have at
hand at home or in the office, some kind of *tone mapping* must be done
to adapt the content to the display capabilities. This is why a
PQ-system HDR signal usually comes with information about the mastering
display used.
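
Here is an Octave sketch of the PQ EOTF as defined in BT.2100, mapping a
PQ-encoded value $`E' \in [0, 1]`$ to absolute luminance in cd/m²:

```
% PQ EOTF constants from Rec. ITU-R BT.2100.
m1 = 2610 / 16384;
m2 = 2523 / 4096 * 128;
c1 = 3424 / 4096;
c2 = 2413 / 4096 * 32;
c3 = 2392 / 4096 * 32;

% Map a PQ-encoded electrical value ep in [0, 1] to luminance in cd/m^2.
pq_eotf = @(ep) 10000 .* realpow( ...
    max(realpow(ep, 1 ./ m2) - c1, 0) ./ (c2 - c3 .* realpow(ep, 1 ./ m2)), ...
    1 ./ m1);

pq_eotf(0.0)   %     0 cd/m^2
pq_eotf(0.5)   %   ~92 cd/m^2
pq_eotf(1.0)   % 10000 cd/m^2, the 100% level of the PQ signal
```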
[The HLG system][hlg-system] defines the Hybrid Log-Gamma (HLG) OETF
that is used to encode producer optical color values into electrical
values. It also defines the parameterized HLG Opto-Optical Transfer
Function (OOTF) that must be used in a display to tone map the decoded
(with the inverse OETF) video signal for the monitor at hand. The OOTF
takes care of a suitable mapping of the content to monitors with
different dynamic ranges.
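
Similarly, here is an Octave sketch of the HLG OETF as defined in BT.2100,
mapping a normalized scene-linear value $`E \in [0, 1]`$ to a non-linear
signal value; the HLG OOTF and the display-side processing are not shown.

```
% HLG OETF constants from Rec. ITU-R BT.2100.
a = 0.17883277;
b = 1 - 4 * a;
c = 0.5 - a * log(4 * a);

% Map a normalized linear scene-light value E in [0, 1] to a non-linear
% signal value in [0, 1]. The max(..., eps) guard only keeps the unused
% logarithmic branch real-valued below the cross-over point.
hlg_oetf = @(E) (E <= 1/12) .* sqrt(3 .* E) + ...
                (E >  1/12) .* (a .* log(max(12 .* E - b, eps)) + c);

hlg_oetf(0)     % 0
hlg_oetf(1/12)  % 0.5, where the square-root and logarithmic parts meet
hlg_oetf(1)     % 1
```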
[dolby-vision]: https://en.wikipedia.org/wiki/Dolby_Vision
[bt.2100]: https://www.itu.int/rec/R-REC-BT.2100
[pq-system]: https://en.wikipedia.org/wiki/Perceptual_quantizer
[hlg-system]: https://en.wikipedia.org/wiki/Hybrid_log%E2%80%93gamma
## Viewing Environment and Psycho-physical Effects
Adapting content to the monitor at hand is not enough. The viewing
environment also affects the perception of color on screen. That can
happen directly by having surrounding light reflect from the screen, or
indirectly by changing how the human visual system has accustomed
(adapted) to seeing things. The human eye can physically adapt to the
overall light level (going from a brightly lit room into a dark room,
you need a moment to see better again), but this happens quite slowly,
over many minutes.
A more interesting effect is psychological: the surrounding lighting
implies how physical things that reflect light are assumed to appear.
In nature there are very few things that emit light, so we are used to
seeing things that reflect light instead. Our brain "normalizes" what
the eyes see with respect to what it assumes is the illuminant (the
light source, e.g. the Sun or a cloudy sky).
Other psychological effects exist as well. *Color appearance modeling*
studies the effect of environment and everything else that changes the
human perception of colors without actually changing the colors
themselves.
A major direct physical effect is screen flare: ambient light
reflected from the screen surface. While flare is usually fairly
uniform, colorless (white), and dim, it can have a big impact on the
perception of shades. Flare light adds to the light emitted by the
monitor. While the absolute intensity difference between shades on the
screen remains the same, the relative difference, a.k.a. contrast,
between the shades diminishes as the flare gains intensity. Image
details can disappear completely when contrast falls below the
observable threshold. This is first noticed in dark shades.
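
A short worked example of the effect, as a sketch with made-up numbers:

```
% Two nearby dark shades emitted by the monitor, in cd/m^2 (made-up values).
dark   = 0.1;
darker = 0.05;

% Contrast (ratio) between the shades with no flare.
dark / darker                      % 2.0

% The same shades with 1 cd/m^2 of ambient light reflecting off the screen.
flare = 1.0;
(dark + flare) / (darker + flare)  % ~1.05; the detail is nearly gone
```

The absolute difference between the shades is unchanged, but the contrast
collapses once the flare dominates.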
Colors themselves are relative as well, not just brightness or
contrast. A (display) color that appears white in one environment can
look bluish or reddish in another environment. This is greatly affected
by the illuminant mentioned above, as the general lighting and
surroundings provide a color reference, unless you watch your screen in
darkness.
Fortunately, viewing environments tend to be arranged to be somewhat
standardish, at least when color is expected to be significant. An
office environment tends to be well-lit indoors, and a living room
environment tends to be dim indoors, with the (home) theater
environment as the culmination: very dark surroundings with no stray
light. Professional color work is carried out in studios with strictly
controlled viewing environments to achieve predictable color
perception. On the other hand, outdoors is the opposite of a
standardish or controlled viewing environment, but there you are
usually happy to see just anything on a screen.
In the end, it is up to the end user to arrange their surroundings and
adjust their monitor to produce the visual experience they are looking
for. As display software stack developers, our responsibility ends at
making sure the colorimetry works out right given whatever parameters
we can get.
## Conclusion
From this article it is easy to come to the conclusion that if we just
faithfully reproduce the colorimetry (how the human eye reacts to
light, or the CIE 1931 XYZ values) of images, then everything is fine.
Unfortunately that is not true, even with the viewing environment
discounted.
This "absolute colorimetric" reproduction of color is not always, or
even often, the best goal. A very practical problem is that if your
image content and display color spaces with dynamic range included are
not exactly the same, you will always have some colors in at least one
of the color spaces that have no correspondence in the other. They can
be colors the display cannot physically display, or the content does
not make use of the full capabilities of the display, or even both. For
the best visual experience the display system may need to do something
else. What to do depends on the goals of the end user. However, if you
do not understand the colorimetry then you are unlikely to succeed in
displaying anything beyond "sRGB image on an sRGB monitor" nicely.
We started with the assumption that the pixel was rendered with the
intention to be displayed. This hints that not all pixels are intended
to be displayed. Indeed, if you took a raw image from a camera
(think of professional cameras with RAW file format), it would look
quite bad on a display even if you took care of everything mentioned in
this article. Such images need to be carefully processed before they
are ready for display. Consumer cameras do that processing on the
camera using manufacturer magic algorithms, so it is possible you have
never had such an image.
All the above may seem a lot to digest, and we did not even go into
details. The point of this article is to give you an idea of the
concepts related to a pixel's color, hopefully letting you follow other
discussions around color more easily, like why one cannot "just blend"
two pixels together or how "RGB" alone does not mean much.