Commit 13e5856b authored by Pekka Paalanen

A Pixel's Color

Add a story on what one would need to know to process and display a

This is an introduction for people who are already familiar with computer
graphics in general, images in memory, and maybe window systems, but
have never really thought about what the values in a pixel actually mean
or what they are doing wrong with them.
Signed-off-by: Pekka Paalanen <>
parent 1ce9c034
@@ -27,6 +27,9 @@ specifications. All content must follow the [license](LICENSE).
- [Todo](doc/todo.rst) contains random notes.
- [A Pixel's Color](doc/ is an introduction to
understanding pixel values.
## History
Originally this documentation was started to better explain how
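% SPDX-FileCopyrightText: 2021 Collabora, Ltd.
% SPDX-License-Identifier: MIT

% Plot the sRGB EOTF (electrical/non-linear value to optical/linear
% value) and save the plot as sRGB_EOTF.png.

% Breakpoint between the linear segment and the power-law segment.
s = 0.04045;
% Electrical values to evaluate, sampled over [0, 1].
e = [0 : 0.01 : 1];
o = zeros(size(e));
% Linear segment near black.
mask = e <= s;
o(mask) = e(mask) ./ 12.92;
% Power-law segment for the rest of the range.
o(~mask) = realpow((e(~mask) + 0.055) ./ 1.055, 2.4);

f = figure();
plot(e, o);
grid on
title('sRGB EOTF')
xlabel('electrical / non-linear value')
ylabel('optical / linear value')
axis square tight
% Paper size: 400 screen pixels per side, converted to inches.
ppi = get(0, 'ScreenPixelsPerInch');
set(f, 'PaperPosition', [0 0 400 400] ./ ppi)
print(f, 'sRGB_EOTF.png', '-dpng')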
SPDX-FileCopyrightText: 2021 Collabora, Ltd.
SPDX-License-Identifier: MIT
[Front page](../
# A Pixel's Color
Let us assume that there is a pixel that is rendered with the intention
to be displayed. What does one need to know to be able to tell what
color the pixel has, or what human perception that pixel should ideally
provoke?

Scratching the surface of that question is a journey through a
pipeline: from interpreting bits as numbers, decoding numbers into
quantities, mapping those quantities into how the light-sensitive cells
in the human eye are excited, and finally to understanding that it is
all an illusion, a perception created by the brain that depends on much
more than just the pixel.
The pixel is likely a part of an image in computer memory. You need to
know quite a lot of information, metadata, about the image to be able
to use it: width, height, stride/pitch, pixel format, tiling layout,
and maybe more. All that information allows you to access the data of
an individual pixel. Assuming you know how to do that, the following
sections dive into decoding the meaning of the pixel's data.
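To make that concrete, here is a minimal Python sketch that reads one pixel, assuming a linear (untiled) buffer in the DRM `XRGB8888` pixel format: one little-endian 32-bit word per pixel with red in bits 23:16, green in 15:8, blue in 7:0 and the top byte ignored. The function name `read_xrgb8888` is made up for this example; tiled layouts and other pixel formats need different addressing.

```python
import struct


def read_xrgb8888(buf: bytes, stride: int, x: int, y: int) -> tuple[int, int, int]:
    """Return the (R, G, B) channel values of pixel (x, y).

    Assumes a linear (untiled) layout and the DRM XRGB8888 format: one
    little-endian 32-bit word per pixel with X (ignored) in bits 31:24,
    R in bits 23:16, G in bits 15:8 and B in bits 7:0.
    """
    bytes_per_pixel = 4
    # Rows can be padded, so step between rows by the stride in bytes,
    # not by width * bytes_per_pixel.
    offset = y * stride + x * bytes_per_pixel
    (word,) = struct.unpack_from("<I", buf, offset)
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


# A 2x2 image whose rows are padded to a stride of 12 bytes.
height, stride = 2, 12
image = bytearray(stride * height)
struct.pack_into("<I", image, 1 * stride + 1 * 4, 0xFF112233)
print(read_xrgb8888(bytes(image), stride, 1, 1))  # (17, 34, 51)
```

At this point the result is only three integers per pixel; the rest of this article is about what those integers mean.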
## Pixel Format
Pixel format is the [key to interpreting][pixel-format-guide] pixel
data (arbitrary bits in memory) as color channel values. A pixel format
tells you what color channels there are, which is also a strong hint of
what the color model is. Color model is discussed in the next section.
A pixel format also tells the precision of the color channel values.
The most commonly found channels are listed in the following table.

| mnemonic | meaning |
|----------|---------|
| R | red |
| G | green |
| B | blue |
| X | ignored |
| A | alpha |
| Y | luma |
| U | (first) chroma |
| V | (second) chroma |

R, G and B imply the RGB color model, and Y, U and V imply the YCbCr
color model or some variation of it.
Pixel format tells us if the color values are stored as signed or
unsigned integers or in some floating point format. Unsigned integers
are usually mapped linearly to the range [0.0, 1.0], signed integers
roughly to the range [-1.0, 1.0] such that 0.0 can be expressed exactly,
and floating point is usually taken as is, allowing arbitrary values.
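The integer mappings can be sketched in Python as follows. This assumes the normalized-integer convention that graphics APIs such as Vulkan and OpenGL use for their UNORM and SNORM formats, where the most negative signed value is clamped to -1.0; the function names are hypothetical, and other systems may use slightly different rules.

```python
def unorm_to_float(value: int, bits: int) -> float:
    """Map an unsigned integer linearly onto [0.0, 1.0]."""
    return value / ((1 << bits) - 1)


def snorm_to_float(value: int, bits: int) -> float:
    """Map a signed (two's complement) integer onto [-1.0, 1.0].

    The largest positive value maps to 1.0 and 0 maps exactly to 0.0.
    The most negative value would land slightly below -1.0, so it is
    clamped to -1.0.
    """
    return max(value / ((1 << (bits - 1)) - 1), -1.0)


print(unorm_to_float(255, 8))                         # 1.0
print(snorm_to_float(-128, 8), snorm_to_float(0, 8))  # -1.0 0.0
```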
## Color Model
[Color models][wikipedia-color-model] are mathematical constructs where
a tuple of numbers is used to represent humanly observable colors.
Examples of color models are RGB, YCbCr, HSV, HSL, RYB, CMY, and CMYK.
Color models are not limited to three channels, but in computer
graphics and display three channels are standard.
The choice of color model depends on the use case. RGB is an additive
color model which suits driving displays at the light emitter level.
YCbCr separates brightness (luma) information from the color (chroma)
information, which allows sub-sampling the chroma information without
noticeable loss of image quality, resulting in storage space and
transmission bandwidth savings. CMYK is used in print, matching the
inks used. RYB works with paints and dyes as it is a subtractive color
model. HSV and HSL may help artists pick their colors more easily.
Interpolating colors, including drawing gradients, is highly dependent
on the choice of the color model. The color model affects the
intermediate colors in a gradient when you use a mathematical
interpolation formula between the two tuples representing the end point
colors. For example, if you use a simple linear interpolation, the
resulting gradient looks completely different depending on whether you
do the interpolation in the RGB or the HSV color model. However, the
color model is not the only thing that affects how a gradient looks.
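A small Python example of that difference, using the standard library `colorsys` module for the RGB to HSV conversion. The channel values are treated as plain numbers here, ignoring their encoding and color space, and the hue is interpolated naively without taking the shorter way around the hue circle; this is an illustration, not a recommended way to build gradients.

```python
import colorsys


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# Midpoint computed directly on the RGB tuples.
mid_rgb = lerp(red, blue, 0.5)

# Midpoint computed on the HSV tuples, then converted back to RGB.
# The hue is interpolated naively, without wrapping around the circle.
mid_hsv = colorsys.hsv_to_rgb(*lerp(colorsys.rgb_to_hsv(*red),
                                    colorsys.rgb_to_hsv(*blue), 0.5))

print(mid_rgb)  # (0.5, 0.0, 0.5): purple
print(mid_hsv)  # (0.0, 1.0, 0.0): green
```

The same two end points give a purple midpoint in one model and a green one in the other.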
## Encoding
Pixel values are almost always stored and transmitted with a non-linear
encoding applied. This saves memory and bandwidth, but the non-linear
values are not suitable for operations that are meant to have a
physical meaning, like blending. If you want to filter, blend or
interpolate pixels or colors, for the usual purposes that needs to be
done on linear color values.
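For example, a 50% mix of black and white should emit half the light of white. The Python sketch below assumes the channel values are encoded with the sRGB transfer function described later in this document, and compares averaging the encoded values directly with decoding to linear values, averaging, and encoding again. The function names are illustrative.

```python
def srgb_eotf(e: float) -> float:
    """sRGB electrical (non-linear) value -> optical (linear) value."""
    return e / 12.92 if e <= 0.04045 else ((e + 0.055) / 1.055) ** 2.4


def srgb_inverse_eotf(o: float) -> float:
    """Optical (linear) value -> sRGB electrical (non-linear) value."""
    return o * 12.92 if o <= 0.04045 / 12.92 else 1.055 * o ** (1 / 2.4) - 0.055


black, white = 0.0, 1.0  # sRGB-encoded values of one channel

# Averaging the encoded values directly: not physically meaningful.
naive = (black + white) / 2

# Decode to linear light, average, then encode the result again.
linear = srgb_inverse_eotf((srgb_eotf(black) + srgb_eotf(white)) / 2)

print(f"naive: {naive:.3f}, linear light: {linear:.3f}")  # naive: 0.500, linear light: 0.735
```

Blending in linear light gives an encoded value of about 0.735, noticeably brighter than the naive 0.5.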
Originally the non-linear encoding in RGB was due to the non-linear
luminance response of cathode ray tubes (CRT) versus their input
voltages. That is an inherent feature of CRT monitors using analog
video signals. When flat panel monitors and digital signals appeared,
they were made to mimic the CRT behavior, so they too have the
non-linear response (artificially) built in.
The non-linear response of CRTs was an accidental blessing. That
response has a shape roughly similar to human visual sensitivity. When
digital signals use the same non-linearity as CRTs, the number of bits
needed to encode each pixel is considerably lower than what would be
needed if the digital signal had a linear relationship to physical
light intensity. In other words, the non-linearity is a signal
compression method that reduces the needed bandwidth while keeping the
visual image quality the same.
The human visual system is more sensitive to light intensity changes in
dark shades than in bright ones. If you used a linear integer encoding,
you would either use too few code points for dark shades, leading to
loss of detail, or use too many code points for bright shades, wasting
bandwidth and memory.
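The effect can be counted. This Python sketch assumes 8-bit channels, uses the sRGB EOTF (defined later in this document) as the example non-linear encoding, and picks 1% of the maximum intensity as an arbitrary threshold for "dark". It counts how many of the 256 code points land below that threshold with a linear encoding versus the sRGB encoding.

```python
def srgb_eotf(e: float) -> float:
    """sRGB electrical (non-linear) value -> optical (linear) value."""
    return e / 12.92 if e <= 0.04045 else ((e + 0.055) / 1.055) ** 2.4


LEVELS = 255        # 8 bits per channel: codes 0..255
DARK_LIMIT = 0.01   # "dark": below 1 % of the maximum linear intensity

linear_codes = sum(1 for c in range(LEVELS + 1) if c / LEVELS < DARK_LIMIT)
srgb_codes = sum(1 for c in range(LEVELS + 1)
                 if srgb_eotf(c / LEVELS) < DARK_LIMIT)

print(linear_codes, srgb_codes)  # 3 26
```

Under these assumptions a linear encoding spends only 3 code points on the darkest 1%, while the sRGB encoding spends 26 there.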
### Transfer functions
Perhaps due to the history of analog video signals, the non-linear
values are called *electrical values*, and the values that are linear
with physical light intensity are called *optical values*. A function
that describes the relationship or conversion between electrical and
optical values is called an electro-optical
[transfer function][khronos-transfer-function] (EOTF) or
opto-electronic transfer function (OETF).
An EOTF is usually associated with a display, because it describes the
conversion from electrical values into light intensity. Likewise, an
OETF is usually associated with a camera, because it describes the
conversion of light intensity into electrical values. Furthermore, in
camera-transmission-display systems there is something called an
opto-optical transfer function (OOTF). The OOTF is what you get when
you combine the OETF and the EOTF, and usually it is not the identity
mapping, in order to make the picture look better to humans. Therefore
do not assume that the inverse of the EOTF is the OETF.
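As an illustration of an OOTF that is not the identity, the Python sketch below chains the Rec. ITU-R BT.709 camera OETF with a simplified Rec. ITU-R BT.1886 display EOTF (a pure 2.4 power law, which is what BT.1886 reduces to for a zero black level and normalized white). Pairing these two is an assumption chosen for the example; scene and display values are both relative, in [0, 1].

```python
def bt709_oetf(light: float) -> float:
    """Rec. ITU-R BT.709 OETF: relative scene light [0, 1] -> signal [0, 1]."""
    if light < 0.018:
        return 4.5 * light
    return 1.099 * light ** 0.45 - 0.099


def bt1886_eotf(signal: float) -> float:
    """Rec. ITU-R BT.1886 EOTF simplified for a zero black level:
    signal [0, 1] -> relative display light [0, 1]."""
    return signal ** 2.4


def ootf(light: float) -> float:
    """End-to-end scene-to-display mapping: the OETF followed by the EOTF."""
    return bt1886_eotf(bt709_oetf(light))


for scene in (0.01, 0.18, 0.5, 1.0):
    print(f"scene {scene:.2f} -> display {ootf(scene):.3f}")
```

Mid-tones come out darker on the display than they were in the scene (0.18 maps to roughly 0.12), which raises the perceived contrast.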
Both OETF and the inverse of EOTF can be used for compressing linear
(optical) color channel values into non-linear (electrical) values.
Using the inverse of the compression function you can recover the
linear color channel values. When you are decoding a pixel, you need to
apply the right (inverse) function to get the linear color values.
One example of the well-known encoding functions is the sRGB EOTF. It
operates on each of the R, G and B channels independently and is
defined as
```math
R = \begin{cases}
\frac{R'}{12.92} & \text{if } R' \leq 0.04045\\
\left(\frac{R' + 0.055}{1.055}\right)^{2.4} & \text{if } R' > 0.04045
\end{cases}
```

and similarly for G and B. $`R' \in [0.0, 1.0]`$ is the electrical
value and $`R \in [0.0, 1.0]`$ is the optical value. This is close to,
but not quite, a pure power law because of the linear segment near
black.
![A plot of the sRGB EOTF](images/sRGB_EOTF.png "sRGB EOTF")
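The sRGB EOTF and its inverse translate directly into code. In this minimal Python sketch the inverse uses the breakpoint 0.04045 / 12.92 (about 0.0031308) so that both functions switch segments at the same point, and an 8-bit decoding table is built assuming unsigned 8-bit channels mapped to [0.0, 1.0] as described under Pixel Format. All names are illustrative.

```python
def srgb_eotf(e: float) -> float:
    """sRGB EOTF: electrical value R' in [0, 1] -> optical value R in [0, 1]."""
    if e <= 0.04045:
        return e / 12.92
    return ((e + 0.055) / 1.055) ** 2.4


def srgb_inverse_eotf(o: float) -> float:
    """Inverse sRGB EOTF: optical value R in [0, 1] -> electrical value R'."""
    if o <= 0.04045 / 12.92:  # same breakpoint, on the optical side
        return o * 12.92
    return 1.055 * o ** (1 / 2.4) - 0.055


# Decoding table for unsigned 8-bit channels (code c means R' = c / 255).
DECODE_8BIT = [srgb_eotf(c / 255) for c in range(256)]


def encode_8bit(o: float) -> int:
    """Encode a linear value as the nearest 8-bit sRGB code, clamping to [0, 1]."""
    o = min(max(o, 0.0), 1.0)
    return round(srgb_inverse_eotf(o) * 255)


# Decoding and re-encoding every 8-bit code gives the same code back.
assert all(encode_8bit(DECODE_8BIT[c]) == c for c in range(256))
print(DECODE_8BIT[128])  # about 0.216
```

The middle code 128 decodes to only about 22% of the maximum light, which is what the curve in the plot shows.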
### Gamma
Gamma in the context of displays refers to the power function
```math
y = x^\gamma \quad\text{and its inverse}\quad x = y^{1/\gamma}
```
where $`x`$ and $`y`$ are the input and output, and $`\gamma > 0`$ is
the parameter. The input and output values are relative,
$`x, y \in [0, 1]`$.
This power law can approximate the CRT and human visual system
non-linearities pretty well in the standard dynamic range.
Talking about gamma as a mapping can be confusing. The values $`x`$ and
$`y`$ do not have a clear meaning without knowing the full context. If
$`\gamma > 1`$ then it is possible that $`x`$ is an electrical value
and $`y`$ is an optical value, meaning that we have an EOTF. If
$`\gamma < 1`$ then we possibly have the inverse of an EOTF. A third
possibility is that we have neither EOTF nor its inverse but a gamma
correction function which usually would be mapping electrical values to
other electrical values, or in other words, a conversion from one
compression parameter value to another.
Another problem with the term gamma is that modern EOTFs are not pure
power functions. Even the sRGB EOTF is not a pure power function but a
piece-wise function with a linear part and a power-law part. sRGB EOTF
is sometimes approximated with a pure power function, but to add to the
confusion, the exponent in the true sRGB EOTF (2.4) is different from
the $`\gamma`$ in the approximation (usually 2.2).
Therefore it would be good to avoid using the term gamma.
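To see how close the approximation is, this Python sketch compares the sRGB EOTF with a pure power function using $`\gamma = 2.2`$, the exponent commonly quoted for the approximation. The sampling step and the probe value 0.02 are arbitrary choices for the illustration.

```python
def srgb_eotf(e: float) -> float:
    """Piece-wise sRGB EOTF: a linear segment, then a power law with exponent 2.4."""
    return e / 12.92 if e <= 0.04045 else ((e + 0.055) / 1.055) ** 2.4


def gamma_22(e: float) -> float:
    """Pure power function often used to approximate the sRGB EOTF."""
    return e ** 2.2


samples = [i / 1000 for i in range(1001)]
worst = max(samples, key=lambda e: abs(srgb_eotf(e) - gamma_22(e)))
print(f"largest absolute difference at R' = {worst:.3f}: "
      f"{srgb_eotf(worst):.4f} vs {gamma_22(worst):.4f}")

# Near black the ratio between the two curves is far from 1.
print(f"ratio at R' = 0.02: {srgb_eotf(0.02) / gamma_22(0.02):.1f}")
```

The absolute differences stay below one percent of full scale, but near black the sRGB curve produces roughly eight times more light than the 2.2 power function, because the linear segment replaces the power law there.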
## Color Space
The goal of a pixel is to provoke a specific perception of color, and
the context here is window systems and light emitting displays.
An important part of that is predicting how the human eye responds to
the light emitted according to the pixel's RGB values. Colorimetry
studies the human eye response to light. The response can be
approximated with CIE 1931 XYZ color values which are defined through a
so-called *Standard Colorimetric Observer*. This observer was formed
from the average of a few people with normal color vision through a
series of tests and mathematical modeling. In other words, each XYZ
value triplet should look the same to any person with normal color
vision (as long as the surroundings and lighting in the room are kept
the same). If an RGB triplet could be converted into XYZ, one would
know more about what color it is.
The R, G and B values in a linearly encoded RGB triplet are abstract on
their own. They give ratios of red, green and blue light components.
The problem is, which red? Which green, and which blue? Different
displays can have different phosphors, LEDs or color filters. The same
RGB values could mean different XYZ values. An important part of what
an RGB triplet means is the *color space* which connects the RGB values
to the XYZ values. In other words, a color space tells us what kind of
response an RGB triplet is intended to trigger in the human eye.
Choosing the right words is difficult, and the term color space is
particularly ambiguous in casual talk. Here, the crucial part of a
color space is its connection to the trichromatic response in the human
eye (analogous to XYZ) with luminance factored away. Sometimes the term
color space also covers the color model and encoding. The
sRGB specification defines both the EOTF (encoding) and the human eye
response properties (color space), and it obviously uses the RGB color
model. YUV pixel data could still use the sRGB color space, for
instance, which means that once you convert YUV to RGB, the sRGB
specification explains how to decode it and what that color is. Sometimes the term
color space is used purely for the color model, or for encoding,
without explicitly defining the trichromatic response.
The human eye response properties of an RGB color space are defined by
its color *primaries* and *white point*. These are usually described
with CIE 1931 xy chromaticity coordinates which can be easily derived
from the XYZ coordinates. In more intuitive words, the chromaticity
describes the color without its brightness, and it does that in the
context of the human eye.
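Deriving chromaticity from XYZ is a simple normalization, $`x = X / (X + Y + Z)`$ and $`y = Y / (X + Y + Z)`$. The Python sketch below uses the common D65 XYZ values normalized to $`Y = 1`$ as test input (an assumption for the example); scaling all three tristimulus values by the same factor changes the brightness but not the chromaticity.

```python
def xyz_to_xy(X: float, Y: float, Z: float) -> tuple[float, float]:
    """CIE 1931 xy chromaticity from XYZ tristimulus values."""
    total = X + Y + Z
    return X / total, Y / total


# XYZ of the D65 white point, normalized to Y = 1.
d65 = (0.95047, 1.0, 1.08883)
print(xyz_to_xy(*d65))                   # about (0.3127, 0.3290)
# Five times brighter, same chromaticity.
print(xyz_to_xy(*(5 * v for v in d65)))
```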
### Primaries
Display primaries are the "pure" colors emitted individually by the
red, green and blue component light sources in a display. Driving these
component light sources with different weights (color channel values)
produces all the displayable colors. As negative light does not
physically exist, the primaries span and limit the displayable color
volume. This color volume with the luminance dimension flattened is the
*color gamut* of the display. You might want to watch Captain
Disillusion's video on [Color][captain-disillusio-color] (7 mins) which
touches this topic while talking about human color vision.
Since RGB values are used for driving the component light sources, the
eye response for a certain RGB triplet depends on what those component
light sources are. Looking at each component light source individually
in total isolation and darkness, the observed chromaticity should
remain constant over the component's luminance range. Displays are
manufactured to achieve this, and so a single chromaticity coordinate
pair can be used to describe each of the three primaries in a display.
Conversely this applies also to image content prepared for display. The
RGB values in a stored image have been determined respective to certain
primaries. If the primaries used for the image are different from those
used by a display, then something has to happen to the RGB values to
make the image look as intended on the display.
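One common way to connect RGB to XYZ is a 3×3 matrix built from the primaries and the white point: each primary's chromaticity becomes a column of XYZ values, and the columns are scaled so that $`R=G=B=1`$ lands on the white point. The Python sketch below uses only the standard library and converts linear sRGB values to linear BT.2020 values through XYZ. It assumes linear (already decoded) RGB values and the same D65 white point for both color spaces, so no white point adaptation is needed; out-of-gamut results are not handled.

```python
Matrix = list[list[float]]


def xy_to_xyz(x: float, y: float) -> list[float]:
    """XYZ of chromaticity (x, y), scaled to luminance Y = 1."""
    return [x / y, 1.0, (1.0 - x - y) / y]


def mat_vec(m: Matrix, v: list[float]) -> list[float]:
    """3x3 matrix times a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_inv(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate and the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def rgb_to_xyz_matrix(red, green, blue, white) -> Matrix:
    """Linear RGB -> XYZ matrix from (x, y) chromaticities of the primaries
    and the white point. R = G = B = 1 maps to the white point with Y = 1."""
    cols = [xy_to_xyz(*prim) for prim in (red, green, blue)]
    prim_matrix = [[cols[j][i] for j in range(3)] for i in range(3)]
    # Per-primary scale factors that make the columns sum to the white point.
    scale = mat_vec(mat_inv(prim_matrix), xy_to_xyz(*white))
    return [[prim_matrix[i][j] * scale[j] for j in range(3)] for i in range(3)]


D65 = (0.3127, 0.3290)
SRGB = rgb_to_xyz_matrix((0.64, 0.33), (0.30, 0.60), (0.15, 0.06), D65)
BT2020 = rgb_to_xyz_matrix((0.708, 0.292), (0.170, 0.797), (0.131, 0.046), D65)

# Linear sRGB -> XYZ -> linear BT.2020 RGB, both with the D65 white point.
SRGB_TO_BT2020 = mat_mul(mat_inv(BT2020), SRGB)

print(mat_vec(SRGB_TO_BT2020, [1.0, 0.0, 0.0]))  # about [0.627, 0.069, 0.016]
```

The fully saturated sRGB red becomes a mix of all three BT.2020 primaries, because the BT.2020 red primary lies further out in the chromaticity plane.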
Wide Color Gamut (WCG) displays have considerably larger color gamut
than traditional displays that have roughly the sRGB color space. This
means that ignoring luminance, WCG displays can show more colors and
they can be more saturated. The primaries of such displays are further
apart in the CIE 1931 xy chromaticity plane, covering a bigger area of
human vision. While there may be little difference in the colors
between two traditional displays, a WCG display makes a noticeable
difference.
### White point
White point describes the white balance of a display and is usually
controlled through a monitor's color temperature setting. Expressed as
chromaticity coordinates x and y, white point is a more general concept
than the one-dimensional color temperature.
The chromaticity coordinates of a display white are called the
(display) white point. Display white is the color with the R, G and B
values at their maximum. Furthermore, displays are manufactured and/or
calibrated such that any neutral color, $`R=G=B`$, has the same white
point chromaticity. The display white point is a physical (or firmware)
property of the display, similar to the chromaticities of its
primaries.
Again, the white point for image content may differ from the white
point of a display, which then requires color adjustment to compensate.
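The simplest form of such an adjustment scales X, Y and Z independently so that the source white maps onto the destination white, sometimes called XYZ scaling. It is crude; methods such as the Bradford transform do the scaling in a cone-response-like space and generally give better results. The Python sketch below assumes D65 and D50 XYZ values normalized to $`Y = 1`$ and is only meant to show the idea.

```python
def xyz_scaling_adapt(xyz, src_white, dst_white):
    """Crude white point adaptation ("XYZ scaling"): scale X, Y and Z
    independently so that src_white maps exactly onto dst_white."""
    return tuple(v * d / s for v, s, d in zip(xyz, src_white, dst_white))


# XYZ of the D65 and D50 white points, normalized to Y = 1.
D65 = (0.95047, 1.0, 1.08883)
D50 = (0.96422, 1.0, 0.82521)

print(xyz_scaling_adapt(D65, D65, D50))              # about D50
print(xyz_scaling_adapt((0.2, 0.3, 0.4), D65, D50))  # another color, adapted
```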
In more general terms, the definition of white comes from observing a
perfect diffuse reflector under a specified illumination. Essentially
the color of white is the color of the illumination. This is not the
same as display white. Depending on various things like the environment
where the display is in and what image is being shown, what would be
perceived as white may or may not match the display white.
## Dynamic Range
Everything said above has been very vague about the dynamic range or
the available brightness range. The dynamic range has been assumed to
be both relative and unknown. Relative means that we only deal with
normalized luminance or intensity values in the range $`[0.0, 1.0]`$ or
from 0% to 100%. Unknown means that we do not know (or care) what
absolute luminance in cd/m² the 100% value corresponds to. Usually talking
about relative luminance implies it is also unknown but reasonable for
viewing. This is all good enough for standard dynamic range (SDR).
This is not good enough for high dynamic range (HDR). While the
absolute luminance of maximum white on SDR monitors can in practice be
around 100 to 250 cd/m², HDR monitors go much higher, to 600 or 1000
cd/m² or even more. You do not want to show "graphics white" at a full
1000 cd/m² blast. The displayable dynamic range must be known. As a
side note, HDR displays tend to be WCG as well.
High dynamic range is not only about going for brighter and brighter
highlights with detail, it is also about going darker and with more
precision. The absolute luminance of the black level in HDR content can
be significantly darker than in SDR content. This is particularly
useful in dark room viewing environments where SDR signal would just
lose detail in the dark shades.
HDR monitors usually advertise their absolute luminance limits via EDID
or DisplayID. If a monitor was driven with a traditional relative video
signal, this would give the 0% to 100% range. Let us call this the
*passive HDR mode*. The monitor input signal range maps exactly to the
displayable monitor dynamic range and color gamut in a static way that
can be measured and modelled. This requires the signal source to adapt
the content to the monitor capabilities, but the result is predictable.
However, consumer HDR monitors are usually driven with some
standardised signal system. That makes it easy for the signal sources
as they do not need to adapt to the monitor capabilities, but then the
monitor itself will be doing the image adaptation. Let us call this the
*adaptive HDR mode*. These proprietary adaptation algorithms can be
based on HDR metadata and image content, and they are often dynamic.
That makes these monitors practically impossible to measure and model.
The result on screen is unpredictable but probably good enough for
entertainment purposes.
There are two prevalent HDR video signal systems aside from the closed
Dolby system.
The PQ system defines the Perceptual Quantizer (PQ) EOTF that maps
pixel color values to absolute luminance in cd/m². The 100% level of a
PQ signal is 10,000 cd/m² which is practically beyond any monitor's
capabilities. When you use PQ EOTF to decode color values into linear
values, you get literal cd/m² values. These absolute luminance values
are good to display as is only on a so called mastering display where
the color control of the production has been performed. For any other
display, like any display one might have at hand at home or in the
office, some kind of *tone mapping* must be done to adapt the content to
the display capabilities. This is why PQ system HDR signal usually
comes with information about the mastering display used.
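The PQ EOTF and its inverse, with constants from SMPTE ST 2084 (also used in Rec. ITU-R BT.2100), can be sketched in Python as follows. This only converts between PQ signal values and absolute luminance; it does no tone mapping, which is display and implementation specific.

```python
# PQ constants from SMPTE ST 2084 (also Rec. ITU-R BT.2100).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal: float) -> float:
    """PQ signal value in [0, 1] -> absolute luminance in cd/m²."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pq_inverse_eotf(luminance: float) -> float:
    """Absolute luminance in cd/m² (0 to 10000) -> PQ signal value in [0, 1]."""
    y = (luminance / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


for nits in (0.1, 100, 1000, 10000):
    print(f"{nits:>7} cd/m² -> PQ signal {pq_inverse_eotf(nits):.3f}")
```

100 cd/m², a typical SDR peak, already lands near the middle of the PQ signal range (about 0.51), which shows how much of the code range PQ reserves for brighter highlights.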
The HLG (Hybrid Log-Gamma) system defines the HLG OETF that is used to encode producer
optical color values into electrical values. It also defines the
parameterised HLG Opto-Optical Transfer Function (OOTF) that must be
used in a display to tone map the decoded (with the inverse OETF) video
signal for the monitor at hand. The OOTF takes care of a suitable
mapping of the content to monitors with different dynamic ranges.
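A Python sketch of the HLG pieces described above, following Rec. ITU-R BT.2100: the OETF, its inverse, and the OOTF whose system gamma $`\gamma = 1.2 + 0.42 \log_{10}(L_W / 1000)`$ depends on the display's nominal peak luminance $`L_W`$. It assumes a zero display black level and BT.2020 primaries for the luminance weights; the peak luminances in the usage lines are hypothetical.

```python
import math

# HLG constants from Rec. ITU-R BT.2100.
A = 0.17883277
B = 1 - 4 * A                  # 0.28466892
C = 0.5 - A * math.log(4 * A)  # 0.55991073


def hlg_oetf(e: float) -> float:
    """Relative scene linear light in [0, 1] -> HLG signal in [0, 1]."""
    if e <= 1 / 12:
        return math.sqrt(3 * e)
    return A * math.log(12 * e - B) + C


def hlg_inverse_oetf(signal: float) -> float:
    """HLG signal in [0, 1] -> relative scene linear light in [0, 1]."""
    if signal <= 0.5:
        return signal * signal / 3
    return (math.exp((signal - C) / A) + B) / 12


def hlg_ootf(rgb, peak_luminance: float):
    """Scene light (R, G, B) -> display light in cd/m².

    Assumes a zero display black level. The system gamma grows with the
    display's nominal peak luminance, which is how HLG adapts the same
    signal to different displays.
    """
    gamma = 1.2 + 0.42 * math.log10(peak_luminance / 1000)
    r, g, b = rgb
    y_s = 0.2627 * r + 0.6780 * g + 0.0593 * b  # scene luminance, BT.2020 weights
    if y_s <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = peak_luminance * y_s ** (gamma - 1)
    return tuple(scale * c for c in rgb)


# Decode one HLG signal and display it on two hypothetical monitors.
scene = tuple(hlg_inverse_oetf(s) for s in (0.75, 0.75, 0.75))
for peak in (1000, 400):
    print(peak, hlg_ootf(scene, peak))
```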
## Viewing Environment
Adapting content to the monitor at hand is not enough. The viewing
environment also affects the perception of color on screen. Color
appearance modeling studies the effect of environment and everything
else that changes the human perception of colors without actually
changing the colors themselves.
One of the major physical effects is screen flare, ambient light
reflected from the screen surface. While the flare is usually fairly
uniform, colorless (white), and dim (otherwise it bothers the viewer
anyway), it can still have a big impact on the perception of dark
colors. This is because the flare can easily dominate the dark colors
emitted by a display and the human visual system adapts to the overall
brightness of what it sees, losing visual details as the dark colors
are no longer distinguishable under the elevated overall brightness.
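The effect of flare on contrast is simple arithmetic: reflected light adds the same luminance to both the brightest and the darkest shade. The Python sketch below uses hypothetical numbers (a 200 cd/m² white, a 0.1 cd/m² black and 1 cd/m² of flare) chosen only for illustration.

```python
def contrast_ratio(bright: float, dark: float, flare: float = 0.0) -> float:
    """Ratio of two luminances (cd/m²) as seen by the viewer when the same
    amount of reflected ambient light (flare) is added to both."""
    return (bright + flare) / (dark + flare)


# Hypothetical display: 200 cd/m² white and 0.1 cd/m² black.
print(contrast_ratio(200, 0.1))       # about 2000
print(contrast_ratio(200, 0.1, 1.0))  # about 183

# Two dark shades that differ by a factor of 4 on their own...
print(contrast_ratio(0.4, 0.1))       # about 4
# ...differ by only about 27 % with 1 cd/m² of flare.
print(contrast_ratio(0.4, 0.1, 1.0))  # about 1.27
```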
The overall illumination of the viewing environment affects the
adaptation state of the human visual system. It affects both the
dynamic range (brightness) adaptation and the white point adaptation.
White point adaptation can be observed from different perceived colors
of white. A (display) color that looks white in one environment can
look bluish or reddish in another environment. Usually this is
compensated with the monitor color temperature settings.
If the environment is very dark and the screen covers much of the field
of vision, then the human visual system adapts mostly to the screen
content as there is little other color reference in sight.
Fortunately viewing environments tend to be arranged to be somewhat
standardish for most use cases. Office environment tends to be well lit
indoors, and living room environment tends to be dim indoors with
(home) theatre environment as the culmination of very dark surroundings
with no stray light. Professional color work uses strictly controlled
viewing environments. This avoids environment-related problems.
## Conclusion
We started with the assumption that the pixel was rendered with the
intention to be displayed. This hints that not all pixels are intended
to be displayed. In fact, if you took a raw image from a camera
(think of professional cameras with RAW file format), it would look
quite bad on a display even if you took care of everything mentioned in
this article. Such images need to be carefully processed before they
are ready for display. Consumer cameras do that processing on the
camera using manufacturer magic algorithms, so you may never have
encountered such an image.
All of the above may seem like a lot to digest, and we did not even go into
details. The point of this article is to give you an idea of the
concepts related to a pixel's color, hopefully letting you follow other
discussions around color more easily, like why one cannot "just blend"
two pixels together or how "RGB" alone does not mean much.