Commit a86785df authored by Pekka Paalanen's avatar Pekka Paalanen

A Pixel's Color

Add a story on what one would need to know to process and display a pixel.

This is an introduction for people who are already familiar with computer
graphics in general, images in memory, and maybe window systems, but who
never really thought about what the values in a pixel actually mean or
what they might be doing wrong with them.
Signed-off-by: Pekka Paalanen <>
parent 11ca7ce2
SPDX-FileCopyrightText: 2021 Collabora, Ltd.
SPDX-License-Identifier: MIT
[Front page](../
# A Pixel's Color
Let us assume that there is a pixel that is rendered with the intention
to be displayed. What does one need to know to be able to tell what
color the pixel has, or what human perception that pixel should ideally
invoke?
Scratching the surface of that question is a journey through a
pipeline: from interpreting bits as numbers, decoding numbers into
quantities, mapping those quantities into how the light sensitive cells
in the human eye are excited, and finally to understanding that it is
all an illusion, a perception created by the brain that depends on much
more than just the pixel.
The pixel is likely a part of an image in computer memory. You need to
know quite a lot of information, metadata, about the image to be able
to use it: width, height, stride/pitch, pixel format, tiling layout,
and maybe more. All that information allows you to access the data of
an individual pixel. Assuming you know how to do that, the following
sections dive into decoding the meaning of the pixel's data.
## Pixel Format
Pixel format is the [key to interpreting][pixel-format-guide] pixel
data (arbitrary bits in memory) as color channel values. A pixel format
tells you what color channels there are, which is also a strong hint of
what the color model is. Color model is discussed in the next section.
A pixel format also tells the precision of the color channel values.
The often found channels are listed in the following table.
| mnemonic | meaning |
| -------- | ------- |
| R | red |
| G | green |
| B | blue |
| X | ignored |
| A | alpha |
| Y | luma |
| U | (first) chroma |
| V | (second) chroma |
R, G and B imply the RGB color model, and Y, U and V imply the YCbCr
color model or some variation of it.
Pixel format tells us if the color values are stored as signed or
unsigned integers or in some floating point format. Unsigned integers
are usually mapped linearly to the range [0.0, 1.0], signed integers
roughly to the range [-1.0, 1.0] such that 0.0 can be expressed exactly,
and floating point is usually taken as is, allowing arbitrary values.
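
As a sketch of that mapping, decoding a hypothetical XRGB8888 pixel
(8 bits per unsigned channel, the X byte ignored) into normalized
values might look like this:

```python
def decode_xrgb8888(pixel):
    """Split a 32-bit XRGB8888 pixel into normalized R, G, B in [0.0, 1.0].

    The X byte is ignored, and each 8-bit unsigned channel is mapped
    linearly so that 0 -> 0.0 and 255 -> 1.0.
    """
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return (r / 255.0, g / 255.0, b / 255.0)

print(decode_xrgb8888(0x00FF8000))  # full red, about half green, no blue
```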
## Color Model
[Color models][wikipedia-color-model] are mathematical constructs where
a tuple of numbers is used to represent humanly observable colors.
Examples of color models are RGB, YCbCr, HSV, HSL, RYB, CMY, and CMYK.
Color models are not limited to three channels, but in computer
graphics and display three channels are standard.
The choice of color model depends on the use case. RGB is an additive
color model which suits driving displays at the light emitter level.
YCbCr separates brightness (luma) information from the color (chroma)
information, which allows sub-sampling the chroma information without
noticeable loss of image quality, resulting in storage space and
transmission bandwidth savings. CMYK is used in print, matching the
inks used. RYB works with paints and dyes as it is a subtractive color
model. HSV and HSL may help artists pick their colors more easily.
Interpolating colors, including drawing gradients, is highly dependent
on the choice of color model. The color model affects the intermediate
colors in a gradient when you use a mathematical interpolation formula
between the two tuples representing the end point colors. For example,
with simple linear interpolation the resulting gradient looks completely
different depending on whether you interpolate in the RGB or the HSV
color model. However, the color model is not the only thing that affects
how a gradient will look.
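
As an illustration, the midpoint of a red-to-blue gradient computed
with Python's standard `colorsys` module comes out very differently in
the two models:

```python
import colorsys

def lerp(a, b, t):
    """Linear interpolation between two equal-length tuples."""
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))

red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

# Midpoint interpolated directly in RGB: a dim purple.
mid_rgb = lerp(red, blue, 0.5)

# Midpoint interpolated in HSV, then converted back to RGB: the hue
# sweeps through green on its way from red to blue.
mid_hsv = lerp(colorsys.rgb_to_hsv(*red), colorsys.rgb_to_hsv(*blue), 0.5)
mid_rgb_via_hsv = colorsys.hsv_to_rgb(*mid_hsv)
```

The RGB midpoint is (0.5, 0.0, 0.5), while the HSV route passes through
pure green.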
## Encoding
Pixel values are almost always stored and transmitted with a non-linear
encoding applied. This saves memory and bandwidth, but the non-linear
values are not suitable for operations that are meant to have a
physical meaning, like blending. If you want to filter, blend or
interpolate pixels or colors, that usually needs to happen with linear
color values.
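
To illustrate the difference, here is a small sketch where a plain
gamma 2.2 power law stands in for a real transfer function (actual
pixel formats define their own encodings). Averaging two pixels
directly on the encoded values gives a different, darker result than
averaging in linear light:

```python
# A plain gamma 2.2 power law stands in here for a real transfer
# function; this is only an illustration of the principle.
def decode(e):
    return e ** 2.2        # electrical -> optical (linear light)

def encode(o):
    return o ** (1 / 2.2)  # optical -> electrical

black, white = 0.0, 1.0  # encoded (electrical) values

naive = (black + white) / 2                            # average encoded values
correct = encode((decode(black) + decode(white)) / 2)  # average linear light

# naive is 0.5, but the physically meaningful mid-gray encodes to
# about 0.73, noticeably brighter.
```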
Originally the non-linear encoding in RGB was due to the non-linear
luminance response of cathode ray tubes (CRT) versus their input
voltages. That is an inherent feature of CRT monitors using analog
video signals. When flat panel monitors and digital signals appeared,
they were made to mimic the CRT behavior, so they too have the
non-linear response (artificially) built in.
The non-linear response of CRTs was an accidental blessing. That
response has a shape roughly similar to human visual sensitivity. When
digital signals use the same non-linearity as CRTs, the number of bits
needed to encode each pixel is considerably lower than what would be
needed if the digital signal had a linear relationship to physical
light intensity. In other words, the non-linearity is a signal
compression method that reduces the needed bandwidth while keeping the
visual image quality the same.
The human visual system is more sensitive to light intensity changes in
dark than bright. If you used a linear integer encoding, you would
either use too few code points for dark shades leading to loss of
detail or use too many code points for bright shades wasting bandwidth
and memory.
### Transfer functions
Perhaps due to the history of analog video signals, the non-linear
values are called *electrical values*, and the values that are linear
with physical light intensity are called *optical values*. A function
that describes the relationship or conversion between electrical and
optical values is called an electro-optical transfer function (EOTF) or
opto-electronic transfer function (OETF).
An EOTF is usually associated with a display, because it describes the
conversion from electrical values into light intensity. Likewise, an
OETF is usually associated with a camera, because it describes the
conversion of light intensity into electrical values. Furthermore, in
camera-transmission-display systems there is something called an
opto-optical transfer function (OOTF). The OOTF is what you get when
you combine the OETF and the EOTF, and usually it is not the identity
mapping, in order to make the picture look better to humans. Therefore,
do not assume that the inverse of the EOTF is the OETF.
Both OETF and the inverse of EOTF can be used for compressing linear
(optical) color channel values into non-linear (electrical) values.
Using the inverse of the compression function you can recover the
linear color channel values. When you are decoding a pixel, you need to
apply the right (inverse) function to get the linear color values.
The sRGB EOTF is a well known encoding function.
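
As a concrete reference, the sRGB EOTF can be sketched in a few lines:

```python
def srgb_eotf(e):
    """sRGB EOTF: a non-linear (electrical) value e in [0, 1] to a
    linear-light (optical) value in [0, 1].

    Piece-wise: a linear segment near black, a power-law segment above.
    """
    if e <= 0.04045:
        return e / 12.92
    return ((e + 0.055) / 1.055) ** 2.4

# Mid-gray: an encoded value of 0.5 corresponds to only about 21%
# linear light.
```

Encoding runs through the inverse of this function. Note that the
exponent 2.4 here is not the "gamma" of the common pure power-law
approximation, as discussed in the next section.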
### Gamma
Gamma in the context of displays refers to the power function

```math
y = x^\gamma \quad\text{and its inverse}\quad x = y^{1/\gamma}
```
where $`x`$ and $`y`$ are the input and output, and $`\gamma > 0`$ is
the parameter. The input and output values are relative,
$`x, y \in [0, 1]`$.
This power law can approximate the CRT and human visual system
non-linearities pretty well in the standard dynamic range.
Talking about gamma as a mapping can be confusing. The values $`x`$ and
$`y`$ do not have a clear meaning without knowing the full context. If
$`\gamma > 1`$ then it is possible that $`x`$ is an electrical value
and $`y`$ is an optical value, meaning that we have an EOTF. If
$`\gamma < 1`$ then we possibly have the inverse of an EOTF. A third
possibility is that we have neither EOTF nor its inverse but a gamma
correction function which usually would be mapping electrical values to
other electrical values, or in other words, a conversion from one
compression parameter value to another.
Another problem with the term gamma is that modern EOTFs are not pure
power functions. Even the sRGB EOTF is not a pure power function but a
piece-wise function with a linear part and a power-law part. sRGB EOTF
is sometimes approximated with a pure power function, but to further
confusion the exponent in the true sRGB EOTF is different from the
$`\gamma`$ in the approximation.
Therefore it would be good to avoid using the term gamma.
## Color Space
As the context here is mostly window systems and displays, RGB is the
color model of choice. Knowing the pixel format, color model, and the
encoding allows one to convert anything you have into an RGB triplet
with linear values which are directly related to emitted light
intensities in displays. As such, the linear values already allow some
digital image processing in physical terms, for example texture
filtering.
The goal of a pixel is to provoke a specific perception of color, and
an important part of that is predicting how the human eye responds to
the light emitted according to the pixel's RGB values. Colorimetry
studies the human eye response to light. The response can be
approximated with CIE 1931 XYZ color values which are defined through a
so called *Standard Colorimetric Observer*. This observer was formed
from the average of a few people with normal color vision through a
series of tests and mathematical modeling. In other words, each XYZ
value triplet should look the same to any person with normal color
vision (as long as the surroundings and lighting in the room are kept
the same). If an RGB triplet could be converted into XYZ, one would
know more about what color it is.
The R, G and B values in a linearly encoded RGB triplet are abstract on
their own. They give ratios of red, green and blue light components.
The problem is, which red? Which green, and which blue? Different
displays can have different phosphors, LEDs or color filters. The same
RGB values could mean different XYZ values. An important part of what
an RGB triplet means is the *color space* which connects the RGB values
to the XYZ values. In other words, a color space tells us what kind of
response an RGB triplet is intended to trigger in the human eye.
Choosing the right words is difficult, and the term color space is
particularly ambiguous in casual talk. Here, the crucial part of a
color space is its connection to the trichromatic response in the human
eye (analogous to XYZ) with luminance factored away. Sometimes talking
about a color space also includes the color model and encoding. The
sRGB specification defines both the EOTF (encoding) and the human eye
response properties and it obviously uses the RGB color model. YUV
pixel data could still use sRGB color space, for instance, which means
that once you convert YUV to RGB then sRGB specification explains what
that color is. Sometimes the term color space is used purely for the
color model, or for encoding, without explicitly defining the
trichromatic response.
The human eye response properties of an RGB color space are defined by
its color *primaries* and *white point*. These are usually described
with CIE 1931 xy chromaticity coordinates which can be easily derived
from the XYZ coordinates. In more intuitive words, the chromaticity
describes the color without its brightness, and it does that in the
context of the human eye.
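
The derivation from XYZ is a plain normalization. A minimal sketch (the
white values below are the usual D65 coordinates, used here only as an
illustration):

```python
def xyz_to_xy(X, Y, Z):
    """Project CIE XYZ to CIE 1931 xy chromaticity, factoring away
    the absolute luminance."""
    s = X + Y + Z
    return (X / s, Y / s)

# A D65-like white: normalizing its XYZ recovers the familiar
# chromaticity near (0.3127, 0.3290).
x, y = xyz_to_xy(0.9505, 1.0, 1.0891)
```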
### Primaries
Display primaries are the "pure" colors emitted individually by the
red, green and blue component light sources in a display. Driving these
component light sources with different weights (color channel values)
produces all the displayable colors. As negative light does not
physically exist, the primaries span and limit the displayable color
volume. This color volume with the luminance dimension flattened is the
*color gamut* of the display.
Since RGB values are used for driving the component light sources, the
eye response for a certain RGB triplet depends on what those component
light sources are. Displays are manufactured such that when one
component's intensity varies and the others stay at zero, the observed
chromaticity coordinates do not change. Hence, a single chromaticity
coordinate pair can be used to describe each of the three primaries.
Conversely this applies also to image content prepared for display. The
RGB values in a stored image have been determined respective to certain
primaries. If the primaries used for the image are different from those
used by a display, then something has to happen to the RGB values to
make the image look as intended on the display.
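
To make this concrete, the matrix that connects linear RGB to XYZ can
be derived from the chromaticities of the primaries and the white
point. A minimal sketch, using the sRGB/BT.709 primaries with a D65
white point as the example input:

```python
def xy_to_XYZ(x, y):
    """Chromaticity (x, y) to XYZ with luminance Y = 1."""
    return (x / y, 1.0, (1.0 - x - y) / y)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def rgb_to_xyz_matrix(r, g, b, white):
    """Build the linear-RGB -> XYZ matrix from chromaticities.

    r, g, b, white are (x, y) chromaticity pairs. Each primary's XYZ
    column is scaled so that R = G = B = 1 lands exactly on the white
    point with luminance 1.
    """
    cols = [xy_to_XYZ(*r), xy_to_XYZ(*g), xy_to_XYZ(*b)]
    w = xy_to_XYZ(*white)
    m = [[cols[j][i] for j in range(3)] for i in range(3)]
    d = det3(m)
    # Solve m @ s = w for the per-primary scales s by Cramer's rule.
    s = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = w[i]
        s.append(det3(mk) / d)
    return [[m[i][j] * s[j] for j in range(3)] for i in range(3)]

# sRGB / BT.709 primaries with a D65 white point.
M = rgb_to_xyz_matrix((0.64, 0.33), (0.30, 0.60), (0.15, 0.06),
                      (0.3127, 0.3290))
# M[1] is the Y (luminance) row: roughly (0.2126, 0.7152, 0.0722).
```

Image content mastered for different primaries needs a different
matrix, which is exactly why the RGB values alone are not enough.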
### White point
White point describes the white balance of a display and is usually
controlled through a monitor's color temperature setting. Expressed as
chromaticity coordinates, white point does not need to be on the
Planckian locus like color temperature is.
The chromaticity coordinates of a display white are called the
(display) white point. Display white is the color with R, G and B values
at their maximum. Furthermore, displays are manufactured and/or
calibrated such that any neutral color, $`R=G=B`$, has the same white
point chromaticity. Display white point is a physical (or firmware)
property of the display, similar to the chromaticities of its
primaries.
Again, the white point for image content may differ from the white
point of a display which then requires color adjustment to compensate.
In more general terms, the definition of white comes from observing a
perfect diffuse reflector under a specified illumination. Essentially
the color of white is the color of the illumination. This is not the
same as display white. Depending on various things like the environment
where the display is in and what image is being shown, what would be
perceived as white may or may not match the display white.
## Dynamic Range
Everything said above has been very vague about the dynamic range or
the available brightness range. The dynamic range has been assumed to
be both relative and unknown. Relative means that we only deal with
normalized luminance or intensity values in the range $`[0.0, 1.0]`$ or
from 0% to 100%. Unknown means that we do not know (or care) what
absolute luminance in cd/m² the 100% value corresponds to. Usually talking
about relative luminance also implies it is unknown but reasonable for
viewing. This is all good enough for standard dynamic range (SDR).
This is not good enough for high dynamic range (HDR). While the
absolute luminance of maximum white on SDR monitors can in practice be
around 100 to 250 cd/m², HDR monitors go much higher, up to 600 or
1000 cd/m² or even more. You do not want to show "graphics white" at full
1000 cd/m² blast. The displayable dynamic range must be known.
High dynamic range is not only about going for brighter and brighter
highlights with detail, it is also about going darker and with more
precision. The absolute luminance of the black level in HDR content can
be significantly darker than in SDR content. This is particularly
useful in dark room viewing environments where an SDR signal would just
lose detail in the dark shades.
HDR monitors usually advertise their absolute luminance limits via EDID
or DisplayID. If a monitor was driven with a traditional relative video
signal, this would give the 0% to 100% range. However, HDR monitors are
usually driven with some standardised signal system, and there are two
prevalent systems aside from the closed Dolby system.
The PQ system defines the Perceptual Quantizer (PQ) EOTF that maps
pixel color values to absolute luminance in cd/m². The 100% level of a
PQ signal is 10,000 cd/m² which is practically beyond any monitor's
capabilities. When you use PQ EOTF to decode color values into linear
values, you get literal cd/m² values. These absolute luminance values
are good to display as is only on a so called mastering display where
the color control of the production has been performed. For any other
display, like any display one might have at hand at home or in the
office, some kind of tone mapping must be done to adapt the content to
the display capabilities. This is why a PQ system HDR signal usually
comes with information about the mastering display used.
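
The PQ EOTF itself (defined in SMPTE ST 2084 and ITU-R BT.2100) is
compact enough to sketch:

```python
# Constants from the ST 2084 / BT.2100 PQ definition.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(e):
    """PQ EOTF: non-linear signal e in [0, 1] to luminance in cd/m²."""
    p = e ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

# A full signal decodes to 10000 cd/m²; a level around 0.58 lands near
# 200 cd/m², the region commonly used as HDR reference white.
```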
The HLG system defines the HLG OETF that is used to encode producer
optical color values into electrical values. It also defines the
parameterised HLG Opto-Optical Transfer Function (OOTF) that must be
used in a display to tone map the decoded (with the inverse OETF) video
signal for the monitor at hand. The OOTF takes care of a suitable
mapping of the content to monitors with different dynamic ranges.
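
The HLG OETF (from ITU-R BT.2100) can likewise be sketched in a few
lines:

```python
import math

# Constants from the BT.2100 HLG definition.
A = 0.17883277
B = 1.0 - 4.0 * A
C = 0.5 - A * math.log(4.0 * A)

def hlg_oetf(l):
    """HLG OETF: scene-linear light l in [0, 1] to a non-linear
    signal in [0, 1]."""
    if l <= 1.0 / 12.0:
        return math.sqrt(3.0 * l)
    return A * math.log(12.0 * l - B) + C
```

The square-root segment near black joins the logarithmic segment
smoothly at l = 1/12, where the signal value is exactly 0.5.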
## Viewing Environment
Adapting content to the monitor at hand is not enough. The viewing
environment also affects the perception of color on screen. Color
appearance modeling studies the effect of environment and everything
else that changes the human perception of colors without actually
changing the colors themselves.
One of the major physical effects is screen flare, ambient light
reflected from the screen surface. While the flare is usually fairly
uniform, colorless (white), and dim (otherwise it bothers the viewer
anyway), it can still have a big impact on the perception of dark
colors. This is because the flare can easily dominate the dark colors
emitted by a display and the human visual system adapts to the overall
brightness of what it sees, losing visual details as the dark colors
are no longer distinguishable under the elevated overall brightness.
The overall illumination of the viewing environment affects the
adaptation state of the human visual system. It affects both the
dynamic range (brightness) adaptation and the white point adaptation.
White point adaptation can be observed from different perceived colors
of white. A (display) color that looks white in one environment can
look bluish or reddish in another environment. Usually this is
compensated with the monitor color temperature settings.
If the environment is very dark and the screen covers much of the field
of vision, then the human visual system adapts mostly to the screen
content as there is little other color reference in sight.