---
SPDX-FileCopyrightText: 2021 Collabora, Ltd.
SPDX-License-Identifier: MIT
---

Contents

[[_TOC_]]

[Front page](../README.md)


# A Pixel's Color

Let us assume that there is a pixel that is rendered with the intention
to be displayed. What does one need to know to be able to tell what
color the pixel has, or what human perception that pixel should ideally
invoke?

Scratching the surface of that question is a journey through a
pipeline: from interpreting bits as numbers, decoding numbers into
quantities, mapping those quantities into how the light sensitive cells
in the human eye are excited, and finally to understanding that it is
all an illusion, a perception created by the brain that depends on much
more than just the pixel.

The pixel is likely a part of an image in computer memory. You need to
know quite a lot of information, metadata, about the image to be able
to use it: width, height, stride/pitch, pixel format, tiling layout,
and maybe more. All that information allows you to access the data of
an individual pixel. Assuming you know how to do that, the following
sections dive into decoding the meaning of the pixel's data.


## Pixel Format

Pixel format is the [key to interpreting][pixel-format-guide] pixel
data (arbitrary bits in memory) as color channel values. A pixel format
tells you what color channels there are, which is also a strong hint of
what the color model is. Color model is discussed in the next section.
A pixel format also tells you the precision of the color channel values.

The most commonly found channels are listed in the following table.

| mnemonic | meaning |
|:-:|:---|
| R | red |
| G | green |
| B | blue |
| X | ignored |
| A | alpha |
| Y | luma |
| U | (first) chroma |
| V | (second) chroma |

R, G and B imply the RGB color model, and Y, U and V imply the YCbCr
color model or some variation of it.

The pixel format also tells us whether the color values are stored as
signed or unsigned integers or in some floating point format. Unsigned
integers are usually mapped linearly to the range [0.0, 1.0], signed
integers roughly to the range [-1.0, 1.0] such that 0.0 can be
expressed exactly, and floating point is usually taken as is, allowing
arbitrary values.
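
For instance, a format such as XRGB8888 packs one pixel into 32 bits.
Below is a minimal sketch (the helper name is made up for illustration)
of decoding such a pixel into normalized channel values, assuming the
usual linear mapping of unsigned integers onto [0.0, 1.0]:

```python
# Hypothetical sketch: decode one little-endian packed XRGB8888 pixel
# into normalized float channel values.
def decode_xrgb8888(pixel: int) -> tuple[float, float, float]:
    r = (pixel >> 16) & 0xff
    g = (pixel >> 8) & 0xff
    b = pixel & 0xff
    # The X byte (bits 24-31) carries no meaning and is ignored.
    return (r / 255.0, g / 255.0, b / 255.0)

print(decode_xrgb8888(0x00ff8000))  # (1.0, ~0.502, 0.0): an orange
```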

[pixel-format-guide]: https://afrantzis.com/pixel-format-guide/


## Color Model

[Color models][wikipedia-color-model] are mathematical constructs where
a tuple of numbers is used to represent humanly observable colors.
Examples of color models are RGB, YCbCr, HSV, HSL, RYB, CMY, and CMYK.
Color models are not limited to three channels, but in computer
graphics and display three channels are standard.

The choice of color model depends on the use case. RGB is an additive
color model which suits driving displays at the light emitter level.
YCbCr separates brightness (luma) information from the color (chroma)
information, which allows sub-sampling the chroma information without
noticeable loss of image quality, resulting in storage space and
transmission bandwidth savings. CMYK is used in print, matching the
inks used. RYB works with paints and dyes as it is a subtractive color
model. HSV and HSL may help artists pick their colors more easily.

Interpolating colors, including drawing gradients, is highly dependent
on the choice of color model. The color model determines the
intermediate colors in a gradient when you use a mathematical
interpolation formula between the two tuples representing the end point
colors. For example, with simple linear interpolation the resulting
gradient looks completely different depending on whether you
interpolate in the RGB or the HSV color model. However, the color model
is not the only thing that affects what a gradient will look like.
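
As an illustration, here is a small sketch using Python's standard
colorsys module. Linear interpolation halfway between red and blue
yields a dark magenta in the RGB model but a pure green in the HSV
model, because the hue channel sweeps through green on its way from
red to blue.

```python
import colorsys

# Linearly interpolate between two color tuples, channel by channel.
def lerp(a, b, t):
    return tuple((1 - t) * u + t * v for u, v in zip(a, b))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# Midpoint in the RGB model: a dark magenta.
print(lerp(red, blue, 0.5))  # (0.5, 0.0, 0.5)

# Midpoint in the HSV model: the hue passes through green.
hsv_mid = lerp(colorsys.rgb_to_hsv(*red), colorsys.rgb_to_hsv(*blue), 0.5)
print(colorsys.hsv_to_rgb(*hsv_mid))  # ~(0.0, 1.0, 0.0)
```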

[wikipedia-color-model]: https://en.wikipedia.org/wiki/Color_model


## Encoding

Pixel values are almost always stored and transmitted with a non-linear
encoding applied. This saves memory and bandwidth, but the non-linear
values are not suitable for operations that are meant to have a
physical meaning, like blending. If you want to filter, blend or
interpolate pixels or colors, that usually needs to happen with linear
color values.

Originally the non-linear encoding in RGB was due to the non-linear
luminance response of cathode ray tubes (CRT) versus their input
voltages. That is an inherent feature of CRT monitors using analog
video signals. When flat panel monitors and digital signals appeared,
they were made to mimic the CRT behavior, so they too have the
non-linear response (artificially) built in.

The non-linear response of CRTs was an accidental blessing. That
response has a shape roughly similar to human visual sensitivity. When
digital signals use the same non-linearity as CRTs, the number of bits
needed to encode each pixel is considerably lower than what would be
needed if the digital signal had a linear relationship to physical
light intensity. In other words, the non-linearity is a signal
compression method that reduces the needed bandwidth while keeping the
visual image quality the same.

The human visual system is more sensitive to light intensity changes
in dark shades than in bright ones. If you used a linear integer
encoding, you would either use too few code points for the dark shades,
leading to loss of detail, or too many code points for the bright
shades, wasting bandwidth and memory.
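
A rough sketch of this effect: quantizing some dark linear-light values
straight to 8 bits collapses them onto one or two code points, while
running them through a non-linear encoding first (here the inverse of
the sRGB EOTF, defined in the next section) keeps them distinguishable.

```python
# Quantize dark linear-light values to 8 bits, directly and through a
# non-linear encoding (the inverse of the sRGB EOTF).
def srgb_inverse_eotf(x: float) -> float:
    return x * 12.92 if x <= 0.0031308 else 1.055 * x ** (1 / 2.4) - 0.055

darks = [0.001, 0.002, 0.003, 0.004, 0.005]
print([round(v * 255) for v in darks])                     # [0, 1, 1, 1, 1]
print([round(srgb_inverse_eotf(v) * 255) for v in darks])  # [3, 7, 10, 13, 16]
```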

### Transfer functions

Perhaps due to the history of analog video signals, the non-linear
values are called *electrical values*, and the values that are linear
with physical light intensity are called *optical values*. A function
that describes the relationship or conversion between electrical and
optical values is called an electro-optical
[transfer function][khronos-transfer-function] (EOTF) or
opto-electronic transfer function (OETF).

An EOTF is usually associated with a display, because it describes the
conversion from electrical values into light intensity. Likewise, an
OETF is usually associated with a camera, because it describes the
conversion of light intensity into electrical values. Furthermore, in
camera-transmission-display systems there is something called an
opto-optical transfer function (OOTF). The OOTF is what you get when
you combine the OETF and the EOTF, and it is usually not the identity
mapping, in order to make the picture look better to humans. Therefore
do not assume that the inverse of the EOTF is the OETF.

Both OETF and the inverse of EOTF can be used for compressing linear
(optical) color channel values into non-linear (electrical) values.
Using the inverse of the compression function you can recover the
linear color channel values. When you are decoding a pixel, you need to
apply the right (inverse) function to get the linear color values.

A well-known example of an encoding function is the sRGB EOTF. It
operates on each of the R, G and B channels independently and is
defined as

```math
R = \begin{cases}
	\frac{R'}{12.92} &\text{if } R' \leq 0.04045\\
	\left(\frac{R' + 0.055}{1.055}\right)^{2.4} &\text{if } R' > 0.04045\\
\end{cases}
```

and similarly for G and B. $`R' \in [0.0, 1.0]`$ is the electrical
value and $`R \in [0.0, 1.0]`$ is the optical value. This is close but
not quite a pure power-law because of the linear segment in the
function.

![A plot of the sRGB EOTF](images/sRGB_EOTF.png "sRGB EOTF")
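
The definition above transcribes directly into code; this is a sketch
for illustration, not a reference implementation:

```python
# The sRGB EOTF: electrical value R' in [0.0, 1.0] decodes to the
# optical (linear) value R in [0.0, 1.0].
def srgb_eotf(r_prime: float) -> float:
    if r_prime <= 0.04045:
        return r_prime / 12.92
    return ((r_prime + 0.055) / 1.055) ** 2.4

print(srgb_eotf(0.5))  # ~0.214: 50% electrical is only ~21% optical
```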

[khronos-transfer-function]: https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#TRANSFER_CONVERSION

### Gamma

Gamma in the context of displays refers to the power function

```math
y = x^\gamma \quad\text{and its inverse}\quad x = y^{1/\gamma}
```

where $`x`$ and $`y`$ are the input and output, and $`\gamma > 0`$ is
the parameter. The input and output values are relative,
$`x, y \in [0, 1]`$.

This power law can approximate the CRT and human visual system
non-linearities pretty well in the standard dynamic range.

Talking about gamma as a mapping can be confusing. The values $`x`$ and
$`y`$ do not have a clear meaning without knowing the full context. If
$`\gamma > 1`$ then it is possible that $`x`$ is an electrical value
and $`y`$ is an optical value, meaning that we have an EOTF. If
$`\gamma < 1`$ then we possibly have the inverse of an EOTF. A third
possibility is that we have neither EOTF nor its inverse but a gamma
correction function which usually would be mapping electrical values to
other electrical values, or in other words, a conversion from one
compression parameter value to another.

Another problem with the term gamma is that modern EOTFs are not pure
power functions. Even the sRGB EOTF is not a pure power function but a
piece-wise function with a linear part and a power-law part. The sRGB
EOTF is sometimes approximated with a pure power function, but to add
to the confusion, the exponent in the true sRGB EOTF differs from the
$`\gamma`$ in the approximation.
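
A quick sketch of that difference: the common pure power-law
approximation uses $`\gamma = 2.2`$, and its values stay close to the
true piece-wise sRGB EOTF even though the exponent inside the true
function is 2.4.

```python
# Compare the gamma-2.2 approximation against the piece-wise sRGB EOTF.
def srgb_eotf(x: float) -> float:
    return x / 12.92 if x <= 0.04045 else ((x + 0.055) / 1.055) ** 2.4

for x in (0.02, 0.1, 0.5, 0.9):
    print(f"{x:.2f}  gamma 2.2: {x ** 2.2:.4f}  sRGB EOTF: {srgb_eotf(x):.4f}")
```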

Therefore it would be good to avoid using the term gamma.


## Color Space

The goal of a pixel is to provoke a specific perception of color, and
the context here is window systems and light emitting displays.
An important part of that is predicting how the human eye responds to
the light emitted according to the pixel's RGB values. Colorimetry
studies the human eye response to light. The response can be
approximated with CIE 1931 XYZ color values which are defined through a
so called *Standard Colorimetric Observer*. This observer was formed
from the average of a few people with normal color vision through a
series of tests and mathematical modeling. In other words, each XYZ
value triplet should look the same to any person with normal color
vision (as long as the surroundings and lighting in the room are kept
the same). If an RGB triplet could be converted into XYZ, one would
know more about what color it is.
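
As a taste of what such a conversion looks like: for linear-light sRGB
values, RGB to XYZ is a fixed matrix multiplication. The sketch below
uses the commonly quoted sRGB coefficients (Rec. 709 primaries, D65
white point); where such a matrix comes from is the topic of the rest
of this section.

```python
# Sketch: convert linear-light sRGB values to CIE 1931 XYZ.
def linear_srgb_to_xyz(r: float, g: float, b: float):
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return (x, y, z)

# Display white (R = G = B = 1) lands on the D65 white point.
print(linear_srgb_to_xyz(1.0, 1.0, 1.0))  # ~(0.9505, 1.0000, 1.0890)
```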

The R, G and B values in a linearly encoded RGB triplet are abstract on
their own. They give ratios of red, green and blue light components.
The problem is, which red? Which green, and which blue? Different
displays can have different phosphors, LEDs or color filters. The same
RGB values could mean different XYZ values. An important part of what
an RGB triplet means is the *color space* which connects the RGB values
to the XYZ values. In other words, a color space tells us what kind of
response an RGB triplet is intended to trigger in the human eye.

Choosing the right words is difficult, and the term color space is
particularly ambiguous in casual talk. Here, the crucial part of a
color space is its connection to the trichromatic response in the human
eye (analogous to XYZ) with luminance factored away. Sometimes talking
about a color space also includes the color model and the encoding. The
sRGB specification defines both the EOTF (encoding) and the human eye
response properties (color space), and it obviously uses the RGB color
model. YUV pixel data could still use the sRGB color space, for
instance, which means that once you convert YUV to RGB, the sRGB
specification explains how to decode it and what that color is.
Sometimes the term color space is used purely for the color model, or
for the encoding, without explicitly defining the trichromatic
response.

The human eye response properties of an RGB color space are defined by
its color *primaries* and *white point*. These are usually described
with CIE 1931 xy chromaticity coordinates which can be easily derived
from the XYZ coordinates. In more intuitive words, the chromaticity
describes the color without its brightness, and it does that in the
context of the human eye.
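
The derivation of xy from XYZ is indeed easy: a normalization that
factors out the absolute magnitude. A sketch:

```python
# xy chromaticity is XYZ with the brightness normalized away.
def xyz_to_xy(x: float, y: float, z: float) -> tuple[float, float]:
    s = x + y + z
    return (x / s, y / s)

# D65 white as XYZ (scaled so that Y = 1.0):
print(xyz_to_xy(0.9505, 1.0, 1.0890))  # ~(0.3127, 0.3290)
```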

### Primaries

Display primaries are the "pure" colors emitted individually by the
red, green and blue component light sources in a display. Driving these
component light sources with different weights (color channel values)
produces all the displayable colors. As negative light does not
physically exist, the primaries span and limit the displayable color
volume. This color volume with the luminance dimension flattened is the
*color gamut* of the display. You might want to watch Captain
Disillusion's video on [Color][captain-disillusio-color] (7 mins),
which touches on this topic while talking about human color vision.

Since RGB values are used for driving the component light sources, the
eye response for a certain RGB triplet depends on what those component
light sources are. Looking at each component light source individually
in total isolation and darkness, the observed chromaticity should
remain constant over the component's luminance range. Displays are
manufactured to achieve this, and so a single chromaticity coordinate
pair can be used to describe each of the three primaries in a display.
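
As a concrete example, the sRGB specification places its primaries and
white point at these CIE 1931 xy chromaticity coordinates:

| | x | y |
|:-:|:-:|:-:|
| red | 0.640 | 0.330 |
| green | 0.300 | 0.600 |
| blue | 0.150 | 0.060 |
| white (D65) | 0.3127 | 0.3290 |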

The same applies to image content prepared for display. The RGB values
in a stored image have been determined with respect to certain
primaries. If the primaries used for the image are different from those
used by a display, then something has to happen to the RGB values to
make the image look as intended on the display.

Wide Color Gamut (WCG) displays have a considerably larger color gamut
than traditional displays, whose gamut is roughly the sRGB color space.
This means that, ignoring luminance, WCG displays can show more colors,
and those colors can be more saturated. The primaries of such displays
are further apart in the CIE 1931 xy chromaticity plane, covering a
bigger area of human vision. While there may be little difference in
the colors
between two traditional displays, a WCG display makes a noticeable
difference.

[captain-disillusio-color]: https://www.youtube.com/watch?v=FTKP0Y9MVus

### White point

White point describes the white balance of a display and is usually
controlled through a monitor's color temperature setting. Expressed as
chromaticity coordinates x and y, white point is a more general concept
than the one-dimensional color temperature.

The chromaticity coordinates of a display white are called the
(display) white point. Display white is the color with R, G and B value
at their maximum. Furthermore, displays are manufactured and/or
calibrated such that any neutral color, $`R=G=B`$, has the same white
point chromaticity. Display white point is a physical (or firmware)
property of the display, similar to the chromaticities of its
primaries.

Again, the white point for image content may differ from the white
point of a display which then requires color adjustment to compensate.

In more general terms, the definition of white comes from observing a
perfect diffuse reflector under a specified illumination. Essentially
the color of white is the color of the illumination. This is not the
same as display white. Depending on various things, like the
environment the display is in and what image is being shown, what would
be perceived as white may or may not match the display white.


## Dynamic Range

Everything said above has been very vague about the dynamic range or
the available brightness range. The dynamic range has been assumed to
be both relative and unknown. Relative means that we only deal with
normalized luminance or intensity values in the range $`[0.0, 1.0]`$ or
from 0% to 100%. Unknown means that we do not know (or care) about what
absolute luminance in cd/m² that 100% value means. Usually talking
about relative luminance implies it is also unknown but reasonable for
viewing. This is all good enough for standard dynamic range (SDR).

This is not good enough for high dynamic range (HDR). While the
absolute luminance of maximum white on SDR monitors is in practice
around 100 to 250 cd/m², HDR monitors go much higher: 600, 1000 cd/m²,
or even more. You do not want to show "graphics white" at the full
1000 cd/m² blast. The displayable dynamic range must be known. As a
side note, HDR displays tend to be WCG as well.

High dynamic range is not only about going for brighter and brighter
highlights with detail, it is also about going darker and with more
precision. The absolute luminance of the black level in HDR content can
be significantly darker than in SDR content. This is particularly
useful in dark room viewing environments, where an SDR signal would
just lose detail in the dark shades.

HDR monitors usually advertise their absolute luminance limits via EDID
or DisplayID. If a monitor were driven with a traditional relative video
signal, this would give the 0% to 100% range. Let us call this the
*passive HDR mode*. The monitor input signal range maps exactly to the
displayable monitor dynamic range and color gamut in a static way that
can be measured and modelled. This requires the signal source to adapt
the content to the monitor capabilities, but the result is predictable.

However, consumer HDR monitors are usually driven with some
standardised signal system. That makes it easy for the signal sources
as they do not need to adapt to the monitor capabilities, but then the
monitor itself will be doing the image adaptation. Let us call this the
*adaptive HDR mode*. These proprietary adaptation algorithms can be
based on HDR metadata and image content, and they are often dynamic.
That makes these monitors practically impossible to measure and model.
The result on screen is unpredictable but probably good enough for
entertainment purposes.

There are two prevalent HDR video signal systems aside from the closed
Dolby system.

The PQ system defines the Perceptual Quantizer (PQ) EOTF that maps
pixel color values to absolute luminance in cd/m². The 100% level of a
PQ signal is 10,000 cd/m², which is practically beyond any monitor's
capabilities. When you use the PQ EOTF to decode color values into
linear values, you get literal cd/m² values. These absolute luminance
values are good to display as is only on a so-called mastering display,
where the color control of the production has been performed. For any
other display, like any display one might have at hand at home or in
the office, some kind of *tone mapping* must be done to adapt the
content to the display capabilities. This is why a PQ system HDR signal
usually comes with information about the mastering display used.
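
For illustration, here is a sketch of the PQ EOTF with the constants as
defined in SMPTE ST 2084:

```python
# The PQ EOTF: an electrical value in [0.0, 1.0] decodes to absolute
# luminance in cd/m², with 1.0 mapping to 10,000 cd/m².
m1 = 2610 / 16384
m2 = 2523 / 4096 * 128
c1 = 3424 / 4096
c2 = 2413 / 4096 * 32
c3 = 2392 / 4096 * 32

def pq_eotf(e: float) -> float:
    p = e ** (1 / m2)
    return 10000.0 * (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1 / m1)

print(pq_eotf(1.0))   # 10000.0 cd/m²
print(pq_eotf(0.58))  # ~201 cd/m², near the 203 cd/m² HDR reference white
```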

The HLG system defines the HLG OETF that is used to encode the
producer's optical color values into electrical values. It also defines
the
parameterised HLG Opto-Optical Transfer Function (OOTF) that must be
used in a display to tone map the decoded (with the inverse OETF) video
signal for the monitor at hand. The OOTF takes care of a suitable
mapping of the content to monitors with different dynamic ranges.
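
For comparison, a sketch of the HLG OETF as defined in ITU-R BT.2100:
a square-root segment for dark scene light and a logarithmic segment
above it.

```python
import math

# The HLG OETF: relative scene light E in [0.0, 1.0] encodes to an
# electrical value E' in [0.0, 1.0].
a = 0.17883277
b = 1 - 4 * a
c = 0.5 - a * math.log(4 * a)

def hlg_oetf(e: float) -> float:
    if e <= 1 / 12:
        return math.sqrt(3 * e)
    return a * math.log(12 * e - b) + c

print(hlg_oetf(1 / 12))  # 0.5: where the two segments meet
print(hlg_oetf(1.0))     # ~1.0
```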


## Viewing Environment

Adapting content to the monitor at hand is not enough. The viewing
environment also affects the perception of color on screen. Color
appearance modeling studies the effect of environment and everything
else that changes the human perception of colors without actually
changing the colors themselves.

One of the major physical effects is screen flare, ambient light
reflected from the screen surface. While the flare is usually fairly
uniform, colorless (white), and dim (otherwise it bothers the viewer
anyway), it can still have a big impact on the perception of dark
colors. This is because the flare can easily dominate the dark colors
emitted by a display and the human visual system adapts to the overall
brightness of what it sees, losing visual details as the dark colors
are no longer distinguishable under the elevated overall brightness.

The overall illumination of the viewing environment affects the
adaptation state of the human visual system. It affects both the
dynamic range (brightness) adaptation and the white point adaptation.
White point adaptation can be observed from different perceived colors
of white. A (display) color that looks white in one environment can
look bluish or reddish in another environment. Usually this is
compensated with the monitor color temperature settings.

If the environment is very dark and the screen covers much of the field
of vision, then the human visual system adapts mostly to the screen
content as there is little other color reference in sight.

Fortunately, viewing environments tend to be arranged to be somewhat
standardish for most use cases. An office environment tends to be well
lit indoors, and a living room environment tends to be dim indoors,
with the (home) theatre environment as the culmination: very dark
surroundings with no stray light. Professional color work uses strictly
controlled viewing environments. This avoids environment related
problems.


## Conclusion

We started with the assumption that the pixel was rendered with the
intention to be displayed. This hints that not all pixels are intended
to be displayed. In fact, if you took a raw image from a camera
(think of professional cameras with RAW file format), it would look
quite bad on a display even if you took care of everything mentioned in
this article. Such images need to be carefully processed before they
are ready for display. Consumer cameras do that processing in the
camera using the manufacturer's magic algorithms, so it is possible you
have never had such an image.

All the above may seem a lot to digest, and we did not even go into
details. The point of this article is to give you an idea of the
concepts related to a pixel's color, hopefully letting you follow other
discussions around color more easily, like why one cannot "just blend"
two pixels together or how "RGB" alone does not mean much.