EOTF

made the issue visible to everyone

Thanks for the comment. This isn't security relevant so I turned off the confidentiality.

Can you point out where exactly we use BT.709 as an EOTF and where the sRGB TF characteristics have a wrong definition?

Banner of EOTFs: https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/well_known.rst

Reference to the two part sRGB OETF as an EOTF: https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/pixels_color.md

Etc.

I marked this as private to keep my commentary out of the public sphere such that I can post screenshots to the specifications in question without treading into legal quagmires.

I still don't see where we use BT.709 in the well_known.rst document. It only lists BT.2087 which does define EOTFs for use with BT.709.

I see the sRGB part now, thanks. It is not entirely convincing though. This claims that the sRGB standard only defines an EOTF and that it is not a pure 2.2 power function.

In general we prefer open discussions but if you want I can turn confidentiality back on.

That link claims Jack Holm as authority. He is not listed as one of the four original authors in the original document, nor one of the five named acknowledged individuals who provided feedback and other aspects. Further, read the specification for the input / output characteristic. It’s very clear.

As further evidence, it is worth reading the entire document, closely. There are countless references to a pure 2.2 EOTF, introducing a discrepancy that is by design.

Further, here is Daniele Siragusano, of Filmlight, discussing the nuance of the discrepancy and why it was introduced, in detail.

As for the well_known.rst, BT.709 does not define an EOTF. It is identical in spirit to the sRGB OETF, and defines a specific OETF encoding. The canonized EOTF was introduced via BT.1886.

Hope there’s something useful here.

As for the well_known.rst, BT.709 does not define an EOTF. It is identical in spirit to the sRGB OETF, and defines a specific OETF encoding. The canonized EOTF was introduced via BT.1886.

I get that. I just don't see where we reference BT.709 in that document. Neither as EOTF nor as OETF. Maybe I'm blind but I just don't see it. Can you link to the exact line?

As further evidence, it is worth reading the entire document, closely. There are countless references to a pure 2.2 EOTF, introducing a discrepancy that is by design.

Unfortunately I don't have access to the sRGB spec which makes this a bit hard. I'll try to look into it, so thanks for bringing this up.

https://github.com/colour-science/colour/blob/0cd6265712920557b5840eac0a72310f56ff10ec/colour/models/rgb/transfer_functions/srgb.py#L99

https://github.com/LeaVerou/color.js/blob/fdff000bbe6258a1517046b9f67ead4a7df600e2/src/spaces/srgb.js#L26

https://en.wikipedia.org/wiki/SRGB

They all agree that the EOTF is not a pure Gamma 2.2 function. Are they all wrong?

Here’s the specification sections that describe the reference display. Note how the following section describes encoding, and the intentional mismatch.

The spec and the talk by Daniele Siragusano are pretty convincing. At this point I'm mostly confused why everyone else also seems to get it wrong.

I too was originally of the belief, largely due to appeals of authority, that the specification mysteriously described an EOTF. Until I crawled the actual document. It seems very clear.

The reason why so many folks get it wrong is, I believe, they fail to historicize and contextualize.

If we think about the history and what sort of mess the surface was, we can get an idea as to what the sRGB specification was attempting to solve. Reading the Annexes in the document can also help here. It is rather clear that the goal was to create a parallel BT.709 based approach, complete with encoding and decoding, where the decoding had the “oldschool” appearance matching baked into the discrepancy of the end-to-end encoding to decoding differences.

This “oldschool” approach was actually an appearance-like management; viewing under one set of contexts would create a similar appearance as under another, which can only be achieved by deviations in the tristimulus values from context to context.

Even more interesting is an actual managed pipeline. We know that most conventional sRGB displays, when measured, are frequently pure 2.2 power functions. Not all; some vendors have chosen to use hardware to emulate the sRGB OETF as an EOTF.

If in the common case the display is actually a pure 2.2 power function, but the operating system identifies it (incorrectly) as an sRGB two part transfer function, the following applies:

Image encoded and tagged with the sRGB two part transfer function.
The ICC management chain seeks the display characterization, which by default is identified as the sRGB two part transfer function.
The actual management library “undoes” the encoding transfer function (sRGB two part) to uniform tristimulus, and then re-encodes to the target output (sRGB two part). We end up with the two part encoding being routed to the display.
The display, which exhibits a common 2.2 power function as an EOTF, decodes the two part using the 2.2 pure power EOTF!

Sound familiar?

It should, because that happens to be exactly, through a bit of an abuse of default settings, the definition of what the reference dictates as per the “oldschool” appearance matching context encoding-through-decoding chain.

Daniele is a hell of a wise soul, with vast experience of many, many, many display types, from the highest quality projectors to the most common commodity displays. While it is foolish to lean on appeals to authority, his vantage is well formed and outlines the precise issues at hand, expressed very clearly in the video.

In an ideal “forward looking” management system, it is a no contest that the EOTF should be sent an encoded signal that has appearance based qualia built into the values for consumption. We sadly are not there yet, but appreciating the nuance and complexity of the appearance aspect in deviations in encoding can help to at least contextualize why the surface can be extremely complex.

It is sadly not entirely about a no-operation round trip from code value to uniform tristimulus optical output. In fact, knowing that a no-operation would fail on the appearance management front, is an extremely useful part of the understanding.

There are four original authors, and five acknowledged names in the origin document. Reach out to them! (I have reached out to four of those nine names listed, as a side note. Take everything said here with a massive grain of salt, and any reader should be highly encouraged to put the effort into understanding the differing vantages and reasonings. It helps having more folks actually processing and becoming familiar with the surface!)

Created !14 (merged) for the sRGB issue.

Hello Troy, thank you for raising this issue.

I had (after much confusion) recognized with other specifications that they define only the encoding and punt to another specification for the decoding to display, e.g. BT.601 vs. BT.1886, and that the transmission intentionally changes the colorimetry, but I hadn't realized the same fully applies to sRGB as well.

I watched the video by Siragusano, very interesting and I hope I drew the right conclusions from it. My main take-away is that sRGB encoding and sRGB Display are different things. The intention is to implicitly match the appearance when the original image was meant for a dim viewing environment and then it will be watched in a brighter viewing environment.

I have been wondering if content (compositor input) should be decoded with the encoding function to get back to the original colorimetry or with the intended decoding (display) TF to apply the intended implicit tone mapping from dim to a brighter viewing environment. Since several encoding specifications say that this is the encoding function but it will be adjusted such that the display EOTF (e.g. BT.1886) produces the desired appearance, I suppose that display EOTF then is what we should use to decode content tagged with that encoding specification. Then we need to add tone mapping to move from the encoding specification's reference display to the actual display and viewing environment. Finally, encode the video signal with the actual (or specified) display EOTF.

As an example, an image tagged with sRGB encoding TF shall be decoded with sRGB Display TF, but also an image tagged with sRGB Display TF shall be decoded with sRGB Display TF.

Does that make sense?

I have been wondering if content (compositor input) should be decoded with the encoding function to get back to the original colorimetry or with the intended decoding (display) TF to apply the intended implicit tone mapping from dim to a brighter viewing environment.

A lot to unpack.

First, it is worth thinking relative to a larger lens; implicit vs explicit management. The older system was implicit, while more contemporary approaches are potentially explicit, and achieve management via a library.

But even if one were to try and transition, it can be extremely challenging. For example, this is the issue facing MacOS with BT.709 content. Given the content is tagged as BT.709, and an assumption of the two part OETF is implied, the “original colourimetry” is not the intended output, but rather the BT.709 OETF was designed to be coupled with the BT.1886 2.4 EOTF. Most systems do not provide for this sort of “appearance” discrepancy. This is a common issue on MacOS, where the OS is attempting to Do The Right Thing, but yields The Wrong Thing.

Second, nothing actually “maps tones”, and it is worth dismissing. It’s a bad idea that leads to plenty of other problems. It is likely another discussion. A “tone map” is a specific form of a transfer function that should likely be considered in a different class altogether of OETF vs EOTF functions, or their inverses. There are OOTFs and EETFs, so it is worth keeping clarity here.

It’s a challenging surface. If the management system allows for an Appearance Emulation TF, perhaps the “correct” signal can be derived, and further appearance adjustments that deviate from the implicit viewing conditions etc. could be integrated on top of?

A lot to unpack.

I was expecting that. :-)

It seems you mostly corrected my terminology, and that's good. I'm not always using the right terms like "tone map" which I have seen mentioned mostly in connection with HDR, when you map between SDR and HDR or between two different HDR displays. What I actually meant was some sort of TF indeed.

If I understood you right, we seem to agree pretty much. The original colorimetry (in implicitly managed systems) is not the intended output. The intended output comes only with the intended decoding TF, and then it is "conditioned" to its particular reference display and environment. Then we need an Appearance Emulation TF (which I guess would be an OOTF (not end-to-end but just some optical-to-optical TF)) to be able to display that content on a display that differs from the particular reference display (and environment).

We would try to infer the Appearance Emulation TF from two sets of information:

What specs the content is tagged with, what they say about the reference display and environment
What we know about our actual display, and what knobs we might want to expose to end users (brightness? contrast? BT.1886 variables? HDR something?)

Somewhere in there we should also handle the SDR vs. HDR vs. HDR differences as well. Should that be part of Appearance Emulation or have it's own name?

Or did you mean that e.g. BT.709 would be decoded to the original colorimetry and then we explicitly add Appearance Emulation TF that achieves the same as what decoding with BT.1886 would have done in the first place? Except that we might modify that AETF to target other displays and viewing environments.

The main difference seems to be terminology, is it not?

It seems you mostly corrected my terminology, and that's good.

“Tone map” typically will refer to something that forms the image, while these things are more directly related to the signal. I generally defer to Dr. Poynton’s definition, given he has written a PhD thesis on this very subject, of Axiom Zero, as it cuts through the layers of nonsense and gives us something to work with immediately.

Faithful (authentic) presentation is achieved in video / HD / UHD / D-Cinema when imagery is presented to the consumer in a manner that closely approximates its appearance on the display upon which final creative decisions were approved.

Framing the problem under this sort of lens can help to make reasoned decisions that work toward Axiom Zero.

The original colorimetry (in implicitly managed systems) is not the intended output.

I agree with this. The original colourimetry is not solely in the encoded signal colourimetry, but a combined total through-output of the OETF and the EOTF. There are additional considerations such as where an audience is not present in the specified surround, and how to closely approximate an appearance match, but the question as to “What is the signal to be matched?” would be a ground zero consideration.

Then we need an Appearance Emulation TF (which I guess would be an OOTF (not end-to-end but just some optical-to-optical TF)) to be able to display that content on a display that differs from the particular reference display (and environment).

Indeed. The question as to how to properly achieve this remains open. Per channel adjustments to a signal will yield distortions of chromaticity angle, luminance, and chromaticity purity, which were somewhat abused in the implicit system to account for Hunt effect etc.

The per channel approach however, introduces distortions that are likely unfit for larger domain swings. Appearance matching and models is a subject of research given the more contemporary display mediums available.

What we know about our actual display, and what knobs we might want to expose to end users (brightness? contrast? BT.1886 variables? HDR something?)

The main issue here is that if the solution were only targeting outputting authored imagery, it would be complex enough. In the case of compositing multiple things, including this imagery, it requires stepping back to figure out a more ideal virtual encoding, and then compositing and getting the signal output.

It is reasonably safe to assume in this case, that something beyond the generalized ICC explicit approach is required to meet Axiom Zero. That is, the ICC general protocol will attempt to:

Undo any encoded signal characteristics.
Derive colourimetric transformations from the source origin encoding to the destination output encoding.
Encode the signal via an inverse EOTF as indicated by the destination output context.

We can clearly see that this is the classic conundrum that explicitly managed systems will face. If there were the means to communicate the intended appearance transformation in between (1) and (2), we would have the uniform tristimulus signal required for a fighting shot of managing toward the output.

The two most problematic encodings being BT.709 and to a lesser extent sRGB. It would also seem important to be able to expose this Appearance Rendering stage to the audience, as there will be plenty of times where an audience may require granular control over the behaviour, likely on an application to application basis.

Somewhere in there we should also handle the SDR vs. HDR vs. HDR differences as well. Should that be part of Appearance Emulation or have it's own name?

This is a very tricky subject. Gilchrist’s research and anchoring theory suggests that HDR is ultimately a fool’s errand. We could suggest that we are starting to see the implications of this via what I would call Diffuse White Creep.

Gilchrist’s Anchoring Theory loosely indicates that our perceptual system will always adapt immediately to the highest luminance present in our visual field, and segment accordingly. This has been researched rather extensively, and holds true to a certain undefined area threshold in terms of visual segmentation.

So what are the implications here?

The origin position in the HDR signal for a “diffuse white” reflectance surface was originally 100 nits achromatic. A more recent “standard” of 203 nits has evolved. Some vendors involved with operating system design have let that creep higher, to somewhere in the 350 nit range.

This general trend at least hints that Gilchrist’s Anchoring Theory research is a reasonable ground truth, and that as brighter emissions are presented to the audience, the visual segmentation is “pushed down”, hence the creeping diffuse white upwards.

This makes appearance matching a very tricky subject, of which the jury is still out as the newer display mediums arrive in the hands of audiences for evaluation.

In the short term, some basic assumptions for BT.709 and sRGB content, coupled with a system wide appearance viewing context setting, would at least permit a reasonable output for the audience. EG: If the system is set to “Bright” surround, the internal system could calculate the output for BT.709 in a “Dark” viewing context, and adapt the signal to the “Bright” setting.

Or did you mean that e.g. BT.709 would be decoded to the original colorimetry and then we explicitly add Appearance Emulation TF that achieves the same as what decoding with BT.1886 would have done in the first place? Except that we might modify that AETF to target other displays and viewing environments.

Exactly this. As outlined in the general ICC protocol, we can see where the implicit management is unachievable without a specific “Intended Appearance” transfer function included. Unsurprisingly, this is the exact problem that manifests in MacOS, which is otherwise several time zones ahead of the rest of the pack in terms of management systems. We can also see Apple gradually working toward this idea of Appearance Emulation via True Tone and the rest of the rather sophisticated automatic scaling happening.

The main difference seems to be terminology, is it not?

Not quite! Because again, the general idea that the ground truth of the colourimetry is in the origin encoding is an a priori problem vector between implicit and explicit management approaches. Any solution needs to provide a reasonable bridge to take legacy implicit assumptions, which are extremely prevalent, and not only migrate them to an explicit system, but also provide mechanisms for the explicit system to properly manage the chain further.

Well said, I feel like I understood most of what you wrote and it makes sense.

The main issue here is that if the solution were only targeting outputting authored imagery, it would be complex enough. In the case of compositing multiple things, including this imagery, it requires stepping back to figure out a more ideal virtual encoding, and then compositing and getting the signal output.

Do you mean that when we composite things, window A near window B will change the appearance of window B (and vice versa) because both windows will be in our field of view at the same time? And that ideally we should account for that?

That reminds me of the Apple demo where a floating HDR window was on a desktop; the desktop was gradually dimmed to make the HDR window pop out and give it the HDR feeling. That trick would lose its effectiveness if everything was equally HDR. Which I think connects to what you said about Diffuse White Creep.

Even if Diffuse White Creep takes the impression off of highlights, I suppose we would still get the benefit of the added color resolution in HDR systems, and the extra available light power to look better in bright environments.

Exactly this. As outlined in the general ICC protocol, we can see where the implicit management is unachievable without a specific “Intended Appearance” transfer function included.

In other words, if an ICC profile describes how the content was encoded, we need another profile (at least a TF) to explain how it should be decoded instead of simply undoing the encoding? But ICC profiles can record AToB and BToA transformations separately, wouldn't that help?

My thought is that for cases like BT.709 we would need the content tagged explicitly as BT.709 and then we know how to handle it. If content is tagged with only an ICC profile (without an embedded tag that says "BT.709"), then we do what the ICC workflow usually does.

What would be the difference between:

decoding BT.709 with BT.1886 and then adding another AETF to take from the reference display and environment to the actual display and environment, vs.
decoding BT.709 with the BT.709 encoding TF, and calculating an AETF from ... [what?] to the actual display and environment?

I mean, other than how exactly one formulates the AETF in each case. Would both approaches not produce the same appearance assuming they are not disagreeing on the principles and definitions? Or would the disagreement be the point?

Do you mean that when we composite things, window A near window B will change the appearance of window B (and vice versa) because both windows will be in our field of view at the same time? And that ideally we should account for that?

I'm not sure it can be accounted for, it is simply that the surface of the potential issues is still somewhat unknown. For example, the assumption that we can effectively, contrary to what Gilchrist et als research suggests, have a "diffuse white" and that values can happily go past it, is problematic when considering the mise-en-scene of a desktop when an EDR window is presented adjacent to a generic non-EDR output window. According to conventional thinking, this is fine. According to Gilchrist et al, the EDR window will result in the rest of the desktop values being anchored to the maximal luminance.

In other words, if an ICC profile describes how the content was encoded, we need another profile (at least a TF) to explain how it should be decoded instead of simply undoing the encoding? But ICC profiles can record AToB and BToA transformations separately, wouldn't that help?

I don't think the idea of an Appearance Intent adjustment needs to be too complex. It's simply that unless we properly historicize the content and the implicit chain, it will be challenging to see how the explicit chain doesn't have a means to account for the historical approach.

While it is easy to say "Just use the explicit approach and carry on", we can already see that in the case of BT.709 authored content, this is likely not changing any time in the future and we'd be faced with a disconnect. Again, the picture encoding in terms of colourimetry doesn't exist in the encoding itself, so there's a fundamental issue with how the work is presented.

My thought is that for cases like BT.709 we would need the content tagged explicitly as BT.709 and then we know how to handle it. If content is tagged with only an ICC profile (without an embedded tag that says "BT.709"), then we do what the ICC workflow usually does.

Codecs won't be tagged with ICC information. The typical approach is to use nclc tags to provide the required context. The nclc tags though, follow the general idea that the encoding of the intended image / picture is held in the colourimetry of the parcel.

What would be the difference between:

decoding BT.709 with BT.1886 and then adding another AETF to take from the reference display and environment to the actual display and environment, vs.

This approach effectively forms the intended picture colourimetry into a virtual encoding. At this point, the encoding represents the picture as intended, for viewing in a "dark" surround.

decoding BT.709 with the BT.709 encoding TF, and calculating an AETF from ... [what?] to the actual display and environment?

This is part of the issue that currently faces MacOS, and exactly why the "bug" happens. The issue here is that by decoding the BT.709 OETF, we end up with a uniform tristimulus signal that does not represent the intended picture, because the implicit discrepancy between the BT.709 OETF through BT.1886 (or other) EOTF is removed, so the proper colourimetry is not yet formed.

So when the library undoes the standard BT.709 encoding, and negotiates it for re-encoding for the output, the issue is that the picture "looks flat", among other qualia differences.

The only way I can see to move out of this is to derive the assumed viewing context and picture colourimetry via an intermediate virtual encoding, via something like the Appearance Intention TF, and then work backwards to other viewing contexts. The other option is to shim in something akin to an AITF and take the picture to a "neutral" encoding, with the subtle additional "dark" compensation removed, and then apply the viewing surround elsewhere.

Would both approaches not produce the same appearance assuming they are not disagreeing on the principles and definitions?

I don't see how the second option can derive the intended picture, given the intended picture and viewing context is coupled and formed only after the BT.1886 output? Where is the picture deduced in the second case?

This discussion came up the other day, and I did up this simple diagram to outline the underlying issue. YMMV.

Bottom line is that the future, while still hazy, would benefit from providing the infrastructure for viewing context into the display chain. Even if unused currently, it would be a forward looking feature that can be leveraged to more cleanly solve this current issue, as well as the issues that are coming down the tunnel. Separating the viewing context from the picture encoding is here to stay.

According to Gilchrist et al, the EDR window will result in the rest of the desktop values being anchored to the maximal luminance.

Only read the first 7 or 8 pages of the article, but it didn't seem to be that clear. So yeah, for now we can forget about managing appearance effects between windows, and that's good news to me. It didn't sound like an EDR window on an otherwise SDR desktop would be a completely lost case. It depends on the contents of everything.

That article does seem to connect somehow to the question of how to map the dynamic ranges of different kinds of content into a common space for composition.

Another reason why we should not account for appearance effects caused by other windows is that Wayland applications have the option to tag their content as being rendered specifically for the output (and environment) at hand, which means the compositor should practically pass it through unchanged if it's opaque and unoccluded.

Codecs won't be tagged with ICC information. The typical approach is to use nclc tags to provide the required context. The nclc tags though, follow the general idea that the encoding of the intended image / picture is held in the colourimetry of the parcel.

I have a feeling that there is a detail missing. I meant tagging in the context for Wayland protocol, where a video player application could tag a window with e.g. "BT.709", and "we" handling it would be the compositor. How exactly that tagging works is still being drafted, but you can use an ICC file or alternatively some enumerated and/or parametrized erm... tags.

A video player would translate from codec tags(?) to Wayland parlance.

So when the library undoes the standard BT.709 encoding, and negotiates it for re-encoding for the output, the issue is that the picture "looks flat", among other qualia differences.

Sorry, what library?

I don't see how the second option can derive the intended picture, given the intended picture and viewing context is coupled and formed only after the BT.1886 output? Where is the picture deduced in the second case?

I tried to ask what is the difference between

decoding BT.709 content with BT.1886 EOTF and apply AETF, and
decoding BT.709 content with BT.709 OETF⁻¹ and applying (BT.709 OETF ∘ BT.1886 EOTF) and AETF

I think it was just a misunderstanding and there is no difference. I just got the impression from you earlier comment that somehow one should use BT.709 OETF⁻¹ somewhere.

Or maybe your point is that while they are equivalent, the former is implicit while the latter is explicit about what happens to the signal?

How would you name the element performing (BT.709 OETF ∘ BT.1886 EOTF) in this case?

Bottom line is that the future, while still hazy, would benefit from providing the infrastructure for viewing context into the display chain. Even if unused currently, it would be a forward looking feature that can be leveraged to more cleanly solve this current issue, as well as the issues that are coming down the tunnel. Separating the viewing context from the picture encoding is here to stay.

Right! @swick has actually already commented towards such plans in the Wayland protocol extension design.

What I have no clue about is, what would be a useful description of a viewing environment in a protocol.

I took a glance at Dr. Poynton's thesis' index and abstract, and that seems good reading to have indeed.

So yeah, for now we can forget about managing appearance effects between windows, and that's good news to me.

Just to be clear, I'm not advocating either way. I included the references to show that this surface in terms of a design problem is completely unsolved, and that there's room here for innovation as long as we accept that a good deal of complexity is sitting in front of us.

Consider the citation as more as a cautionary tale against an overly simple solutionism.

That article does seem to connect somehow to the question of how to map the dynamic ranges of different kinds of content into a common space for composition.

I'd say this is a given; there needs to be a "virtual" encoding to derive the colourimetry as composited. This is, as best as I can tell, what MacOS appears to be doing. Whether that is an HLG-based or PQ based virtual encoding remains an open question. A case can be made for either.

Another reason why we should not account for appearance effects caused by other windows is that Wayland applications have the option to tag their content as being rendered specifically for the output (and environment) at hand, which means the compositor should practically pass it through unchanged if it's opaque and unoccluded.

I'd think that any proper management system needs to account for both cases? It would seem logical that some applications may request and need an iron fisted control over the entire transformation output chain. Likewise, we can see a need for applications without any understanding / awareness to be handled appropriately.

I'm not sure it's either or, but rather both would need facilitating.

I meant tagging in the context for Wayland protocol, where a video player application could tag a window with e.g. "BT.709", and "we" handling it would be the compositor. How exactly that tagging works is still being drafted, but you can use an ICC file or alternatively some enumerated and/or parametrized erm... tags.

The origin post was in fact somewhat directed at this. There's no means in an explicit system to derive appropriate colourimetry, so an awareness of the implicit versus explicit differences is required to achieve correct handling. EG: The "shim" is required in the case of a majority of BT.709 content in order to derive appropriate colourimetry. Arguably this is the case as well with sRGB, but the discrepancy is lower.

Sorry, what library?

A library that follows the general explicit management chain outlined in the ICC protocol. By "general" I mean "in the spirit of", given that the ICC protocol sort of went off the rails with V4.

I just got the impression from you earlier comment that somehow one should use BT.709 OETF⁻¹ somewhere.

In an explicitly managed system, this is what happens by definition. It's clearly not "correct", but without a gentle massaging of the protocol, it's unavoidable. Where "massaging" could be adding something like the AITF etc. Technically, this would likely be an EETF shimmed in just after the OETF for generic BT.709 content, where the implicit assumption is a pure 2.4 EOTF and a "dark" surround.

We can see that in the contemporary sense, it would be very likely that a management system will want to handle the surround adaptation component, so perhaps it is worth addressing as a longer term goal.

How would you name the element performing (BT.709 OETF ∘ BT.1886 EOTF) in this case?

If we follow the chain, I believe the type of transfer function that would be required becomes clearer.

Encoding is BT.709 OETF. This suggests that it has travelled from Optical to Electronic down-the-wire encoding.
Decoding is BT.1886 EOTF. This suggests that the encoding is decoded to Optical.

In our case, we need a "shim" in the system.

Encoding is BT.709 OETF. This means it's currently in an Electrical Transfer Function encoded state.
Shim is here, to create the picture rendered colourimetry. Given the prior signal is Electrical, this is an Electrical to Electrical Transfer Function, or EETF.
Decoding is sRGB or BT.1886 or whatever output. This would be an EOTF, and the signal ends up Optical.

I think that would qualify the shim as an EETF.

What I have no clue about is, what would be a useful description of a viewing environment in a protocol.

You aren't alone. Most viewing conditions are rather vaguely described as "dim" versus "dark" or even "bright" or "average". The historical approach is to apply a simple power function to the signal values, and wave the hands a bit.

There is a decent paper that has done some research on this sort of approach, and might be worth including in the description and measurements component of any attempt to design for this. Determination of the Perceived Contrast Compensation Ratio for a Wide Range of Surround Luminance by Ye Seul Baek, Hong-suk Kim, and Seung-ok Park, 2014.

As a shorter term solution, where the non-pivoted power function approach is used, one could do worse than the values presented in that paper.

As an aside, this discussion came up twice over the past week, and I did up this diagram to showcase how the implicit versus explicit management chain diverges. I suspect that to negotiate the problem, an EETF-like shim is required, as discussed above. Regardless, here's the diagram that may prove helpful for illustration purposes.

I followed this discussion and at this point there are a few comments I want to make.

Any OETF+EOTF combination where the EOTF(OETF(x)) != identity exists because the viewing context of the content creation is different from the viewing context of the consumption. Decoding such an image with the EOTF or the OETF^-1 both produces valid colorimetry, just for different viewing contexts. That specifically means decoding with OETF^-1 and applying your own Appearance Emulation from the viewing context of the content creation to the viewing context of the consumption as specified by the EOTF is the same. I believe this is what @pq also wanted to say.

You seem to argue that applying the OETF^-1 produces invalid colorimetry, I believe that it only changes what viewing context the colorimetry applies to. (With the notable exception of HLG.)

The next point is this one:

The only way I can see to move out of this is to derive the assumed viewing context and picture colourimetry via an intermediate virtual encoding, via something like the Appearance Intention TF, and then work backwards to other viewing contexts. The other option is to shim in something akin to an AITF and take the picture to a "neutral" encoding, with the subtle additional "dark" compensation removed, and then apply the viewing surround elsewhere.

I don't think we ever need or want a "neutral" or "virtual" encoding. Colorimetry only creates a perception in a viewing context and you can't undo or remove a viewing context from colorimetry. In other words, the viewing context is relative and you can choose any viewing context to be your reference.

What is important is that we actually do have a complete description of the viewing context for all the colorimetry we deal with. Unfortunately a lot of standards don't really define the viewing context all that well and we'll have to come up with values there that seem to make sense.

The ICC system is completely misunderstood here. It doesn't require a specific transforms but rather gives the CMM tools to create whatever system makes sense. If we want to build a system which tries to create the same perception in all viewing contexts then we can do so. The ICC profile then serves only to do colorimetric transforms and the CMM applies an appearance transform based on the viewing contexts which might have been delivered in the ICC profile or out of band.

If the ICC profile has a perceptual transform we can use that instead of our own appearance transform to transform the colorimetry to the a specific viewing context, in ICC terms the perceptual profile connection space. Basically, the perceptual transform means we get colorimetry in a specific viewing context whereas the colorimetric transform requires you get the viewing context from somewhere else.

The origin position in the HDR signal for a “diffuse white” reflectance surface was originally 100 nits achromatic. A more recent “standard” of 203 nits has evolved. Some vendors involved with operating system design have let that creep higher, to somewhere in the 350 nit range.

This general trend at least hints that Gilchrist’s Anchoring Theory research is a reasonable ground truth, and that as brighter emissions are presented to the audience, the visual segmentation is “pushed down”, hence the creeping diffuse white upwards.

I'm pretty sure that the brightest part of a scene is not what HVS adapt. If there is a small part of the scene which is much brighter than the rest of it I can still see the rest very well. The same goes the the diffuse white concept for displays. It should work as long as content doesn't have a large part brighter than diffuse white.

The higher brightness for diffuse white can be easily explained by the bright viewing environments people use displays in. Non-HDR displays get brighter as well and that means people can use them in brighter viewing environments. If an HDR display is then used in the same bright viewing environment you have to match the diffuse white to the bright non-HDR display.

This is a problem of unspecified or at least under-specified viewing contexts.

Decoding such an image with the EOTF or the OETF^-1 both produces valid colorimetry, just for different viewing contexts.

If this were correct, then you'd need to identify what the "default" viewing context is for BT.709's OETF. There simply isn't one. It was never designed this way.

Further, there's a discrepancy in the OETF lower code values when considered against a uniform power function EOTF that was designed to handle noise / veiling flare. That too is unaccounted for via a simple inverse of an OETF.

You seem to argue that applying the OETF^-1 produces invalid colorimetry, I believe that it only changes what viewing context the colorimetry applies to. (With the notable exception of HLG.)

That's exactly what I am saying. The "picture rendering" is not present. This follows what Dr. Poynton and others have elucidated.

I don't think we ever need or want a "neutral" or "virtual" encoding.

You'll need a virtual encoding for compositing on a desktop, otherwise you will not have a uniform compositing working space to work with. Note that "neutral" here is something different, and subject to another discussion.

The ICC system is completely misunderstood here. It doesn't require a specific transforms but rather gives the CMM tools to create whatever system makes sense.

It's not misunderstood at all. Implicit vs Explicit management is very real, and for those who aren't familiar, the discrepancy between BT.709's OETF and MacOS's explicit handling is well known and documented for a long while.

If the ICC profile has a perceptual transform we can use that instead of our own appearance transform to transform the colorimetry to the a specific viewing context, in ICC terms the perceptual profile connection space.

You are conflating rendering intents, which are optional, with picture colourimetry.

Also, I would be shocked if there were an open source ICC library that can apply GPU shaders at the speeds required here, even if a rendering intent were included, which they often are not. Generally speaking, rendering intents do not account for surround viewing conditions, but rather for colourimetric transforms. I don't think any such rendering intent for surround viewing contexts has ever been defined in any ICC specification. Happy to be proven wrong here.

I'm pretty sure that the brightest part of a scene is not what HVS adapt.

I'd suggest that it would be prudent to read the research. The interactions of surface brightness and lightness are complex. Anchoring clearly plays some role, as does adjacent surface definitions. Needless to say, the research is still ongoing here, but the idea that our vision would remain fixed with higher luminance elements in the field of view is not easy to verify the veracity of.

Anyways, good luck folks.

Any OETF+EOTF combination where the EOTF(OETF(x)) != identity exists because the viewing context of the content creation is different from the viewing context of the consumption. Decoding such an image with the EOTF or the OETF^-1 both produces valid colorimetry, just for different viewing contexts. That specifically means decoding with OETF^-1 and applying your own Appearance Emulation from the viewing context of the content creation to the viewing context of the consumption as specified by the EOTF is the same. I believe this is what @pq also wanted to say.

I'm not so sure. I was looking at it as a purely mathematical equation, where you merely group the terms differently, but always reach the exact same result.

As for decoding with OETF⁻¹, I have acknowledged that specifications like BT.709, while they do define the encoding OETF, also say that what production actually does shall be modified such that the intended appearance is achieved by decoding with BT.1886. I did not assume those modifications are viewing environment related but simply artistic while the artist is looking at a BT.1886 monitor.

I think Troy seems to agree with this.

As a detail, since all the TF discussed here are applied to RGB or R'G'B' channels independently, any deviation from the intended decoding TF will necessarily introduce hue shifts to my understanding, even though they might be mostly insignificant. What this means is that the colorimetry of the BT.709 signal may be intentionally hue-shifted so that it produces the desired hue when decoded with BT.1886. I believe this to be simply a mathematical fact.

You'll need a virtual encoding for compositing on a desktop, otherwise you will not have a uniform compositing working space to work with. Note that "neutral" here is something different, and subject to another discussion.

Right, we do need a common compositing space. I think the argument here is about what exactly that is, or how to name it. Our current idea is to use the monitor's "device color space" but linearised so that blending RGB values is meaningful. In other words, not to use some artificial or standard space as the blending space. With this choice, the final encoding for the monitor is just the inverse of the monitor (claimed) EOTF which is much easier to off-load to display hardware than more complicated transformations.

I would be shocked if there were an open source ICC library that can apply GPU shaders at the speeds required here

I guess that's the essence of what we are aiming for here. Use some CMM library like LittleCMS to assemble the (arbitrary, not limited by ICC) transformations; in the compositor, optimize and convert the resulting pipelines for a GPU, and then apply them during composition each frame. Plus, when possible, avoid the GPU and use the display hardware pipeline to realize the same.

The work on-going in Weston right now by @vitalyp is exactly about taking an arbitrary pipeline assembled with LittleCMS, optimizing that, and realizing it through OpenGL.

Thanks, Troy! I'm looking forward to occasionally discussing things with you again. You already gave a lot to digest.

That's exactly what I am saying. The "picture rendering" is not present. This follows what Dr. Poynton and others have elucidated.

As for decoding with OETF⁻¹, I have acknowledged that specifications like BT.709, while they do define the encoding OETF, also say that what production actually does shall be modified such that the intended appearance is achieved by decoding with BT.1886. I did not assume those modifications are viewing environment related but simply artistic while the artist is looking at a BT.1886 monitor.

Yeah, skimmed the spec again and you're both right, that actually seems to be the case. So decoding with the right EOTF is necessary, further Appearance Transforms must work on the resulting colorimetry.

You'll need a virtual encoding for compositing on a desktop, otherwise you will not have a uniform compositing working space to work with. Note that "neutral" here is something different, and subject to another discussion.

It's really not clear what you mean with virtual and neutral and my point is that there is no special encoding and you can arbitrarily select one or multiple to work with.

It's not misunderstood at all. Implicit vs Explicit management is very real, and for those who aren't familiar, the discrepancy between BT.709's OETF and MacOS's explicit handling is well known and documented for a long while.

Obviously there is a difference in how one has to handle content tagged with an ICC profile and content tagged with an encoding standard but CMMs can take both into account if the content is tagged correctly. I don't really see what the issue here is. Except maybe content tagged with a wrong ICC profile (i.e. the sRGB two-piece function when it should be the sRGB pure power-law 2.2) but not sure what we can do about content with bad metadata.

If your whole point here is "decode with the correct EOTF and not with the OETF^-1" then sure, yeah, makes sense.

You are conflating rendering intents, which are optional, with picture colourimetry.

Also, I would be shocked if there were an open source ICC library that can apply GPU shaders at the speeds required here, even if a rendering intent were included, which they often are not. Generally speaking, rendering intents do not account for surround viewing conditions, but rather for colourimetric transforms. I don't think any such rendering intent for surround viewing contexts has ever been defined in any ICC specification. Happy to be proven wrong here.

The perceptual transform is specifically designed to adjust from one viewing context to a connection viewing context. What exactly the profile creator considers the viewing context in the profile creation is up to the profile creator but the point here is that it is not a colorimetric transform! It is supposed to adjust for the differences in media and viewing conditions and produce different colorimetry.

Also worth noting that the rendering intent is not the same as the transforms defined in the profile. CMMs can use the colorimetric transform even if a perceptual rendering intent is specified and for example add its own viewing context adjustment on top of the colorimetric transform to implement a perceptual transform.

I think the argument here is about what exactly that is, or how to name it. Our current idea is to use the monitor's "device color space" but linearised so that blending RGB values is meaningful.

This is the great dilemma.

What I see though is the need for cross-desktop compositing. We can envision a dual head mode where one display is some variant of BT.2100 with perhaps an ST.2084 EOTF, and another where it is a generic 2.2 EOTF. The question then becomes one of efficiency of encoding intermediate virtual display?

This is why I believe MacOS is encoding to something like a virtual HLG or ST.2084 based virtual picture range, and compositing into a (hopefully) linearized version of that.

Why might this be deemed prudent?

For starters, it represents a consistent virtual encoding that would hold up for at least a few years potentially. It also allows for integration with backlight, where adjusting the transfer function in combination with the backlight works to define a uniform representation at the output medium.

It also makes for a reasonably consistent technical blending / compositing approach.

This virtual display encoding then can be handed downstream to a GPU pipeline for redistribution and encoding to the various output mediums. It would also allow for the window manager side to permit an audience member to override some erroneous hardware reporting or what not.

is exactly about taking an arbitrary pipeline assembled with LittleCMS, optimizing that, and realizing it through OpenGL.

I'd have a bit of concern here as ICCv4 was a bit of a misstep from everything I have seen. The media white point issue comes to mind as a serious problem for display handling. Some aspects will be useful, but others likely will have issues with folks using any ICCv4 work. The ever wise Graeme Gill has deeper insights here, and should be considered authoritative on the subject.

I'd lean toward a custom library, with full GPU support, that borrows only what is absolutely required from LCMS. Sort of a Weston variation of ColorSync that can be modified and adapted as issues happen.

I don't really see what the issue here is. Except maybe content tagged with a wrong ICC profile (i.e. the sRGB two-piece function when it should be the sRGB pure power-law 2.2) but not sure what we can do about content with bad metadata.

You'll have to re-evaluate what I posted above. There's a discrepancy in the implicit chain that cannot be adequately expressed in the explicit chain. There's no way to properly form the picture from a BT.709 OETF encoded origin encoding, for example. Which is why the EETF is required.

The perceptual transform is specifically designed to adjust from one viewing context to a connection viewing context.

I do not believe this is correct.

Rendering intents, specifically the Perceptual and Saturation intents, are up to the implementation to define. They are defined for medium related transformations, with the viewing context fixed in the specification.

Worse, they are optional, which simply will not suffice for a management system that specifically seeks to negotiate the problems listed here.

In the end, the rendering intents in the ICC protocol are oriented around graphic design. They were overloaded to work decently with the ICCv2 system, but the ICCv4 changes seemed to move toward more graphic design specific chains. The mwpt issue in version 4 is very real, and problematic for display mediums.

It also allows for integration with backlight, where adjusting the transfer function in combination with the backlight works to define a uniform representation at the output medium.

How does a "virtual" encoding help here compared to the output encoding?

It also makes for a reasonably consistent technical blending / compositing approach.

Again, why would that be any different than the output encoding?

This virtual display encoding then can be handed downstream to a GPU pipeline for redistribution and encoding to the various output mediums.

Why would we want this extra step of encoding from the virtual encoding to the output encoding?

You'll have to re-evaluate what I posted above. There's a discrepancy in the implicit chain that cannot be adequately expressed in the explicit chain. There's no way to properly form the picture from a BT.709 OETF encoded origin encoding, for example. Which is why the EETF is required.

Nobody is arguing that you can represent the implicit chain in in an ICC profile (even though it might be possible with iccMAX, but really, not the point) but that a CMM can handle both the implicit chain and ICC profiles at the same time.

They are defined for medium related transformations, with the viewing context fixed in the specification.

Yeah, a profile's perceptual transformation is from the source viewing context to this fixed viewing context defined the the specification. Now the CMM takes the output profile's perceptual transform from this fixed viewing context and transforms it to it's own destination viewing context. A complete perceptual transform is transforming from the viewing context of the source profile to the viewing context of the destination profile.

And as alluded to before, the CMM can create a perceptual transform (either from the PCS to a profile or from profile to profile) on its own from the profile's colorimetric transform.

Yeah, a profile's perceptual transformation is from the source viewing context to this fixed viewing context defined the the specification. Now the CMM takes the output profile's perceptual transform from this fixed viewing context and transforms it to it's own destination viewing context. A complete perceptual transform is transforming from the viewing context of the source profile to the viewing context of the destination profile.

It’s really not designed for surround viewing contexts. This is a complete conflation with what it is designed to achieve, which is medium dependent gamut mapping.

Anyways… have fun.

I'm pretty sure that the brightest part of a scene is not what HVS adapt.

I'd suggest that it would be prudent to read the research. The interactions of surface brightness and lightness are complex. Anchoring clearly plays some role, as does adjacent surface definitions. Needless to say, the research is still ongoing here, but the idea that our vision would remain fixed with higher luminance elements in the field of view is not easy to verify the veracity of.

I did read Gilchrist 1999 just now, thanks for the recommendation.

I would say that I was right. The most luminous part isn't necessarily what the HVS anchors white to, it's rather a function of luminance and area and that's why the EDR/reference/diffuse white concept should work well. Luminance over the diffuse white should be for highlights so the most luminous area will be small and the HVS will anchors white mostly to the lower than diffuse white because it has the biggest area.

surround viewing contexts

What's that?

What I see though is the need for cross-desktop compositing. We can envision a dual head mode where one display is some variant of BT.2100 with perhaps an ST.2084 EOTF, and another where it is a generic 2.2 EOTF. The question then becomes one of efficiency of encoding intermediate virtual display?

Troy, are you perhaps assuming that there must be a single desktop image (buffer) covering all monitors?

That is simply not true with the implementations we are working on. Each monitor is composited separately, independently, and with their own color transformations. There is no common color space anywhere in the pixel path that would be shared by all monitors. Every window gets is own individually tailored color transformation for each monitor it is visible on.

If some window is tagged for a specific monitor directly (implying the application and the end user care about the exact color transformation so much that they want to do it themselves), but it is shown on a different monitor, then the compositing system will create a color transformation from the specified monitor to the other monitor. Of course, this conversion may not be of great quality, but the assumption is that the one monitor is the important one, and the other just has to be in the right ballpark to not be too disturbing. This is just because we do not have a Wayland interface to deliver a window's image per each monitor separately (not that adding one would be hard, it just hasn't been sufficiently justified yet).

I'd have a bit of concern here as ICCv4 was a bit of a misstep from everything I have seen.

The intention is not to do things like convert BT.709 spec into an ICCv4 file. We just use the (low-level) color processing pipeline facilities offered by LittleCMS to have a common representation for also custom and arbitrary color transformation elements. If there actually is a real ICC file attached to some content or a monitor, the CMM makes it easier to plug that in.

I would say that I was right.

Narrator Voice: He was not right.

Side note... I perused some of the other discussions and noticed folks that appeared to be getting lost in the weeds of "linear" versus "nonlinear" versus "curved" etc.

I would lay down a blanket carpet technique that might help elucidate things for folks who are entering into these discussions. When we discuss the metrics of colour, we are frequently discussing colourimetry with respect to one of the CIE colourimetric standards, the most common being the CIE 1931 2 Degree Standard Observer formulation.

When we speak of colourimetry, it can be helpful to identify it as tristimulus as per the CIE definitions.

In the interest of clarity, all tristimulus is always uniform with respect to tristimulus. There is no such thing as "nonlinear" or "pre curve" etc. Either values are tristimulus, or they are something else, such as an encoded electrical signal.

Keeping a clear handle on whether we are dealing with a tristimulus value or not is a first step to being able to untangle the many layers of other complexities, such as appearance related issues. It helps to prevent folks from wandering off the path into "nonlinear" or "pre curve" or other confusing nonsense.

Should we replace "optical values" and "linear values" with "tristimulus values" across the board?

And "encoded values" for what we now call "non-linear values"?

Does the definition of tristimulus mean that it is physically correct according to the coverage based alpha blending model to compute { alpha, 1 - alpha } weighted average of two tristimulus tuples?

Btw. "pre-curve" and "post-curve" are implementation details of a generic pipeline, and have no other semantic meaning than it's a curve (e.g. a 1D LUT per channel) that is present in a pipeline of a certain structure before or after something else.

These have nothing to do with e.g. EOTF. However, if you want to build a pipeline to realize some operation that has a semantic meaning, like "apply EOTF", then you could load the EOTF into either pre-curve or post-curve operational block, whichever suits the intention.

Does the definition of tristimulus mean that it is physically correct according to the coverage based alpha blending model to compute { alpha, 1 - alpha } weighted average of two tristimulus tuples?

Uniform tristimulus will mix "correctly" if the goal is something akin to a radiometric mixture, given that uniform tristimulus is the metric that is closest to radiometric measurements.

Uniform tristimulus is not the proper domain for all manipulations, however. For example, with downsampling / downscaling it is the closest approximation to correctness, but for reconstructive "upsampling" / upscaling it is incorrect. The latter also applies to reconstructive aspects such as antialiasing, glyph rendering, etc. It is important to sample the signal in an appropriate domain projection to achieve the desired goal. Thar be nuance in them thar hills...

Keeping a clear eye on the fact that tristimulus colourimetric values are not light is rather important, and can help to insulate against errors of logic.

And "encoded values" for what we now call "non-linear values"?

It would seem so?

My issue with "non linear" is that folks tend to accept that frame of reference implicitly, as opposed to evaluating the frame of reference the information is currently in. EG: Tristimulus is always uniform with respect to other tristimulus. This is a simple fact, and leads to a logical idea that something such as a "nonlinear tristimulus" exists. It does not. We should consider those transformations for what they are; a wholly transformative distortion into a different state. If it is tristimulus, it must, by definition, be uniform tristimulus. If the data does not represent uniform tristimulus, it is something other than tristimulus.

Worse, in this specific case, there are a hundred different reasons that one may have a nonuniform transformation applied to tristimulus values, and none of them are necessarily equivalent as the singular idea of "non linear" might indicate. EG: Are we distorting the tristimulus nonuniformly into a lightness / brightness projection? Is it for inverse EOTF encoding to display code values? Etc.

I caution against treating these sorts of details as pedantry, despite my complete understanding that it may appear as such. Anecdotally, I would suggest that many higher level logical problems often arise from erroneous inferences largely anchored around subtle language and terms.

Thanks guys, very interesting reading. I can add only two cents that GPU display hardware for CM or other purpose is 1000 time more efficient from power saving perspective vs to use GPU shaders.

It is a general rule that fixed hardware is 1000 times more power effective than nonfixed.

Fair. Too bad hardware often screws up encodings / decodings, and worse, cannot account for all of the permutations required in this.

If you have tristimulus values, and then you encode them, is the result not "encoded tristimulus"?

In other words, "non-linear tristimulus", or using even further (away) terms "non-linear linear values". The last phrasing certainly sounds nonsense, so it's easy to avoid even if one is less knowledgeable. Understanding that "non-linear tristimulus" is a contradiction in itself requires more learning.

I think the fundamental problem in general is that most terms we use are generic names. A curve, a matrix, an encoding, EOTF, OETF, OOTF, non-linearity; these are all generic terms and do not inherently imply which curve, what matrix elements, and so on. "Tristimulus" is the same, isn't it? IIUC, you can multiply some tristimulus with an arbitrary invertible 3x3 matrix and the result is still tristimulus - just in a different trichromatic system.

Therefore I would argue that using the term "tristimulus" does not provide any improved insight or intuitivity until one has learnt what that term means. The same could be said about other terms, except sometimes a contradiction can be obvious even to the uneducated like with "non-linear linear values".

On the other hand I recognize the usefulness and inclusivity of using established terminology of the field, so I can well embrace "tristimulus" if I am sufficiently confident in using it right.

FWIW, I have been arguing for Linux kernel UAPI (and even towards hardware implementations) that userspace (compositors) wants to off-load specific mathematical operations (e.g. multiply by this matrix), and not color spaces (e.g. "please convert BT.709 SDR to HLG").

I'm happy to say that the message has been acknowledged.

Really interesting conversation here and in the thread above about sRGB and BT.709. I can't claim I follow it all but what this does is reinforce my understanding that color management is complex, often involving problems that haven't been solved or problems that can never be fully "solved." Hence, we need a system that is flexible to allow us to get through iterations of implementations and allows for experimentation.

Fair. Too bad hardware often screws up encodings / decodings, and worse, cannot account for all of the permutations required in this.

IMO HW (at least AMD HW) is quite flexible and capable but APIs are often inflexible or push decisions around content transforms down to (often closed-source) drivers.

APIs are hard things to change. Creating APIs around mathematical (straight-forward) operations, as suggested by @pq, will (hopefully) avoid some of these problems and allow different compositors to experiment and make different design decisions. And maybe those can help us better understand what works and what doesn't.

Just thought I'd notify the folks in this thread that there is a very relevant conference happening that is available via online streaming. Of specific relevance, the advanced expertise of Dr. Stockman is presenting on higher order lightness perception.

For those who are unfamiliar with Dr. Stockman's work, he is responsible for the contemporary CIE revisions to the Standard Observer, in conjunction with Dr. Sharpe. A foremost expert in luminous efficacy, brightness, and lightness.

Thanks Troy!

I watched the presentation and while it was very interesting, I cannot see where or how to apply that.

mentioned in merge request wayland/weston!906 (merged)

mentioned in merge request swick/wayland-protocols!19 (merged)

I guess Khronos also got sRGB characteristic functions wrong in https://registry.khronos.org/DataFormat/specs/1.3/dataformat.1.3.html#TRANSFER_SRGB ?

Very much looks like it. Opened an issue: https://github.com/KhronosGroup/DataFormat/issues/19

Vulkan spec

If the image view format is sRGB, the color components are first converted as if they are UNORM, and then sRGB to linear conversion is applied to the R, G, and B components as described in the “sRGB EOTF” section of the Khronos Data Format Specification. The A component, if present, is unchanged.

They probably mean the sRGB OETF^-1 here?

If they are referring to the incorrect assumption that the two part encoding is the EOTF, they likely would be referring to the OETF / OECF.

It’s a shame that the implicit chain of appearance-based encoding through decoding were not more well understood.

Right, just noticed how my comment is ambiguous. They should be referring to the sRGB two part inverse OETF for decoding in shader read access and the OETF for encoding in shader write access.

Currently they reference the EOTF but they also got the definition of the EOTF wrong so it does the right thing by accident but it's still wrong.

FYI, !19 (merged) attempts to make it slightly better.

That "well-known" page still needs an overhaul if not deletion.

@sobotka @pq @

https://github.com/KhronosGroup/DataFormat/issues/19#issuecomment-1371443733

I think you managed to convince me that the EOTF is not a pure 2.2 power function. The spec really never talks about an OETF. The pure 2.2 gamma of the reference display and the actual two-part encoding do not have an intentional mismatch. The reference display just has pure gamma because it's a CRTC and that's how they operate and the EOTF is as close to the pure gamma as possible while also making it possible to actually encode it. The intentional mismatch the spec is talking about seems to be the one between BT.709 OETF and the sRGB EOTF.

Read the specification closely and historicize it.

There is no way to reconcile what is clearly stated in the specification. Not just clearly, clearly and repeated with specific regard to reference display characteristics.

EOTF

Designs ...

Child items ...

Activity

Admin message

Admin message

EOTF

Activity