@@ -702,3 +702,23 @@ The interpolation field has the following meaning:

|============================

If the coordinates computed would be the same as the previous interpolation instruction, the "reuse previous coordinates" bit can be set. Finally, the "flat shading" bit enables flat shading, where the varying is chosen from one of the triangle corners based on the GL provoking vertex rules.

= Special functions

Rather than having a dedicated unit for more complicated floating-point operations, Bifrost reuses the fixed-latency general-purpose FMA/ADD pipeline, meaning it uses regular fixed-latency instructions. Since most of these operations have a latency much longer than most FMA and ADD instructions, they have to be broken up into a number of simpler steps. This section explains each of the algorithms and special instructions used to implement each complex builtin function in GLSL.

== Reciprocal

The reciprocal implementation uses an initial table lookup followed by a single step of https://en.wikipedia.org/wiki/Newton%27s_method#Multiplicative_inverses_of_numbers_and_power_series[Newton's method].

Given the input `a`, first, we do an argument reduction step, computing a reduced `a1` in the range [0.5, 1.0) and an initial approximation `x1` of `1/a1`. Computing `a1` is done with the `FRCP_FREXPM` instruction which is the same as the mantissa part of `frexp` in C. Except for special cases, it just sets the exponent to -1. `x1` is looked up in a table based on the top 17 bits of the mantissa of `a` (the mantissa of `a` is always the same as the mantissa of `a1`) using the `FRCP_TABLE` instruction. Next, we compute `x2 = x1 * (2 - a1 * x1)` and do the final exponent adjustment to compensate for the argument reduction. `x2` is actually computed as `x1 + x1 * (1 - a1 * x1)`, using two fused multiply-adds, to avoid rounding errors. The computation of `1 - a1 * x1` is particularly sensitive, since `a1 * x1` is very close to one by design. The second fused multiply-add and exponent adjustment are done in the same instruction, using a special `FMA_MSCALE` instruction which takes a fourth argument that is added to exponent of the result. There is an `FRCP_FREXPE` instruction which computes the exponent that `frexp` would have, and then negates it, to get the right bias for the final exponent adjustment. Finally, there is one final complication, which is that if `a` is infinity, then `a1` will be infinity and `x1` will be zero, and multiplying them will give NaN according to IEEE arithmetic, which will propagate to the final result instead of returning the appropriately-signed zero as expected. Also, internal details of the sign of `+/-NaN * +/-NaN` mean that the returned NaN will have the wrong sign when computing 1/NaN. To fix these issues with corner cases, ARM uses a special `rcp_mode` modifier which only exists on the aforementioned `FMA_MSCALE` instruction for the first FMA. The exponent bias is set to 0, so it is otherwise identical to a normal `FMA` instruction.