Commit 70e70db8 authored by Connor Abbott's avatar Connor Abbott
Browse files

bifrost: Add initial log description

Still need to do cores after G71.
parent d1665447
...@@ -727,7 +727,7 @@ x = FMA_MSCALE t1, x1, x1, e' ...@@ -727,7 +727,7 @@ x = FMA_MSCALE t1, x1, x1, e'
== Reciprocal (later versions) == Reciprocal (later versions)
On every Bifrost core released since G71, there is a new `FRCP_FAST` instruction that does all of the above in one instruction. Since there are two dependent fused multiply-adds, this can't feasibly be done in one cycle. Instead, even though `RCP_FAST` is encoded as an ADD instruction, it actually uses both FMA and ADD units, and the corresponding FMA instruction before it must be a nop. On every Bifrost core released after G71, there is a new `FRCP_FAST` instruction that does all of the above in one instruction. Since there are two dependent fused multiply-adds, this can't feasibly be done in one cycle. Instead, even though `RCP_FAST` is encoded as an ADD instruction, it actually uses both FMA and ADD units, and the corresponding FMA instruction before it must be a nop.
== Reciprocal square root (G71) == Reciprocal square root (G71)
...@@ -775,3 +775,44 @@ r1 = FRSQ_FAST a1 ...@@ -775,3 +775,44 @@ r1 = FRSQ_FAST a1
output = FMA_MSCALE.sqrt_mode a1, r1, -0, e output = FMA_MSCALE.sqrt_mode a1, r1, -0, e
---- ----
== Natural & Base-2 Log (G71)
As described in the patent, the argument reduction step for log depends on the fact that `log(a * b) = log(a) + log(b)`. First, we reduce the input `a` to the range [0.75, 1.5) and calculate the required fixup exponent using the `FLOG_FREPXE` instruction. This fixup exponent is converted to a floating-point number, and is then added to the result later (after multiplying by ln(2) for base-e log), using the fact that `log(m * 2^e) = e + log(m)`. We lookup an very coarse-grained reciprocal estimate `r1` to the reduced input `a1` in a table using the `FRCP_APPROX` instruction, and we also lookup `-log(r1)` using another table that stores for each input `x`, `log(FRCP_APPROX(x))`. Both these instructions take `a`, and do the reduction to [0.75, 1.5) implicitly. Since `log(a1) = log(a1 * r1) - log(r1)`, all we need to do to compute `log(a1)` is compute `log(a1 * r1)`. Since `a1 * r1` is close to 1 by design, we can do this with a polynomial approximation of `log(y + 1)` with only a few terms. The approximation chosen for base _e_ is `log(y + 1) = y * (1.0 + y * (a + y * b))` where the precise constants used are `b = 0x3eab3200 (0.33436)` and `a = 0xbf0003f0 (-0.500060)`, close to the Taylor series terms 1/3 and -1/2 but not exactly the same. The base 2 logarithm computes `y' = y * c` where `c` is a single-precision approximation to `1/ln(2)` and then `log2(y + 1) = y' + y * (c + y' * (y * a + b))` where `d = 0x32a57060 (1.29249 * 2^{-26})`. `d` holds the error due to approximating `1/ln(2)` as a single-precision floating point number, so that `c + d`, when evaluated at infinite precision, is a much better approximation to `1/ln(2)` than `c` by itself.
base 2 logarithm:
[source]
----
a1 = LOG_FREXPM a
r1 = FRCP_APPROX a
xt = FLOG2_TABLE a
e = FLOG_FREXPE a
e = I32_TO_F32 e
x1 = ADD.f32 T1, e, xt
y = FMA.f32 a1, r1, -1.0
y' = FMA.f32 y, 0x3fb8aa3b /* c */, -0
t1 = FMA.f32 y, 0x3eab3200 /* b */, 0xbf0003f0 /* a */
t2 = FMA.f32 y', t1, 0x32a57060 /* d */
x2 = FMA.f32 T0, y, t2, y'
output = ADD.f32 x1, x2
----
base e logarithm:
[source]
----
a1 = LOG_FREXPM a
r1 = FRCP_APPROX a
xt = FLOGE_TABLE a
e = FLOG_FREXPE a
e = I32_TO_F32 e
x1 = FMA.f32 T0, e, 0x3f317218 /* ln(2) */, xt
y = FMA.f32 a1, r1, -1.0
t1 = FMA.f32 y, 0x3eab3200 /* b */, 0xbf0003f0 /* a */
t2 = FMA.f32 y, t1, 1.0
output = FMA.f32 {R0, T0}, y, t2, x1
----
== Natural & Base-2 Logarithm (later)
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment