Commit a67bc178 authored by Connor Abbott's avatar Connor Abbott
Browse files

bifrost: Describe atan

parent 0db124fa
......@@ -985,4 +985,38 @@ p2 = FMA.f32 a2, p2, -0
p3 = FMA.f32 a2, p2, a2
x = FMA_MSCALE.clamp_0_inf p3, a1t, a1t, a1i
x' = MAX.f32.nan_wins x, a
== Arc-tangent (G71 and later)
All the inverse trignonometric functions are computed using the two-argument arc-tangent function (sometimes called `atan2`, although it is just an overload of `atan` in GLSL) which determines the angle of a given input 2D vector. In particular, latexmath:[\arctan(y) = \arctan(y, 1.0)], latexmath:[\arcsin(y) = \arctan(y, \sqrt{1 - y^2})] and latexmath:[\arccos(x) = \arctan(\sqrt{1 - x^2}, x)].
The first thing to note is that we can restrict the problem to the upper-left quadrant using symmetry. We calculate latexmath:[\arctan(|y|, |x|)] first. Taking the branch cut along the negative x-axis into account, one can check that in order to calculate latexmath:[\arctan(y, x)], it is sufficient to subtract latexmath:[\pi] if _x_ is negative, and then change the sign of the result to equal the sign of _y_ (effectively negating the result if it has differing sign from _y_). The blob is careful to use integer instructions to determine the sign of _x_ and _y_, so that the appropriate sign is returned even if one is zero, and latexmath:[\arctan(-1, +0) = \pi] while latexmath:[\arctan(-1, -0) = -\pi].
In order to actually calculate the arc-tangent, we use a novel reduction step in addition to the usual polynomial approximation. Before dividing _y_ by _x_, we apply a linear transform latexmath:[R = \begin{bmatrix} 1 & a \\ -a & 1 \end{bmatrix}] to the original vector latexmath:[\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}]. This matrix has orthogonal columns and positive determinant latexmath:[1+a^2], so it is a rotation (it also scales the input, but since the final division will remove any scale we add, this is irrelevant). In fact, since it sends the basis vector latexmath:[\hat{x}] to latexmath:[\begin{bmatrix} 1 \\ -a \end{bmatrix}], it is a rotation by an angle of latexmath:[-\arctan(a)]. So, we have that latexmath:[\arctan(\mathbf{x}) = \arctan(R \mathbf{x}) + \arctan(a)], where we are free to choose _a_. If latexmath:[a = y/x], then latexmath:[\arctan(R \mathbf{x}) = 0] since the y-component of latexmath:[R \mathbf{x}] will be 0. We choose _a_ to be a coarse-grained approximation of latexmath:[y/x], so that we can lookup latexmath:[\arctan(a)] in a table yet the range of latexmath:[\arctan(R \mathbf{x})] is small enough that it can be approximated using a polynomial with relatively few terms after dividing the transformed _y_ by _x_.
There are a few special cases to deal with. When the coordinates are too large or too small, the calculation of latexmath:[R\mathbf{x}] may overflow or underflow, in which case a scale has to be applied to _x_ and _y_ by adding to the exponent. Also, we have to handle input infinities correctly. All of these special cases are handled using the `ATAN_ASSIST`, `ATAN_LDEXP.X` and `ATAN_LDEXP.Y` instructions. `ATAN_ASSIST` returns the appropriate scale as a 16-bit signed integer in the low 16 bits of the result, and _a_ as a half-precision floating point number in the high 16 bits. The result of `ATAN_ASSIST` is passed directly to `ATAN_LDEXP.X` and `ATAN_LDEXP.Y` as the scale argument, as well as the original _x_ and _y_ coordinates respectively. The `ATAN_LDEXP` instructions ignore the high 16 bits of the scale, since that contains _a_. They otherwise operate similarly to the normal `ldexp` C function, except that if passed 0x8000 (-2^16^) as a scale, they will turn infinity into 1.0. `ATAN_ASSIST` passes this special scale if either _x_ or _y_ are infinity, so that the infinite coordinate becomes 1.0, while any non-infinite coordinate becomes 0.0 due to underflow. This produces the expected result and avoids infinities permeating the rest of the code. Then, latexmath:[R \mathbf{x}] is computed using two fused multiply-adds, using a modifiers to expand the high 16 bits of the `ATAN_ASSIST` instruction to 32 bits while also taking the absolute value of _x_ and _y_. Finally, _y_ is divided by _x_, the polynomial approximation to latexmath:[\arctan(R \mathbf{x})] is computed, latexmath:[\arctan(a)] is computed using `ATAN_TABLE` and is added, and the corrections for different quadrants are applied.
a = ATAN_ASSIST y, x
y' = ATAN_LDEXP.Y.f32 y, a
x' = ATAN_LDEXP.X.f32 x, a
x'' = FMA.f32 {R7, T0}, abs(y'), R4.y, abs(x')
xm = FRCP_FREXPM x''
y'' = FMA.f32 {R4, T0}, -R4.y, abs(x'), abs(y')
xi1 = FRCP_TABLE x''
err = FMA_MSCALE.rcp_mode T0, xm, -xi1, 1.0, 0
xi2 = FMA.f32 err, x'', x''
xe = FRCP_FREXPE x''
y_over_x = FMA_MSCALE y'', xi2, -0, xe
y_over_x_sq = FMA.f32 y_over_x, y_over_x, -0
atan_a = ATAN_TABLE y, x
p1 = FMA.f32 y_over_x_sq, y_over_x_sq, 0x3e4b2a00 /* 0.198402 */, 0xbeaaaaab /* -0.333333 */
p2 = FMA.f32 y_over_x_sq, p1, 1.0
corr = CSEL.UGE.i32 x, 0x80000000, -pi, 0
atan_a = ADD.f32 atan_a, corr
atan_all = FMA_MSCALE p2, y_over_x, atan_a, 0
result = MUX y, atan_all, 0x80000000 # copysign(y, atan_all)
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment