Commit a67bc178 by Connor Abbott

### bifrost: Describe atan

parent 0db124fa
 ... ... @@ -985,4 +985,38 @@ p2 = FMA.f32 a2, p2, -0 p3 = FMA.f32 a2, p2, a2 x = FMA_MSCALE.clamp_0_inf p3, a1t, a1t, a1i x' = MAX.f32.nan_wins x, a ---- == Arc-tangent (G71 and later) All the inverse trignonometric functions are computed using the two-argument arc-tangent function (sometimes called atan2, although it is just an overload of atan in GLSL) which determines the angle of a given input 2D vector. In particular, latexmath:[\arctan(y) = \arctan(y, 1.0)], latexmath:[\arcsin(y) = \arctan(y, \sqrt{1 - y^2})] and latexmath:[\arccos(x) = \arctan(\sqrt{1 - x^2}, x)]. The first thing to note is that we can restrict the problem to the upper-left quadrant using symmetry. We calculate latexmath:[\arctan(|y|, |x|)] first. Taking the branch cut along the negative x-axis into account, one can check that in order to calculate latexmath:[\arctan(y, x)], it is sufficient to subtract latexmath:[\pi] if _x_ is negative, and then change the sign of the result to equal the sign of _y_ (effectively negating the result if it has differing sign from _y_). The blob is careful to use integer instructions to determine the sign of _x_ and _y_, so that the appropriate sign is returned even if one is zero, and latexmath:[\arctan(-1, +0) = \pi] while latexmath:[\arctan(-1, -0) = -\pi]. In order to actually calculate the arc-tangent, we use a novel reduction step in addition to the usual polynomial approximation. Before dividing _y_ by _x_, we apply a linear transform latexmath:[R = \begin{bmatrix} 1 & a \\ -a & 1 \end{bmatrix}] to the original vector latexmath:[\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}]. This matrix has orthogonal columns and positive determinant latexmath:[1+a^2], so it is a rotation (it also scales the input, but since the final division will remove any scale we add, this is irrelevant). In fact, since it sends the basis vector latexmath:[\hat{x}] to latexmath:[\begin{bmatrix} 1 \\ -a \end{bmatrix}], it is a rotation by an angle of latexmath:[-\arctan(a)]. So, we have that latexmath:[\arctan(\mathbf{x}) = \arctan(R \mathbf{x}) + \arctan(a)], where we are free to choose _a_. If latexmath:[a = y/x], then latexmath:[\arctan(R \mathbf{x}) = 0] since the y-component of latexmath:[R \mathbf{x}] will be 0. We choose _a_ to be a coarse-grained approximation of latexmath:[y/x], so that we can lookup latexmath:[\arctan(a)] in a table yet the range of latexmath:[\arctan(R \mathbf{x})] is small enough that it can be approximated using a polynomial with relatively few terms after dividing the transformed _y_ by _x_. There are a few special cases to deal with. When the coordinates are too large or too small, the calculation of latexmath:[R\mathbf{x}] may overflow or underflow, in which case a scale has to be applied to _x_ and _y_ by adding to the exponent. Also, we have to handle input infinities correctly. All of these special cases are handled using the ATAN_ASSIST, ATAN_LDEXP.X and ATAN_LDEXP.Y instructions. ATAN_ASSIST returns the appropriate scale as a 16-bit signed integer in the low 16 bits of the result, and _a_ as a half-precision floating point number in the high 16 bits. The result of ATAN_ASSIST is passed directly to ATAN_LDEXP.X and ATAN_LDEXP.Y as the scale argument, as well as the original _x_ and _y_ coordinates respectively. The ATAN_LDEXP instructions ignore the high 16 bits of the scale, since that contains _a_. They otherwise operate similarly to the normal ldexp C function, except that if passed 0x8000 (-2^16^) as a scale, they will turn infinity into 1.0. ATAN_ASSIST passes this special scale if either _x_ or _y_ are infinity, so that the infinite coordinate becomes 1.0, while any non-infinite coordinate becomes 0.0 due to underflow. This produces the expected result and avoids infinities permeating the rest of the code. Then, latexmath:[R \mathbf{x}] is computed using two fused multiply-adds, using a modifiers to expand the high 16 bits of the ATAN_ASSIST instruction to 32 bits while also taking the absolute value of _x_ and _y_. Finally, _y_ is divided by _x_, the polynomial approximation to latexmath:[\arctan(R \mathbf{x})] is computed, latexmath:[\arctan(a)] is computed using ATAN_TABLE and is added, and the corrections for different quadrants are applied. atan2: [source] a = ATAN_ASSIST y, x y' = ATAN_LDEXP.Y.f32 y, a x' = ATAN_LDEXP.X.f32 x, a x'' = FMA.f32 {R7, T0}, abs(y'), R4.y, abs(x') xm = FRCP_FREXPM x'' y'' = FMA.f32 {R4, T0}, -R4.y, abs(x'), abs(y') xi1 = FRCP_TABLE x'' err = FMA_MSCALE.rcp_mode T0, xm, -xi1, 1.0, 0 xi2 = FMA.f32 err, x'', x'' xe = FRCP_FREXPE x'' y_over_x = FMA_MSCALE y'', xi2, -0, xe y_over_x_sq = FMA.f32 y_over_x, y_over_x, -0 atan_a = ATAN_TABLE y, x p1 = FMA.f32 y_over_x_sq, y_over_x_sq, 0x3e4b2a00 /* 0.198402 */, 0xbeaaaaab /* -0.333333 */ p2 = FMA.f32 y_over_x_sq, p1, 1.0 corr = CSEL.UGE.i32 x, 0x80000000, -pi, 0 atan_a = ADD.f32 atan_a, corr atan_all = FMA_MSCALE p2, y_over_x, atan_a, 0 result = MUX y, atan_all, 0x80000000 # copysign(y, atan_all) ---- \ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!