Fix wrong sign in lower_rcp
The nested fma calls were supposed to implement
x_new = x + x * (1 - x*src),
but the current code is instead equivalent to
x_new = x - x * (1 - x*src).
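As a result, the Newton-Raphson steps do not improve precision at all. Writing the estimate as x = (1 + e)/src, the correct step yields (1 - e^2)/src, squaring the relative error. The buggy step yields x*x*src = (1 + e)^2/src, which roughly doubles it instead. This patch restores the correct sign.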
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435