Editing Automatic differentiation (section)

== Automatic differentiation using dual numbers ==

Forward mode automatic differentiation is accomplished by augmenting the [[Algebra over a field|algebra]] of [[real numbers]] and obtaining a new [[arithmetic]]. An additional component is added to every number to represent the derivative of a function at the number, and all arithmetic operators are extended for the augmented algebra. The augmented algebra is the algebra of [[dual numbers]].

Replace every number <math>\,x</math> with the number <math>x + x'\varepsilon</math>, where <math>x'</math> is a real number, but <math>\varepsilon</math> is an [[abstract number]] with the property <math>\varepsilon^2=0</math> (an [[infinitesimal]]; see ''[[Smooth infinitesimal analysis]]''). Using only this, regular arithmetic gives
<math display="block">\begin{align}
      (x + x'\varepsilon) + (y + y'\varepsilon) &= x + y + (x' + y')\varepsilon \\
      (x + x'\varepsilon) - (y + y'\varepsilon) &= x - y + (x' - y')\varepsilon \\
  (x + x'\varepsilon) \cdot (y + y'\varepsilon) &= xy + xy'\varepsilon + yx'\varepsilon + x'y'\varepsilon^2 = xy + (x y' + yx')\varepsilon \\
  (x + x'\varepsilon) / (y + y'\varepsilon) &= (x/y + x'\varepsilon/y) / (1 + y'\varepsilon/y) = (x/y + x'\varepsilon/y) \cdot (1 - y'\varepsilon/y) = x/y + (x'/y - xy'/y^2)\varepsilon
\end{align}</math>
using <math>(1 + y'\varepsilon/y) \cdot (1 - y'\varepsilon/y) = 1</math>.

Now, [[polynomials]] can be calculated in this augmented arithmetic. If <math>P(x) = p_0 + p_1 x + p_2x^2 + \cdots + p_n x^n</math>, then
<math display="block">\begin{align}
  P(x + x'\varepsilon) &= p_0 + p_1(x + x'\varepsilon) + \cdots + p_n (x + x'\varepsilon)^n \\
                       &= p_0 + p_1 x + \cdots + p_n x^n + p_1x'\varepsilon + 2p_2xx'\varepsilon + \cdots + np_n x^{n-1} x'\varepsilon \\
                       &= P(x) + P^{(1)}(x)x'\varepsilon
\end{align}</math>
where <math>P^{(1)}</math> denotes the derivative of <math>P</math> with respect to its first argument, and <math>x'</math>, called a ''seed'', can be chosen arbitrarily.

The new arithmetic consists of [[ordered pair]]s, elements written <math>\langle x, x' \rangle</math>, with ordinary arithmetics on the first component, and first order differentiation arithmetic on the second component, as described above. Extending the above results on polynomials to [[analytic functions]] gives a list of the basic arithmetic and some standard functions for the new arithmetic:
<math display="block">\begin{align}
  \left\langle u,u'\right\rangle + \left\langle v,v'\right\rangle &= \left\langle u + v, u' + v' \right\rangle \\
  \left\langle u,u'\right\rangle - \left\langle v,v'\right\rangle &= \left\langle u - v, u' - v' \right\rangle \\
  \left\langle u,u'\right\rangle * \left\langle v,v'\right\rangle &= \left\langle u v, u'v + uv' \right\rangle \\
  \left\langle u,u'\right\rangle / \left\langle v,v'\right\rangle &= \left\langle \frac{u}{v}, \frac{u'v - uv'}{v^2} \right\rangle \quad ( v\ne 0) \\
                               \sin\left\langle u,u'\right\rangle &= \left\langle \sin(u) , u' \cos(u) \right\rangle \\
                               \cos\left\langle u,u'\right\rangle &= \left\langle \cos(u) , -u' \sin(u) \right\rangle \\
                               \exp\left\langle u,u'\right\rangle &= \left\langle \exp u , u' \exp u \right\rangle \\
                               \log\left\langle u,u'\right\rangle &= \left\langle \log(u) , u'/u \right\rangle \quad (u>0) \\
                                 \left\langle u,u'\right\rangle^k &= \left\langle u^k , u' k u^{k - 1} \right\rangle \quad (u \ne 0) \\
                    \left| \left\langle u,u'\right\rangle \right| &= \left\langle \left| u \right| , u' \operatorname{sign} u \right\rangle \quad (u \ne 0)
\end{align}</math>
and in general for the primitive function <math>g</math>,
<math display="block">g(\langle u,u' \rangle , \langle v,v' \rangle ) = \langle g(u,v) , g_u(u,v) u' + g_v(u,v) v' \rangle</math>
where <math>g_u</math> and <math>g_v</math> are the derivatives of <math>g</math> with respect to its first and second arguments, respectively.

When a binary basic arithmetic operation is applied to mixed arguments—the pair <math>\langle u, u' \rangle</math> and the real number <math>c</math>—the real number is first lifted to <math>\langle c, 0 \rangle</math>. The derivative of a function <math>f : \R\to\R</math> at the point <math>x_0</math> is now found by calculating <math>f(\langle x_0, 1 \rangle)</math> using the above arithmetic, which gives <math>\langle f ( x_0 ) , f' ( x_0 ) \rangle </math> as the result.

=== Implementation ===
An example implementation based on the dual number approach follows.

==== Pseudo code ====
{{pre|1=
Dual plus(Dual A, Dual B) {
  return {
    realPartOf(A) + realPartOf(B),
    infinitesimalPartOf(A) + infinitesimalPartOf(B)
  };
}
Dual minus(Dual A, Dual B) {
  return {
    realPartOf(A) - realPartOf(B),
    infinitesimalPartOf(A) - infinitesimalPartOf(B)
  };
}
Dual multiply(Dual A, Dual B) {
  return {
    realPartOf(A) * realPartOf(B),
    realPartOf(B) * infinitesimalPartOf(A) + realPartOf(A) * infinitesimalPartOf(B)
  };
}
X = {x, 0};
Y = {y, 0};
Epsilon = {0, 1};
xPartial = infinitesimalPartOf(f(X + Epsilon, Y));
yPartial = infinitesimalPartOf(f(X, Y + Epsilon));
}}

==== C++ ====
<syntaxhighlight lang="cpp">
#include <iostream>
struct Dual {
   float realPart, infinitesimalPart;
   Dual(float realPart, float infinitesimalPart=0): realPart(realPart), infinitesimalPart(infinitesimalPart) {}
   Dual operator+(Dual other) {
      return Dual(
         realPart + other.realPart,
         infinitesimalPart + other.infinitesimalPart
      );
   }
   Dual operator*(Dual other) {
      return Dual(
         realPart * other.realPart,
         other.realPart * infinitesimalPart + realPart * other.infinitesimalPart
      );
   }
};
// Example: Finding the partials of z = x * (x + y) + y * y at (x, y) = (2, 3)
Dual f(Dual x, Dual y) { return x * (x + y) + y * y; }
int main () {
   Dual x = Dual(2);
   Dual y = Dual(3);
   Dual epsilon = Dual(0, 1);
   Dual a = f(x + epsilon, y);
   Dual b = f(x, y + epsilon);
   std::cout << "∂z/∂x = " << a.infinitesimalPart << ", "
             << "∂z/∂y = " << b.infinitesimalPart << std::endl;
   // Output: ∂z/∂x = 7, ∂z/∂y = 8
   return 0;
}
</syntaxhighlight>

===Vector arguments and functions===

Multivariate functions can be handled with the same efficiency and mechanisms as univariate functions by adopting a directional derivative operator. That is, if it is sufficient to compute <math>y' = \nabla f(x)\cdot x'</math>, the directional derivative <math>y' \in \R^m</math> of <math>f:\R^n\to\R^m</math> at <math>x \in \R^n</math> in the direction <math>x' \in \R^n</math> may be calculated as <math>(\langle y_1,y'_1\rangle, \ldots, \langle y_m,y'_m\rangle) = f(\langle x_1,x'_1\rangle, \ldots, \langle x_n,x'_n\rangle)</math> using the same arithmetic as above. If all the elements of <math>\nabla f</math> are desired, then <math>n</math> function evaluations are required. Note that in many optimization applications, the directional derivative is indeed sufficient.

===High order and many variables===

The above arithmetic can be generalized to calculate second order and higher derivatives of multivariate functions. However, the arithmetic rules quickly grow complicated: complexity is quadratic in the highest derivative degree. Instead, truncated [[Taylor series|Taylor polynomial]] algebra can be used. The resulting arithmetic, defined on generalized dual numbers, allows efficient computation using functions as if they were a data type. Once the Taylor polynomial of a function is known, the derivatives are easily extracted.