Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Automatic differentiation
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Automatic differentiation using dual numbers == Forward mode automatic differentiation is accomplished by augmenting the [[Algebra over a field|algebra]] of [[real numbers]] and obtaining a new [[arithmetic]]. An additional component is added to every number to represent the derivative of a function at the number, and all arithmetic operators are extended for the augmented algebra. The augmented algebra is the algebra of [[dual numbers]]. Replace every number <math>\,x</math> with the number <math>x + x'\varepsilon</math>, where <math>x'</math> is a real number, but <math>\varepsilon</math> is an [[abstract number]] with the property <math>\varepsilon^2=0</math> (an [[infinitesimal]]; see ''[[Smooth infinitesimal analysis]]''). Using only this, regular arithmetic gives <math display="block">\begin{align} (x + x'\varepsilon) + (y + y'\varepsilon) &= x + y + (x' + y')\varepsilon \\ (x + x'\varepsilon) - (y + y'\varepsilon) &= x - y + (x' - y')\varepsilon \\ (x + x'\varepsilon) \cdot (y + y'\varepsilon) &= xy + xy'\varepsilon + yx'\varepsilon + x'y'\varepsilon^2 = xy + (x y' + yx')\varepsilon \\ (x + x'\varepsilon) / (y + y'\varepsilon) &= (x/y + x'\varepsilon/y) / (1 + y'\varepsilon/y) = (x/y + x'\varepsilon/y) \cdot (1 - y'\varepsilon/y) = x/y + (x'/y - xy'/y^2)\varepsilon \end{align}</math> using <math>(1 + y'\varepsilon/y) \cdot (1 - y'\varepsilon/y) = 1</math>. Now, [[polynomials]] can be calculated in this augmented arithmetic. If <math>P(x) = p_0 + p_1 x + p_2x^2 + \cdots + p_n x^n</math>, then <math display="block">\begin{align} P(x + x'\varepsilon) &= p_0 + p_1(x + x'\varepsilon) + \cdots + p_n (x + x'\varepsilon)^n \\ &= p_0 + p_1 x + \cdots + p_n x^n + p_1x'\varepsilon + 2p_2xx'\varepsilon + \cdots + np_n x^{n-1} x'\varepsilon \\ &= P(x) + P^{(1)}(x)x'\varepsilon \end{align}</math> where <math>P^{(1)}</math> denotes the derivative of <math>P</math> with respect to its first argument, and <math>x'</math>, called a ''seed'', can be chosen arbitrarily. The new arithmetic consists of [[ordered pair]]s, elements written <math>\langle x, x' \rangle</math>, with ordinary arithmetics on the first component, and first order differentiation arithmetic on the second component, as described above. Extending the above results on polynomials to [[analytic functions]] gives a list of the basic arithmetic and some standard functions for the new arithmetic: <math display="block">\begin{align} \left\langle u,u'\right\rangle + \left\langle v,v'\right\rangle &= \left\langle u + v, u' + v' \right\rangle \\ \left\langle u,u'\right\rangle - \left\langle v,v'\right\rangle &= \left\langle u - v, u' - v' \right\rangle \\ \left\langle u,u'\right\rangle * \left\langle v,v'\right\rangle &= \left\langle u v, u'v + uv' \right\rangle \\ \left\langle u,u'\right\rangle / \left\langle v,v'\right\rangle &= \left\langle \frac{u}{v}, \frac{u'v - uv'}{v^2} \right\rangle \quad ( v\ne 0) \\ \sin\left\langle u,u'\right\rangle &= \left\langle \sin(u) , u' \cos(u) \right\rangle \\ \cos\left\langle u,u'\right\rangle &= \left\langle \cos(u) , -u' \sin(u) \right\rangle \\ \exp\left\langle u,u'\right\rangle &= \left\langle \exp u , u' \exp u \right\rangle \\ \log\left\langle u,u'\right\rangle &= \left\langle \log(u) , u'/u \right\rangle \quad (u>0) \\ \left\langle u,u'\right\rangle^k &= \left\langle u^k , u' k u^{k - 1} \right\rangle \quad (u \ne 0) \\ \left| \left\langle u,u'\right\rangle \right| &= \left\langle \left| u \right| , u' \operatorname{sign} u \right\rangle \quad (u \ne 0) \end{align}</math> and in general for the primitive function <math>g</math>, <math display="block">g(\langle u,u' \rangle , \langle v,v' \rangle ) = \langle g(u,v) , g_u(u,v) u' + g_v(u,v) v' \rangle</math> where <math>g_u</math> and <math>g_v</math> are the derivatives of <math>g</math> with respect to its first and second arguments, respectively. When a binary basic arithmetic operation is applied to mixed arguments—the pair <math>\langle u, u' \rangle</math> and the real number <math>c</math>—the real number is first lifted to <math>\langle c, 0 \rangle</math>. The derivative of a function <math>f : \R\to\R</math> at the point <math>x_0</math> is now found by calculating <math>f(\langle x_0, 1 \rangle)</math> using the above arithmetic, which gives <math>\langle f ( x_0 ) , f' ( x_0 ) \rangle </math> as the result. === Implementation === An example implementation based on the dual number approach follows. ==== Pseudo code ==== {{pre|1= Dual plus(Dual A, Dual B) { return { realPartOf(A) + realPartOf(B), infinitesimalPartOf(A) + infinitesimalPartOf(B) }; } Dual minus(Dual A, Dual B) { return { realPartOf(A) - realPartOf(B), infinitesimalPartOf(A) - infinitesimalPartOf(B) }; } Dual multiply(Dual A, Dual B) { return { realPartOf(A) * realPartOf(B), realPartOf(B) * infinitesimalPartOf(A) + realPartOf(A) * infinitesimalPartOf(B) }; } X = {x, 0}; Y = {y, 0}; Epsilon = {0, 1}; xPartial = infinitesimalPartOf(f(X + Epsilon, Y)); yPartial = infinitesimalPartOf(f(X, Y + Epsilon)); }} ==== C++ ==== <syntaxhighlight lang="cpp"> #include <iostream> struct Dual { float realPart, infinitesimalPart; Dual(float realPart, float infinitesimalPart=0): realPart(realPart), infinitesimalPart(infinitesimalPart) {} Dual operator+(Dual other) { return Dual( realPart + other.realPart, infinitesimalPart + other.infinitesimalPart ); } Dual operator*(Dual other) { return Dual( realPart * other.realPart, other.realPart * infinitesimalPart + realPart * other.infinitesimalPart ); } }; // Example: Finding the partials of z = x * (x + y) + y * y at (x, y) = (2, 3) Dual f(Dual x, Dual y) { return x * (x + y) + y * y; } int main () { Dual x = Dual(2); Dual y = Dual(3); Dual epsilon = Dual(0, 1); Dual a = f(x + epsilon, y); Dual b = f(x, y + epsilon); std::cout << "∂z/∂x = " << a.infinitesimalPart << ", " << "∂z/∂y = " << b.infinitesimalPart << std::endl; // Output: ∂z/∂x = 7, ∂z/∂y = 8 return 0; } </syntaxhighlight> ===Vector arguments and functions=== Multivariate functions can be handled with the same efficiency and mechanisms as univariate functions by adopting a directional derivative operator. That is, if it is sufficient to compute <math>y' = \nabla f(x)\cdot x'</math>, the directional derivative <math>y' \in \R^m</math> of <math>f:\R^n\to\R^m</math> at <math>x \in \R^n</math> in the direction <math>x' \in \R^n</math> may be calculated as <math>(\langle y_1,y'_1\rangle, \ldots, \langle y_m,y'_m\rangle) = f(\langle x_1,x'_1\rangle, \ldots, \langle x_n,x'_n\rangle)</math> using the same arithmetic as above. If all the elements of <math>\nabla f</math> are desired, then <math>n</math> function evaluations are required. Note that in many optimization applications, the directional derivative is indeed sufficient. ===High order and many variables=== The above arithmetic can be generalized to calculate second order and higher derivatives of multivariate functions. However, the arithmetic rules quickly grow complicated: complexity is quadratic in the highest derivative degree. Instead, truncated [[Taylor series|Taylor polynomial]] algebra can be used. The resulting arithmetic, defined on generalized dual numbers, allows efficient computation using functions as if they were a data type. Once the Taylor polynomial of a function is known, the derivatives are easily extracted.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)