=== Two types of automatic differentiation ===
Usually, two distinct modes of automatic differentiation are presented.
* '''forward accumulation''' (also called '''bottom-up''', '''forward mode''', or '''tangent mode''')
* '''reverse accumulation''' (also called '''top-down''', '''reverse mode''', or '''adjoint mode''')

Forward accumulation specifies that one traverses the chain rule from inside to outside (that is, first compute <math>\partial w_1/\partial x</math>, then <math>\partial w_2/\partial w_1</math>, and lastly <math>\partial y/\partial w_2</math>), while reverse accumulation traverses from outside to inside (first compute <math>\partial y/\partial w_2</math>, then <math>\partial w_2/\partial w_1</math>, and lastly <math>\partial w_1/\partial x</math>). More succinctly,
* forward accumulation computes the recursive relation <math>\frac{\partial w_i}{\partial x} = \frac{\partial w_i}{\partial w_{i-1}} \frac{\partial w_{i-1}}{\partial x}</math> with <math>w_3 = y</math>, and
* reverse accumulation computes the recursive relation <math>\frac{\partial y}{\partial w_i} = \frac{\partial y}{\partial w_{i+1}} \frac{\partial w_{i+1}}{\partial w_{i}}</math> with <math>w_0 = x</math>.

The value of the partial derivative, called the ''seed'', is propagated forward or backward and is initially <math>\frac{\partial x}{\partial x}=1</math> or <math>\frac{\partial y}{\partial y}=1</math>.

Forward accumulation evaluates the function and calculates the derivative with respect to one independent variable in one pass. For each independent variable <math>x_1,x_2,\dots,x_n</math> a separate pass is therefore necessary, in which the seed for that variable is set to one (<math>\frac{\partial x_1}{\partial x_1}=1</math>) and the seeds for all other variables are set to zero (<math>\frac{\partial x_2}{\partial x_1}= \dots = \frac{\partial x_n}{\partial x_1} = 0</math>).
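The forward-accumulation pass described above can be sketched with dual numbers: each intermediate value <math>w_i</math> carries its derivative <math>\partial w_i/\partial x</math> alongside it, and arithmetic operations propagate both by the chain rule. This is an illustrative sketch, not code from any particular AD library; the function <code>f</code> is an arbitrary example.

```python
class Dual:
    """A value w_i paired with its derivative dw_i/dx for one seed variable."""

    def __init__(self, value, deriv=0.0):
        self.value = value  # w_i
        self.deriv = deriv  # dw_i/dx, propagated forward by the chain rule

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def f(x, y):
    # Example function f(x, y) = x*y + x^2
    return x * y + x * x


# One pass per independent variable: seed dx/dx = 1, dy/dx = 0
# to obtain df/dx together with the function value.
x = Dual(3.0, 1.0)
y = Dual(2.0, 0.0)
out = f(x, y)
print(out.value, out.deriv)  # f = 15.0, df/dx = y + 2x = 8.0
```

To obtain <math>\partial f/\partial y</math> as well, a second pass is needed with the seeds swapped (<code>x = Dual(3.0, 0.0)</code>, <code>y = Dual(2.0, 1.0)</code>), matching the one-sweep-per-input cost noted below.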
In contrast, reverse accumulation requires the evaluated partial functions for the partial derivatives. Reverse accumulation therefore evaluates the function first and calculates the derivatives with respect to all independent variables in an additional pass.

Which of these two types should be used depends on the sweep count. The [[Computational complexity theory|computational complexity]] of one sweep is proportional to the complexity of the original code.
* Forward accumulation is more efficient than reverse accumulation for functions {{math|''f'' : '''R'''<sup>''n''</sup> → '''R'''<sup>''m''</sup>}} with {{math|''n'' ≪ ''m''}}, as only {{math|''n''}} sweeps are necessary, compared to {{math|''m''}} sweeps for reverse accumulation.
* Reverse accumulation is more efficient than forward accumulation for functions {{math|''f'' : '''R'''<sup>''n''</sup> → '''R'''<sup>''m''</sup>}} with {{math|''n'' ≫ ''m''}}, as only {{math|''m''}} sweeps are necessary, compared to {{math|''n''}} sweeps for forward accumulation.

[[Backpropagation]] of errors in multilayer perceptrons, a technique used in [[machine learning]], is a special case of reverse accumulation.<ref name="baydin2018automatic" />

Forward accumulation was introduced by R.E. Wengert in 1964.<ref name="Wengert1964"/> According to Andreas Griewank, reverse accumulation has been suggested since the late 1960s, but the inventor is unknown.<ref name="grie2012">{{cite book |last=Griewank |first=Andreas |title=Optimization Stories |chapter=Who invented the reverse mode of differentiation? |year=2012 |series=Documenta Mathematica Series |volume=6 |pages=389–400 |doi=10.4171/dms/6/38 |doi-access=free |isbn=978-3-936609-58-5 |chapter-url=https://ftp.gwdg.de/pub/misc/EMIS/journals/DMJDMV/vol-ismp/52_griewank-andreas-b.pdf}}</ref> [[Seppo Linnainmaa]] published reverse accumulation in 1976.<ref name="lin1976">{{cite journal |last=Linnainmaa |first=Seppo |year=1976 |title=Taylor Expansion of the Accumulated Rounding Error |journal=BIT Numerical Mathematics |volume=16 |issue=2 |pages=146–160 |doi=10.1007/BF01931367 |s2cid=122357351}}</ref>
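The reverse-accumulation scheme described above can be sketched with a simple tape: a forward sweep evaluates the function and records each operation, then a reverse sweep seeds <math>\partial y/\partial y = 1</math> and applies the chain rule from outside to inside. This is an illustrative sketch only (the <code>Var</code> class and tape are hypothetical names, not any library's API); the example function is the same arbitrary <math>f(x, y) = xy + x^2</math>.

```python
tape = []  # backward closures, recorded in forward-evaluation order


class Var:
    """A value with an accumulator for dy/dw_i, filled in by the reverse sweep."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0  # dy/dw_i, accumulated backward

    def __add__(self, other):
        out = Var(self.value + other.value)

        def back():  # d(u+v)/du = d(u+v)/dv = 1
            self.grad += out.grad
            other.grad += out.grad

        tape.append(back)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)

        def back():  # d(uv)/du = v, d(uv)/dv = u
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        tape.append(back)
        return out


x = Var(3.0)
y = Var(2.0)
out = x * y + x * x  # forward sweep: evaluate f and record the tape

out.grad = 1.0  # seed dy/dy = 1
for back in reversed(tape):  # reverse sweep: chain rule outside-in
    back()

print(x.grad, y.grad)  # df/dx = y + 2x = 8.0, df/dy = x = 3.0
```

Note that a single reverse sweep yields the partial derivatives with respect to ''all'' inputs at once, which is why reverse accumulation needs only one sweep per output, as stated above.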