Editing Automatic differentiation (section)

=== Forward accumulation ===
[[File:ForwardAD.png|thumb|Forward accumulation]]
In forward accumulation AD, one first fixes the ''independent variable'' with respect to which differentiation is performed and computes the derivative of each sub-[[expression (mathematics)|expression]] recursively. In a pen-and-paper calculation, this involves repeatedly substituting the derivative of the ''inner'' functions in the chain rule:
<math display="block">\begin{align}
\frac{\partial y}{\partial x}
&= \frac{\partial y}{\partial w_{n-1}} \frac{\partial w_{n-1}}{\partial x} \\[6pt]
&= \frac{\partial y}{\partial w_{n-1}} \left(\frac{\partial w_{n-1}}{\partial w_{n-2}} \frac{\partial w_{n-2}}{\partial x}\right) \\[6pt]
&= \frac{\partial y}{\partial w_{n-1}} \left(\frac{\partial w_{n-1}}{\partial w_{n-2}} \left(\frac{\partial w_{n-2}}{\partial w_{n-3}} \frac{\partial w_{n-3}}{\partial x}\right)\right) \\[6pt]
&= \cdots
\end{align}</math>
This can be generalized to multiple variables as a matrix product of [[Jacobian matrix and determinant|Jacobian]]s.

Compared to reverse accumulation, forward accumulation is natural and easy to implement as the flow of derivative information coincides with the order of evaluation. Each variable <math>w_i</math> is augmented with its derivative <math>\dot w_i</math> (stored as a numerical value, not a symbolic expression),
<math display="block">\dot w_i = \frac{\partial w_i}{\partial x}</math>
as denoted by the dot. The derivatives are then computed in sync with the evaluation steps and combined with other derivatives via the chain rule.

Using the chain rule, if <math>w_i</math> has predecessors in the computational graph:
:<math>\dot w_i = \sum_{j \in \{\text{predecessors of i}\}} \frac{\partial w_i}{\partial w_j} \dot w_j</math>

[[Image:ForwardAccumulationAutomaticDifferentiation.png|right|thumb|300px|Figure 2: Example of forward accumulation with computational graph]]

As an example, consider the function:
<math display="block">\begin{align}
y
&= f(x_1, x_2) \\
&= x_1 x_2 + \sin x_1 \\
&= w_1 w_2 + \sin w_1 \\
&= w_3 + w_4 \\
&= w_5
\end{align}</math>
For clarity, the individual sub-expressions have been labeled with the variables <math>w_i</math>.

The choice of the independent variable to which differentiation is performed affects the ''seed'' values {{math|''ẇ''<sub>1</sub>}} and {{math|''ẇ''<sub>2</sub>}}. Given interest in the derivative of this function with respect to {{math|''x''<sub>1</sub>}}, the seed values should be set to:
<math display="block">\begin{align}
\dot w_1 = \frac{\partial w_1}{\partial x_1} = \frac{\partial x_1}{\partial x_1} = 1 \\
\dot w_2 = \frac{\partial w_2}{\partial x_1} = \frac{\partial x_2}{\partial x_1} = 0
\end{align}</math>

With the seed values set, the values propagate using the chain rule as shown. Figure 2 shows a pictorial depiction of this process as a computational graph.
:{| class="wikitable"
!Operations to compute value !!Operations to compute derivative
|-
|<math>w_1 = x_1</math> || <math>\dot w_1 = 1</math> (seed)
|-
|<math>w_2 = x_2</math> || <math>\dot w_2 = 0</math> (seed)
|-
|<math>w_3 = w_1 \cdot w_2</math> || <math>\dot w_3 = w_2 \cdot \dot w_1 + w_1 \cdot \dot w_2</math>
|-
|<math>w_4 = \sin w_1</math> || <math>\dot w_4 = \cos w_1 \cdot \dot w_1</math>
|-
|<math>w_5 = w_3 + w_4</math> || <math>\dot w_5 = \dot w_3 + \dot w_4</math>
|}

To compute the [[gradient]] of this example function, which requires not only <math>\tfrac{\partial y}{\partial x_1}</math> but also <math>\tfrac{\partial y}{\partial x_2}</math>, an ''additional'' sweep is performed over the computational graph using the seed values <math>\dot w_1 = 0; \dot w_2 = 1</math>.

==== Implementation ====
===== Pseudocode =====
Forward accumulation calculates the function and the derivative (but only for one independent variable each) in one pass. The associated method call expects the expression ''Z'' to be derived with regard to a variable ''V''. The method returns a pair of the evaluated function and its derivative. The method traverses the expression tree recursively until a variable is reached. If the derivative with respect to this variable is requested, its derivative is 1, 0 otherwise. Then the partial function as well as the partial derivative are evaluated.<ref name=demm22>{{cite book|author = Maximilian E. Schüle, Maximilian Springer, [[Alfons Kemper]], [[Thomas Neumann]] |title=Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning |chapter=LLVM code optimisation for automatic differentiation |date=2022|pages=1–4 |doi = 10.1145/3533028.3533302|isbn=9781450393751 |s2cid=248853034 |language=English}}</ref>
<syntaxhighlight lang="cpp">
tuple<float,float> evaluateAndDerive(Expression Z, Variable V) {
   if isVariable(Z)
      if (Z = V) return {valueOf(Z), 1};
      else return {valueOf(Z), 0};
   else if (Z = A + B)
      {a, a'} = evaluateAndDerive(A, V);
      {b, b'} = evaluateAndDerive(B, V);
      return {a + b, a' + b'};
   else if (Z = A - B)
      {a, a'} = evaluateAndDerive(A, V);
      {b, b'} = evaluateAndDerive(B, V);
      return {a - b, a' - b'};
   else if (Z = A * B)
      {a, a'} = evaluateAndDerive(A, V);
      {b, b'} = evaluateAndDerive(B, V);
      return {a * b, b * a' + a * b'};
}
</syntaxhighlight>

===== C++ =====
<syntaxhighlight lang="cpp">
#include <iostream>
struct ValueAndPartial { float value, partial; };
struct Variable;
struct Expression {
   virtual ValueAndPartial evaluateAndDerive(Variable *variable) = 0;
};
struct Variable: public Expression {
   float value;
   Variable(float value): value(value) {}
   ValueAndPartial evaluateAndDerive(Variable *variable) {
      float partial = (this == variable) ? 1.0f : 0.0f;
      return {value, partial};
   }
};
struct Plus: public Expression {
   Expression *a, *b;
   Plus(Expression *a, Expression *b): a(a), b(b) {}
   ValueAndPartial evaluateAndDerive(Variable *variable) {
      auto [valueA, partialA] = a->evaluateAndDerive(variable);
      auto [valueB, partialB] = b->evaluateAndDerive(variable);
      return {valueA + valueB, partialA + partialB};
   }
};
struct Multiply: public Expression {
   Expression *a, *b;
   Multiply(Expression *a, Expression *b): a(a), b(b) {}
   ValueAndPartial evaluateAndDerive(Variable *variable) {
      auto [valueA, partialA] = a->evaluateAndDerive(variable);
      auto [valueB, partialB] = b->evaluateAndDerive(variable);
      return {valueA * valueB, valueB * partialA + valueA * partialB};
   }
};
int main () {
   // Example: Finding the partials of z = x * (x + y) + y * y at (x, y) = (2, 3)
   Variable x(2), y(3);
   Plus p1(&x, &y); Multiply m1(&x, &p1); Multiply m2(&y, &y); Plus z(&m1, &m2);
   float xPartial = z.evaluateAndDerive(&x).partial;
   float yPartial = z.evaluateAndDerive(&y).partial;
   std::cout << "∂z/∂x = " << xPartial << ", "
             << "∂z/∂y = " << yPartial << std::endl;
   // Output: ∂z/∂x = 7, ∂z/∂y = 8
   return 0;
}
</syntaxhighlight>