
== Implementation ==

Forward-mode AD is implemented by a [[nonstandard interpretation]] of the program in which real numbers are replaced by dual numbers, constants are lifted to dual numbers with a zero epsilon coefficient, and the numeric primitives are lifted to operate on dual numbers. This nonstandard interpretation is generally implemented using one of two strategies: ''source code transformation'' or ''operator overloading''.

=== Source code transformation (SCT) ===
[[Image:SourceTransformationAutomaticDifferentiation.png|thumb|right|300px|Figure 4: Example of how source code transformation could work]]

The source code for a function is replaced by automatically generated source code that includes statements for calculating the derivatives, interleaved with the original instructions.
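As a rough illustration of the interleaving described above, the following Python sketch shows a small function next to a hand-written version of what a transformation tool might emit for forward mode. The names <code>f</code>, <code>f_forward</code>, and the <code>d</code>-prefixed tangent variables are assumptions made for this example, not the output of any particular tool.

```python
import math

def f(x1, x2):
    # Original (primal) function: f(x1, x2) = x1 * x2 + sin(x1)
    w3 = x1 * x2
    w4 = math.sin(x1)
    w5 = w3 + w4
    return w5

def f_forward(x1, x2, dx1, dx2):
    # Hypothetical transformed version: every primal statement is followed
    # by the statement that propagates its derivative (tangent).
    w3 = x1 * x2
    dw3 = dx1 * x2 + x1 * dx2      # product rule
    w4 = math.sin(x1)
    dw4 = math.cos(x1) * dx1       # chain rule for sin
    w5 = w3 + w4
    dw5 = dw3 + dw4                # sum rule
    return w5, dw5

# Seeding dx1 = 1, dx2 = 0 yields the partial derivative with respect to x1.
value, d_dx1 = f_forward(2.0, 3.0, 1.0, 0.0)
```

Because the result is ordinary source code, it can be compiled and optimized like any other function in the program.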

[[Source code transformation]] can be implemented for any programming language, and because the derivative code is ordinary source code, the compiler can apply its usual compile-time optimizations to it. However, the AD tool itself is more difficult to implement, and the build system becomes more complex.

=== Operator overloading (OO) ===

[[Image:OperatorOverloadingAutomaticDifferentiation.png|thumb|right|300px|Figure 5: Example of how operator overloading could work]]
[[Operator overloading]] is an option for source code written in a language that supports it. The types representing real numbers and the elementary mathematical operations must be overloaded to support the augmented arithmetic depicted above. This requires no change in the form or sequence of operations in the original source code of the function to be differentiated, but it often requires changing the basic data types for numbers and vectors so that they support overloading, and often also involves inserting special flagging operations. Because every overloaded operation incurs overhead on each loop iteration, this approach is usually slower.
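A minimal Python sketch of this approach for forward mode is given below. The class name <code>Dual</code>, its <code>lift</code> helper, and the free function <code>sin</code> are assumptions made for this example; only addition, multiplication, and sine are covered.

```python
import math

class Dual:
    """Dual number real + eps*epsilon, with epsilon**2 == 0 (illustrative class)."""

    def __init__(self, real, eps=0.0):
        self.real = real
        self.eps = eps

    @staticmethod
    def lift(value):
        # Constants are lifted to dual numbers with a zero epsilon coefficient.
        return value if isinstance(value, Dual) else Dual(value, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule on the epsilon coefficient.
        return Dual(self.real * other.real,
                    self.eps * other.real + self.real * other.eps)

    __rmul__ = __mul__  # multiplication is commutative

def sin(x):
    # Lifted primitive: d/dx sin(x) = cos(x).
    x = Dual.lift(x)
    return Dual(math.sin(x.real), math.cos(x.real) * x.eps)

def f(x1, x2):
    # The function body is written exactly as it would be for plain floats.
    return x1 * x2 + sin(x1)

# Seed x1 with epsilon coefficient 1 to differentiate with respect to x1.
y = f(Dual(2.0, 1.0), Dual(3.0))
```

Here <code>y.real</code> holds the function value and <code>y.eps</code> the derivative with respect to <code>x1</code>. The body of <code>f</code> is unchanged; only the type of its arguments differs.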

=== Operator overloading and source code transformation ===
Overloaded operators can be used to extract the valuation graph, after which the AD version of the primal function is generated automatically at run-time. Unlike in classic OO AAD, this AD function does not change from one iteration to the next. Hence there is no OO or tape-interpretation run-time overhead per Xi sample.
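The record-once, generate-once idea can be sketched in Python as follows. This is a simplified illustration under several assumptions: it generates forward-mode (tangent) code rather than adjoint code, it handles only addition and multiplication, and the names <code>Var</code>, <code>compile_forward</code>, and <code>generated</code> are hypothetical and unrelated to the API of any existing library.

```python
class Var:
    """Traced value: overloaded operators append statements to a shared tape."""

    def __init__(self, tape, name):
        self.tape = tape
        self.name = name

    def _lift(self, other):
        # Constants become tape entries of their own.
        if isinstance(other, Var):
            return other
        const = Var(self.tape, f"v{len(self.tape)}")
        self.tape.append((const.name, "const", float(other), None))
        return const

    def _emit(self, op, other):
        other = self._lift(other)
        out = Var(self.tape, f"v{len(self.tape)}")
        self.tape.append((out.name, op, self.name, other.name))
        return out

    def __add__(self, other):
        return self._emit("+", other)

    def __mul__(self, other):
        return self._emit("*", other)

    __radd__ = __add__
    __rmul__ = __mul__

def compile_forward(func, arg_names):
    # Assumes arg_names do not collide with the temporaries v0, v1, ...
    tape = []
    result = func(*[Var(tape, n) for n in arg_names])  # run once to record
    params = ", ".join(arg_names + ["d_" + n for n in arg_names])
    lines = [f"def generated({params}):"]
    for out, op, a, b in tape:
        if op == "const":
            lines += [f"    {out} = {a!r}", f"    d_{out} = 0.0"]
        elif op == "+":
            lines += [f"    {out} = {a} + {b}", f"    d_{out} = d_{a} + d_{b}"]
        elif op == "*":
            lines += [f"    {out} = {a} * {b}",
                      f"    d_{out} = d_{a} * {b} + {a} * d_{b}"]
    lines.append(f"    return {result.name}, d_{result.name}")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # turn the generated source into a plain function
    return namespace["generated"], source

def f(x1, x2):
    return x1 * x2 + x1 * 2.0

f_ad, source = compile_forward(f, ["x1", "x2"])

# The generated function is reused for every sample; no operators are
# overloaded and no tape is interpreted inside this loop.
samples = [(1.0, 2.0), (2.0, 3.0)]
results = [f_ad(a, b, 1.0, 0.0) for a, b in samples]
```

Each entry of <code>results</code> is a pair of the function value and its derivative with respect to <code>x1</code>. Since the graph is recorded from a single run, any data-dependent branching in <code>f</code> would be frozen into the generated code.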

Because the AD function is generated at run-time, it can be optimized to take into account the current state of the program and to precompute certain values. In addition, it can be generated so that it consistently uses native CPU vectorization to process user data in chunks of 4 or 8 doubles (a 4× to 8× speed-up with AVX2 or AVX512, respectively). With multithreading taken into account, this approach can yield an overall speed-up on the order of 8 × the number of cores compared with traditional AAD tools. A reference implementation is available on GitHub.<ref>{{Cite web|url=https://github.com/matlogica/aadc-prototype|title=AADC Prototype Library|date=June 22, 2022|via=GitHub}}</ref>