{{Short description|Optimization algorithm for artificial neural networks}}
{{About|the computer algorithm|the biological process|neural backpropagation}}
{{Hatnote|Backpropagation can also refer to the way the result of a playout is propagated up the search tree in [[Monte Carlo tree search#Principle of operation|Monte Carlo tree search]].}}
{{Machine learning bar}}

In [[machine learning]], '''backpropagation''' is a [[gradient]] computation method commonly used in training a [[Neural network (machine learning)|neural network]] to compute its parameter updates. It is an efficient application of the [[chain rule]] to neural networks. Backpropagation computes the [[gradient]] of a [[loss function]] with respect to the [[Glossary of graph theory terms#weight|weights]] of the network for a single input–output example, and does so [[Algorithmic efficiency|efficiently]], computing the gradient one layer at a time and [[iteration|iterating]] backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through [[dynamic programming]].<ref name="kelley1960">{{cite journal |last1=Kelley |first1=Henry J. |author-link=Henry J. Kelley |year=1960 |title=Gradient theory of optimal flight paths |journal=ARS Journal |volume=30 |issue=10 |pages=947–954 |doi=10.2514/8.5282}}</ref><ref name="bryson1961">{{cite book |last=Bryson |first=Arthur E. |title=Proceedings of the Harvard Univ. Symposium on digital computers and their applications, 3–6 April 1961 |publisher=Harvard University Press |year=1962 |location=Cambridge |chapter=A gradient method for optimizing multi-stage allocation processes |oclc=498866871}}</ref>{{sfn|Goodfellow|Bengio|Courville|2016|p=[https://www.deeplearningbook.org/contents/mlp.html#pf33 214]|ps=, "This table-filling strategy is sometimes called ''dynamic programming''."}}

Strictly speaking, the term ''backpropagation'' refers only to an algorithm for efficiently computing the gradient, not to how the gradient is used; but the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by [[stochastic gradient descent]], or as an intermediate step in a more complicated optimizer, such as [[Stochastic gradient descent#Adam|Adaptive Moment Estimation]].<ref>{{harvnb|Goodfellow|Bengio|Courville|2016|p=[https://www.deeplearningbook.org/contents/mlp.html#pf25 200]}}, "The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Backpropagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."</ref> Convergence to local minima, exploding gradients, vanishing gradients, and weak control of the learning rate are the main disadvantages of these optimization algorithms. [[Hessian matrix|Hessian]] and quasi-Hessian optimizers address only the problem of convergence to local minima, and they make backpropagation take longer to run. These problems have led researchers to develop hybrid<ref>{{Cite journal |last1=Mohapatra |first1=Rohan |last2=Saha |first2=Snehanshu |last3=Coello |first3=Carlos A. Coello |last4=Bhattacharya |first4=Anwesh |last5=Dhavala |first5=Soma S. |last6=Saha |first6=Sriparna |date=April 2022 |title=AdaSwarm: Augmenting Gradient-Based Optimizers in Deep Learning With Swarm Intelligence |url=https://ieeexplore.ieee.org/document/9472873 |journal=IEEE Transactions on Emerging Topics in Computational Intelligence |volume=6 |issue=2 |pages=329–340 |doi=10.1109/TETCI.2021.3083428 |issn=2471-285X |arxiv=2006.09875 |hdl=20.500.11824/1557}}</ref> and fractional<ref>{{Cite journal |last1=Abdulkadirov |first1=Ruslan I. |last2=Lyakhov |first2=Pavel A. |last3=Baboshina |first3=Valentina A. |last4=Nagornov |first4=Nikolay N. |date=2024 |title=Improving the Accuracy of Neural Network Pattern Recognition by Fractional Gradient Descent |journal=IEEE Access |volume=12 |pages=168428–168444 |doi=10.1109/ACCESS.2024.3491614 |issn=2169-3536 |doi-access=free |bibcode=2024IEEEA..12p8428A}}</ref> optimization algorithms.

Backpropagation had multiple discoveries and partial discoveries, with a tangled history and terminology; see the [[#History|history]] section for details. Some other names for the technique include "reverse mode of [[automatic differentiation]]" or "[[reverse accumulation]]".<ref name="DL-reverse-mode">{{harvtxt|Goodfellow|Bengio|Courville|2016|p=[https://www.deeplearningbook.org/contents/mlp.html#pf36 217]–218}}, "The back-propagation algorithm described here is only one approach to automatic differentiation. It is a special case of a broader class of techniques called ''reverse mode accumulation''."</ref>
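As an illustration (not drawn from a particular reference; the two-layer network, the squared-error loss, and names such as <code>W1</code>, <code>delta1</code>, and <code>lr</code> are chosen here only for exposition), the following [[Python (programming language)|Python]]/[[NumPy]] sketch shows a forward pass whose intermediate activation is saved, a backward pass that applies the chain rule one layer at a time starting from the output layer, and a single [[stochastic gradient descent]] update using the resulting gradients:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative two-layer network: input -> sigmoid hidden layer -> linear output.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Forward pass: keep the hidden activation, which the backward pass reuses.
    a1 = sigmoid(W1 @ x + b1)
    y_hat = W2 @ a1 + b2                      # linear output layer
    return a1, y_hat

def backward(x, y, a1, y_hat):
    # Backward pass: chain rule applied one layer at a time, last layer first.
    # For the squared-error loss L = 0.5 * ||y_hat - y||^2, dL/dy_hat = y_hat - y.
    delta2 = y_hat - y                        # error signal at the output layer
    dW2 = np.outer(delta2, a1)                # dL/dW2
    db2 = delta2                              # dL/db2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)  # propagate the error through the sigmoid
    dW1 = np.outer(delta1, x)                 # dL/dW1
    db1 = delta1                              # dL/db1
    return dW1, db1, dW2, db2

# One stochastic-gradient-descent step on a single input-output example.
x, y = rng.normal(size=n_in), rng.normal(size=n_out)
a1, y_hat = forward(x)
dW1, db1, dW2, db2 = backward(x, y, a1, y_hat)
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
</syntaxhighlight>

Reusing <code>delta2</code> when computing <code>delta1</code>, rather than re-expanding the full chain rule separately for each weight, is what avoids the redundant calculation of shared intermediate terms; deep-learning frameworks apply the same idea as reverse-mode automatic differentiation over a computation graph.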