=== Modern backpropagation ===
Modern backpropagation was first published by [[Seppo Linnainmaa]] as "reverse mode of [[automatic differentiation]]" (1970)<ref name="lin1970">{{cite thesis |first=Seppo |last=Linnainmaa |author-link=Seppo Linnainmaa |year=1970 |type=Masters |title=The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors |language=fi |publisher=University of Helsinki |pages=6–7}}</ref> for discrete connected networks of nested [[Differentiable function|differentiable]] functions.<ref name="lin1976">{{cite journal |last1=Linnainmaa |first1=Seppo |author-link=Seppo Linnainmaa |year=1976 |title=Taylor expansion of the accumulated rounding error |journal=BIT Numerical Mathematics |volume=16 |issue=2 |pages=146–160 |doi=10.1007/bf01931367 |s2cid=122357351}}</ref><ref name="grie2012">{{cite book |last=Griewank |first=Andreas |title=Optimization Stories |year=2012 |series=Documenta Mathematica, Extra Volume ISMP |pages=389–400 |chapter=Who Invented the Reverse Mode of Differentiation? |s2cid=15568746}}</ref><ref name="grie2008">{{cite book |last1=Griewank |first1=Andreas |url={{google books |plainurl=y |id=xoiiLaRxcbEC}} |title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition |last2=Walther |first2=Andrea |author2-link=Andrea Walther |publisher=SIAM |year=2008 |isbn=978-0-89871-776-1}}</ref> In 1982, [[Paul Werbos]] applied backpropagation to MLPs in the way that has become standard.<ref name="werbos1982">{{Cite book|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis|author-link=Paul Werbos|chapter-url=http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|access-date=2 July 2017|archive-date=14 April 2016|archive-url=https://web.archive.org/web/20160414055503/http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|url-status=live}}</ref><ref name="werbos1974">{{cite book |last=Werbos |first=Paul J. |title=The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting |location=New York |publisher=John Wiley & Sons |year=1994 |isbn=0-471-59897-6}}</ref>

In an interview, Werbos described how he developed backpropagation. In 1971, during his PhD work, he developed backpropagation to mathematicize [[Sigmund Freud|Freud]]'s "flow of psychic energy". He faced repeated difficulty in publishing the work, only managing to do so in 1981.<ref name=":1">{{Cite book |url=https://direct.mit.edu/books/book/4886/Talking-NetsAn-Oral-History-of-Neural-Networks |title=Talking Nets: An Oral History of Neural Networks |date=2000 |publisher=The MIT Press |isbn=978-0-262-26715-1 |editor-last=Anderson |editor-first=James A. |language=en |doi=10.7551/mitpress/6626.003.0016 |editor-last2=Rosenfeld |editor-first2=Edward}}</ref> He also claimed that "the first practical application of back-propagation was for estimating a dynamic model to predict nationalism and social communications in 1974", referring to his own work.<ref>P. J. Werbos, "Backpropagation through time: what it does and how to do it," in Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990, {{doi|10.1109/5.58337}}</ref>

Around 1982,<ref name=":1" />{{rp|376}} [[David E. Rumelhart]] independently developed<ref>Olazaran Rodriguez, Jose Miguel. 
''[https://web.archive.org/web/20221111165150/https://era.ed.ac.uk/bitstream/handle/1842/20075/Olazaran-RodriguezJM_1991redux.pdf?sequence=1&isAllowed=y A historical sociology of neural network research]''. PhD Dissertation. University of Edinburgh, 1991.</ref>{{rp|252}} backpropagation and taught the algorithm to others in his research circle. He did not cite earlier work because he was unaware of it. He first published the algorithm in a 1985 paper, then presented an experimental analysis of the technique in a 1986 ''[[Nature (journal)|Nature]]'' paper.<ref name="learning-representations">{{cite journal | last1 = Rumelhart | last2 = Hinton | last3 = Williams | title=Learning representations by back-propagating errors | journal = Nature | volume = 323 | issue = 6088 | pages = 533–536 | url = http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf| doi = 10.1038/323533a0 | year = 1986 | bibcode = 1986Natur.323..533R | s2cid = 205001834 }}</ref> These papers became highly cited, contributed to the popularization of backpropagation, and coincided with the resurgence of research interest in neural networks during the 1980s.<ref name="RumelhartHintonWilliams1986a" /><ref name="RumelhartHintonWilliams1986b">{{cite book |editor1-last=Rumelhart |editor1-first=David E. |editor1-link=David E. Rumelhart |editor2-first=James L. |editor2-last=McClelland |editor2-link=James McClelland (psychologist) |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition |volume=1: Foundations |last1=Rumelhart |first1=David E. |author-link1=David E. Rumelhart |last2=Hinton |first2=Geoffrey E. |author-link2=Geoffrey E. Hinton |first3=Ronald J. |last3=Williams |author-link3=Ronald J. Williams |chapter=8. Learning Internal Representations by Error Propagation |location=Cambridge |publisher=MIT Press |year=1986b |isbn=0-262-18120-7 |chapter-url-access=registration |chapter-url=https://archive.org/details/paralleldistribu00rume }}</ref><ref>{{cite book|url={{google books |plainurl=y |id=4j9GAQAAIAAJ}}|title=Introduction to Machine Learning|last=Alpaydin|first=Ethem|publisher=MIT Press|year=2010|isbn=978-0-262-01243-0}}</ref>

In 1985, the method was also described by David Parker.<ref>{{Cite report |last=Parker |first=D.B. |date=1985 |title=Learning Logic: Casting the Cortex of the Human Brain in Silicon |department=Center for Computational Research in Economics and Management Science |location=Cambridge MA |id=Technical Report TR-47 |publisher=Massachusetts Institute of Technology}}</ref><ref name=":0">{{Cite book |last=Hertz |first=John |title=Introduction to the theory of neural computation |date=1991 |publisher=Addison-Wesley |others=Krogh, Anders., Palmer, Richard G. |isbn=0-201-50395-6 |location=Redwood City, Calif. |pages=8 |oclc=21522159}}</ref> [[Yann LeCun]] proposed an alternative form of backpropagation for neural networks in his 1987 PhD thesis.<ref>{{Cite thesis |title=Modèles connexionnistes de l'apprentissage |url=https://www.sudoc.fr/043586643 |publisher=Université Pierre et Marie Curie |date=1987 |place=Paris, France |degree=Thèse de doctorat d'état |first=Yann |last=Le Cun}}</ref>

Gradient descent took a considerable amount of time to gain acceptance. Some early objections were that there was no guarantee gradient descent would reach a global minimum rather than only a local minimum, and that neurons were "known" by physiologists to produce discrete signals (0/1), not continuous ones, and with discrete signals there is no gradient to take. 
See the interview with [[Geoffrey Hinton]],<ref name=":1" /> who was awarded the 2024 [[Nobel Prize in Physics]] for his contributions to the field.<ref>{{Cite web |title=The Nobel Prize in Physics 2024 |url=https://www.nobelprize.org/prizes/physics/2024/press-release/ |access-date=2024-10-13 |website=NobelPrize.org |language=en-US}}</ref>