=== Modern backpropagation ===
Modern backpropagation was first published by [[Seppo Linnainmaa]] as "reverse mode of [[automatic differentiation]]" (1970)<ref name="lin1970">{{cite thesis |first=Seppo |last=Linnainmaa |author-link=Seppo Linnainmaa |year=1970 |type=Masters |title=The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors |language=fi |publisher=University of Helsinki |pages=6–7}}</ref> for discrete connected networks of nested [[Differentiable function|differentiable]] functions.<ref name="lin1976">{{cite journal |last1=Linnainmaa |first1=Seppo |author-link=Seppo Linnainmaa |year=1976 |title=Taylor expansion of the accumulated rounding error |journal=BIT Numerical Mathematics |volume=16 |issue=2 |pages=146–160 |doi=10.1007/bf01931367 |s2cid=122357351}}</ref><ref name="grie2012">{{cite book |last=Griewank |first=Andreas |title=Optimization Stories |year=2012 |series=Documenta Mathematica, Extra Volume ISMP |pages=389–400 |chapter=Who Invented the Reverse Mode of Differentiation? |s2cid=15568746}}</ref><ref name="grie2008">{{cite book |last1=Griewank |first1=Andreas |url={{google books |plainurl=y |id=xoiiLaRxcbEC}} |title=Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition |last2=Walther |first2=Andrea |author2-link=Andrea Walther |publisher=SIAM |year=2008 |isbn=978-0-89871-776-1}}</ref> In 1982, [[Paul Werbos]] applied backpropagation to MLPs in the way that has become standard.<ref name="werbos1982">{{Cite book|title=System modeling and optimization|last=Werbos|first=Paul|publisher=Springer|year=1982|pages=762–770|chapter=Applications of advances in nonlinear sensitivity analysis|author-link=Paul Werbos|chapter-url=http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|access-date=2 July 2017|archive-date=14 April 2016|archive-url=https://web.archive.org/web/20160414055503/http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf|url-status=live}}</ref><ref name="werbos1974">{{cite book |last=Werbos |first=Paul J. |title=The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting |location=New York |publisher=John Wiley & Sons |year=1994 |isbn=0-471-59897-6}}</ref>

In an interview, Werbos described how he developed backpropagation. In 1971, during his PhD work, he developed backpropagation to mathematicize [[Sigmund Freud|Freud]]'s "flow of psychic energy". He faced repeated difficulty in publishing the work, only managing to do so in 1981.<ref name=":1">{{Cite book |url=https://direct.mit.edu/books/book/4886/Talking-NetsAn-Oral-History-of-Neural-Networks |title=Talking Nets: An Oral History of Neural Networks |date=2000 |publisher=The MIT Press |isbn=978-0-262-26715-1 |editor-last=Anderson |editor-first=James A. |language=en |doi=10.7551/mitpress/6626.003.0016 |editor-last2=Rosenfeld |editor-first2=Edward}}</ref> He also claimed that "the first practical application of back-propagation was for estimating a dynamic model to predict nationalism and social communications in 1974", referring to his own work.<ref>P. J. Werbos, "Backpropagation through time: what it does and how to do it," in Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990, {{doi|10.1109/5.58337}}</ref>

Around 1982,<ref name=":1" />{{rp|376}} [[David E. Rumelhart]] independently developed<ref>Olazaran Rodriguez, Jose Miguel. 
''[https://web.archive.org/web/20221111165150/https://era.ed.ac.uk/bitstream/handle/1842/20075/Olazaran-RodriguezJM_1991redux.pdf?sequence=1&isAllowed=y A historical sociology of neural network research]''. PhD Dissertation. University of Edinburgh, 1991.</ref>{{rp|252}} backpropagation and taught the algorithm to others in his research circle. He did not cite earlier work because he was unaware of it. He first published the algorithm in a 1985 paper, then presented an experimental analysis of the technique in a 1986 ''[[Nature (journal)|Nature]]'' paper.<ref name="learning-representations">{{cite journal | last1 = Rumelhart | last2 = Hinton | last3 = Williams | title=Learning representations by back-propagating errors | journal = Nature | volume = 323 | issue = 6088 | pages = 533–536 | url = http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf| doi = 10.1038/323533a0 | year = 1986 | bibcode = 1986Natur.323..533R | s2cid = 205001834 }}</ref> These papers became highly cited, contributed to the popularization of backpropagation, and coincided with the resurgence of research interest in neural networks during the 1980s.<ref name="RumelhartHintonWilliams1986a" /><ref name="RumelhartHintonWilliams1986b">{{cite book |editor1-last=Rumelhart |editor1-first=David E. |editor1-link=David E. Rumelhart |editor2-first=James L. |editor2-last=McClelland |editor2-link=James McClelland (psychologist) |title=Parallel Distributed Processing: Explorations in the Microstructure of Cognition |volume=1: Foundations |last1=Rumelhart |first1=David E. |author-link1=David E. Rumelhart |last2=Hinton |first2=Geoffrey E. |author-link2=Geoffrey E. Hinton |first3=Ronald J. |last3=Williams |author-link3=Ronald J. Williams |chapter=8. Learning Internal Representations by Error Propagation |location=Cambridge |publisher=MIT Press |year=1986b |isbn=0-262-18120-7 |chapter-url-access=registration |chapter-url=https://archive.org/details/paralleldistribu00rume }}</ref><ref>{{cite book|url={{google books |plainurl=y |id=4j9GAQAAIAAJ}}|title=Introduction to Machine Learning|last=Alpaydin|first=Ethem|publisher=MIT Press|year=2010|isbn=978-0-262-01243-0}}</ref>

In 1985, the method was also described by David Parker.<ref>{{Cite report |last=Parker |first=D.B. |date=1985 |title=Learning Logic: Casting the Cortex of the Human Brain in Silicon |department=Center for Computational Research in Economics and Management Science |location=Cambridge MA |id=Technical Report TR-47 |publisher=Massachusetts Institute of Technology}}</ref><ref name=":0">{{Cite book |last=Hertz |first=John |title=Introduction to the theory of neural computation |date=1991 |publisher=Addison-Wesley |others=Krogh, Anders., Palmer, Richard G. |isbn=0-201-50395-6 |location=Redwood City, Calif. |pages=8 |oclc=21522159}}</ref> [[Yann LeCun]] proposed an alternative form of backpropagation for neural networks in his 1987 PhD thesis.<ref>{{Cite thesis |title=Modèles connexionnistes de l'apprentissage |url=https://www.sudoc.fr/043586643 |publisher=Université Pierre et Marie Curie |date=1987 |place=Paris, France |degree=Thèse de doctorat d'état |first=Yann |last=Le Cun}}</ref>

Gradient descent took a considerable amount of time to gain acceptance. Some early objections were that there was no guarantee gradient descent would reach a global minimum rather than only a local minimum, and that neurons were "known" by physiologists to produce discrete signals (0/1), not continuous ones, and with discrete signals there is no gradient to take. 
See the interview with [[Geoffrey Hinton]],<ref name=":1" /> who was awarded the 2024 [[Nobel Prize in Physics]] for his contributions to the field.<ref>{{Cite web |title=The Nobel Prize in Physics 2024 |url=https://www.nobelprize.org/prizes/physics/2024/press-release/ |access-date=2024-10-13 |website=NobelPrize.org |language=en-US}}</ref>