Neural network (machine learning)
=== Early work ===
Today's deep neural networks are based on early work in [[statistics]] over 200 years ago. The simplest kind of [[feedforward neural network]] (FNN) is a linear network, which consists of a single layer of output nodes with linear activation functions; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated at each node. The [[mean squared error]] between these calculated outputs and the given target values is minimized by adjusting the weights. This technique has been known for over two centuries as the [[method of least squares]] or [[linear regression]]. It was used as a means of finding a good rough linear fit to a set of points by [[Adrien-Marie Legendre|Legendre]] (1805) and [[Gauss]] (1795) for the prediction of planetary movement.<ref name="legendre1805">Mansfield Merriman, "A List of Writings Relating to the Method of Least Squares"</ref><ref name="gauss1795">{{cite journal |first=Stephen M. |last=Stigler |year=1981 |title=Gauss and the Invention of Least Squares |journal=Ann. Stat. |volume=9 |issue=3 |pages=465–474 |doi=10.1214/aos/1176345451 |doi-access=free }}</ref><ref name=brertscher>{{cite book |last=Bretscher |first=Otto |title=Linear Algebra With Applications |edition=3rd |publisher=Prentice Hall |year=1995 |location=Upper Saddle River, NJ}}</ref><ref name=DLhistory/><ref name=stigler>{{cite book |last=Stigler |first=Stephen M. |author-link=Stephen Stigler |year=1986 |title=The History of Statistics: The Measurement of Uncertainty before 1900 |location=Cambridge |publisher=Harvard |isbn=0-674-40340-1 |url-access=registration |url=https://archive.org/details/historyofstatist00stig}}</ref>

Historically, digital computers such as the [[von Neumann model]] operate via the execution of explicit instructions with access to memory by a number of processors. Some neural networks, on the other hand, originated from efforts to model information processing in biological systems through the framework of [[connectionism]]. Unlike the von Neumann model, connectionist computing does not separate memory and processing.

[[Warren McCulloch]] and [[Walter Pitts]]<ref name=WM /> (1943) considered a non-learning computational model for neural networks.<ref>{{Cite news |last=Kleene |first=S.C. |year=1956 |title=Representation of Events in Nerve Nets and Finite Automata |url=https://www.degruyter.com/view/books/9781400882618/9781400882618-002/9781400882618-002.xml |access-date=17 June 2017 |work=Annals of Mathematics Studies |publisher=Princeton University Press |pages=3–41 |issue=34 |archive-date=19 May 2024 |archive-url=https://web.archive.org/web/20240519081121/https://www.degruyter.com/view/books/9781400882618/9781400882618-002/9781400882618-002.xml |url-status=live }}</ref> This model paved the way for research to split into two approaches. One approach focused on biological processes while the other focused on the application of neural networks to artificial intelligence.

In the late 1940s, [[Donald O. Hebb|D. O. Hebb]]<ref>{{cite book|url={{google books |plainurl=y |id=ddB4AgAAQBAJ}}|title=The Organization of Behavior|last=Hebb|first=Donald|publisher=Wiley|year=1949|isbn=978-1-135-63190-1|location=New York}}</ref> proposed a learning [[hypothesis]] based on the mechanism of [[Neuroplasticity|neural plasticity]] that became known as [[Hebbian learning]]. It was used in many early neural networks, such as Rosenblatt's [[perceptron]] and the [[Hopfield network]].
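In modern notation, the linear network and least-squares fit described at the start of this section amount to the following (the notation here is illustrative and not drawn from the cited historical sources): given training pairs <math>(\mathbf{x}_i, y_i)</math> for <math>i = 1, \dots, n</math>, a linear network with weight vector <math>\mathbf{w}</math> predicts <math>\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i</math>, and least squares adjusts the weights to minimize the mean squared error
<math display="block">E(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \left(\mathbf{w}^\top \mathbf{x}_i - y_i\right)^2,</math>
which, when the matrix <math>X</math> with rows <math>\mathbf{x}_i^\top</math> has full column rank, is solved in closed form by the normal equations <math>\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}</math>.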
Farley and [[Wesley A. Clark|Clark]]<ref>{{cite journal|last=Farley|first=B.G.|author2=W.A. Clark|year=1954|title=Simulation of Self-Organizing Systems by Digital Computer|journal=IRE Transactions on Information Theory|volume=4|issue=4|pages=76–84|doi=10.1109/TIT.1954.1057468}}</ref> (1954) used computational machines to simulate a Hebbian network. Other neural network computational machines were created by [[Nathaniel Rochester (computer scientist)|Rochester]], Holland, Habit and Duda (1956).<ref>{{cite journal|last=Rochester|first=N.|author2=J.H. Holland|author3=L.H. Habit|author4=W.L. Duda|year=1956|title=Tests on a cell assembly theory of the action of the brain, using a large digital computer|journal=IRE Transactions on Information Theory|volume=2|issue=3|pages=80–93|doi=10.1109/TIT.1956.1056810}}</ref>

In 1958, psychologist [[Frank Rosenblatt]] described the perceptron, one of the first implemented artificial neural networks,<ref>Haykin (2008) Neural Networks and Learning Machines, 3rd edition</ref><ref>{{cite journal|last=Rosenblatt|first=F.|title=The Perceptron: A Probabilistic Model For Information Storage And Organization in the Brain|journal=Psychological Review|year=1958|volume=65|pages=386–408|doi=10.1037/h0042519|pmid=13602029|issue=6|citeseerx=10.1.1.588.3775|s2cid=12781225 }}</ref><ref name="Werbos 1975">{{cite book|url={{google books |plainurl=y |id=z81XmgEACAAJ}}|title=Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences|last=Werbos|first=P.J.|year=1975}}</ref><ref>{{cite journal |last=Rosenblatt |first=Frank |year=1957 |title=The Perceptron – a perceiving and recognizing automaton |journal=Report 85-460-1 |publisher=Cornell Aeronautical Laboratory }}</ref> funded by the United States [[Office of Naval Research]].<ref name="Olazaran">{{cite journal |first=Mikel |last=Olazaran |title=A Sociological Study of the Official History of the Perceptrons Controversy |journal=Social Studies of Science |volume=26 |issue=3 |year=1996 |jstor=285702|doi=10.1177/030631296026003005 |pages=611–659|s2cid=16786738 }}</ref> R. D. Joseph (1960)<ref name="joseph1960">{{cite book |last=Joseph |first=R. D. |title=Contributions to Perceptron Theory, Cornell Aeronautical Laboratory Report No. VG-11 96--G-7, Buffalo |year=1960}}</ref> mentions an even earlier perceptron-like device by Farley and Clark:<ref name="DLhistory"/> "Farley and Clark of MIT Lincoln Laboratory actually preceded Rosenblatt in the development of a perceptron-like device." However, "they dropped the subject." The perceptron raised public excitement for research in artificial neural networks, causing the US government to drastically increase funding. This contributed to "the Golden Age of AI", fueled by the optimistic claims made by computer scientists regarding the ability of perceptrons to emulate human intelligence.<ref name=":08">{{Cite book |author=Russell, Stuart |author2=Norvig, Peter |url=https://people.engr.tamu.edu/guni/csce421/files/AI_Russell_Norvig.pdf |title=Artificial Intelligence A Modern Approach |publisher=Pearson Education |year=2010 |isbn=978-0-13-604259-4 |edition=3rd |location=United States of America |pages=16–28 |language=en}}</ref> The first perceptrons did not have adaptive hidden units. However, Joseph (1960)<ref name="joseph1960"/> also discussed [[multilayer perceptrons]] with an adaptive hidden layer.
Rosenblatt (1962)<ref name="rosenblatt1962">{{cite book |last=Rosenblatt |first=Frank |author-link=Frank Rosenblatt |title=Principles of Neurodynamics |publisher=Spartan, New York |year=1962}}</ref>{{rp|section 16}} cited and adopted these ideas, also crediting work by H. D. Block and B. W. Knight. Unfortunately, these early efforts did not lead to a working learning algorithm for hidden units, i.e., [[deep learning]].