==The first wave==
The first wave began in 1943 with [[Warren Sturgis McCulloch]] and [[Walter Pitts]], who sought to understand neural circuitry through a formal, mathematical approach. Their classic paper "A Logical Calculus of the Ideas Immanent in Nervous Activity" (1943) showed how neural systems could implement [[first-order logic]]. They were influenced by the work of [[Nicolas Rashevsky]] in the 1930s and by symbolic logic in the style of ''[[Principia Mathematica]]''.<ref>{{Cite journal |last1=McCulloch |first1=Warren S. |last2=Pitts |first2=Walter |date=1943-12-01 |title=A logical calculus of the ideas immanent in nervous activity |url=https://doi.org/10.1007/BF02478259 |journal=The Bulletin of Mathematical Biophysics |language=en |volume=5 |issue=4 |pages=115–133 |doi=10.1007/BF02478259 |issn=1522-9602|url-access=subscription }}</ref><ref name="2019TheCuriousCaseOfConnectionism" />
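
The idea that threshold units can realize logical operations can be illustrated with a minimal sketch. The code below is not taken from the 1943 paper: the function name <code>mcp_unit</code>, the weights, and the thresholds are assumptions chosen for illustration, and a negative weight stands in for the absolute inhibition used in the original formalism.

```python
def mcp_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def AND(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return mcp_unit([a, b], [1, 1], 2)


def OR(a, b):
    # A single active input is enough to reach a threshold of 1.
    return mcp_unit([a, b], [1, 1], 1)


def NOT(a):
    # An inhibitory (negative) weight keeps the unit silent when the input is active.
    return mcp_unit([a], [-1], 0)


def XOR(a, b):
    # A small network of units: (a OR b) AND NOT (a AND b).
    return AND(OR(a, b), NOT(AND(a, b)))
```

Composing such units into small networks, as in <code>XOR</code> above, is what allows more complex logical expressions to be built from simple threshold elements.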

[[Donald O. Hebb|Hebb]] contributed greatly to speculations about neural functioning, and proposed a learning principle, [[Hebbian learning]]. [[Karl Lashley|Lashley]] argued for distributed representations as a result of his failure to find anything like a localized [[Engram (neuropsychology)|engram]] in years of [[lesion]] experiments. [[Friedrich Hayek]] independently conceived the model, first in a brief unpublished manuscript in 1920,<ref>Hayek, Friedrich A. [1920] 1991. Beiträge zur Theorie der Entwicklung des Bewusstseins [Contributions to a theory of how consciousness develops]. Manuscript, translated by Grete Heinz.</ref><ref>{{Cite journal |last=Caldwell |first=Bruce |date=2004 |title=Some Reflections on F.A. Hayek's The Sensory Order |url=http://link.springer.com/10.1007/s10818-004-5505-9 |journal=Journal of Bioeconomics |language=en |volume=6 |issue=3 |pages=239–254 |doi=10.1007/s10818-004-5505-9 |s2cid=144437624 |issn=1387-6996|url-access=subscription }}</ref> then expanded into a book in 1952.<ref>{{Cite book |last=Hayek |first=F. A. |title=The Sensory Order: An Inquiry into the Foundations of Theoretical Psychology |date=2012-09-15 |publisher=The University of Chicago Press |edition=1st |language=en}}</ref>

The Perceptron machines were proposed and built by [[Frank Rosenblatt]], who published the 1958 paper "The Perceptron: A Probabilistic Model For Information Storage and Organization in the Brain" in ''Psychological Review'' while working at the Cornell Aeronautical Laboratory. He cited Hebb, Hayek, Uttley, and [[W. Ross Ashby|Ashby]] as his main influences.

Another form of connectionist model was the [[Stratificational linguistics|relational network]] framework developed by the [[linguist]] [[Sydney Lamb]] in the 1960s.

The research group led by [[Bernard Widrow]] searched empirically for methods to train two-layered [[ADALINE]] networks (MADALINE), with limited success.<ref>pp. 124–129, Olazaran Rodriguez, Jose Miguel. ''[https://web.archive.org/web/20221111165150/https://era.ed.ac.uk/bitstream/handle/1842/20075/Olazaran-RodriguezJM_1991redux.pdf?sequence=1&isAllowed=y A historical sociology of neural network research]''. PhD Dissertation. University of Edinburgh, 1991.</ref><ref>Widrow, B. (1962) ''Generalization and information storage in networks of ADALINE "neurons"''. In M. C. Yovits, G. T. Jacobi, & G. D. Goldstein (Eds.), Self-Organizing Systems 1962 (pp. 435–461). Washington, DC: Spartan Books.</ref>

In 1965, [[Alexey Grigorevich Ivakhnenko]] and Valentin Lapa published the [[Group method of data handling|Group Method of Data Handling]], a method for training multilayered perceptrons with arbitrarily many layers of trainable weights. The method trains the network incrementally, layer by layer, using [[regression analysis]], and prunes useless units in hidden layers with the help of a validation set.<ref name="ivak1967">{{cite book |last1=Ivakhnenko |first1=A. G. |url={{google books |plainurl=y |id=rGFgAAAAMAAJ}} |title=Cybernetics and forecasting techniques |last2=Grigorʹevich Lapa |first2=Valentin |publisher=American Elsevier Pub. Co. |year=1967}}</ref><ref name="DLhistory">{{cite arXiv |eprint=2212.11279 |class=cs.NE |first=Jürgen |last=Schmidhuber |author-link=Jürgen Schmidhuber |title=Annotated History of Modern AI and Deep Learning |date=2022}}</ref><ref name="ivak1965">{{cite book |last=Ivakhnenko |first=A. G. |url={{google books |plainurl=y |id=FhwVNQAACAAJ}} |title=Cybernetic Predicting Devices |publisher=CCM Information Corporation |year=1973}}</ref>
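
The layer-by-layer procedure can be sketched roughly as follows. This is a simplified illustration rather than Ivakhnenko and Lapa's exact algorithm: the choice of quadratic two-input units, the stopping rule, and the names <code>fit_gmdh</code>, <code>predict</code>, <code>keep</code>, and <code>max_layers</code> are all assumptions made for the example.

```python
import itertools
import numpy as np


def _features(a, b):
    # Quadratic polynomial terms of two inputs (assumed unit type for this sketch).
    return np.column_stack([np.ones_like(a), a, b, a * b, a ** 2, b ** 2])


def fit_gmdh(X_tr, y_tr, X_va, y_va, keep=4, max_layers=5):
    """Grow layers one at a time; keep only the units that do best on validation data."""
    layers, best_err = [], np.inf
    for _ in range(max_layers):
        candidates = []
        for i, j in itertools.combinations(range(X_tr.shape[1]), 2):
            # Fit each candidate unit by least-squares regression on the training set.
            coef, *_ = np.linalg.lstsq(_features(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
            # Score the unit on the validation set.
            err = np.mean((_features(X_va[:, i], X_va[:, j]) @ coef - y_va) ** 2)
            candidates.append((err, i, j, coef))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:keep]  # prune the remaining units
        if survivors[0][0] >= best_err:
            break  # stop when a new layer no longer improves validation error
        best_err = survivors[0][0]
        layer = [(i, j, coef) for _, i, j, coef in survivors]
        layers.append(layer)
        # Outputs of the surviving units become the inputs of the next layer.
        X_tr = np.column_stack([_features(X_tr[:, i], X_tr[:, j]) @ c for i, j, c in layer])
        X_va = np.column_stack([_features(X_va[:, i], X_va[:, j]) @ c for i, j, c in layer])
    return layers, best_err


def predict(layers, X):
    """Propagate inputs through the fitted layers; the best final unit is the output."""
    for layer in layers:
        X = np.column_stack([_features(X[:, i], X[:, j]) @ c for i, j, c in layer])
    return X[:, 0]
```

In this sketch the depth of the network is not fixed in advance: layers are added only as long as the best unit's validation error keeps decreasing.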

The first multilayered perceptron trained by [[stochastic gradient descent]]<ref name="robbins1951">{{Cite journal |last1=Robbins |first1=H. |author-link=Herbert Robbins |last2=Monro |first2=S. |year=1951 |title=A Stochastic Approximation Method |journal=The Annals of Mathematical Statistics |volume=22 |issue=3 |page=400 |doi=10.1214/aoms/1177729586 |doi-access=free}}</ref> was published in 1967 by [[Shun'ichi Amari]].<ref name="Amari1967">{{cite journal |last1=Amari |first1=Shun'ichi |author-link=Shun'ichi Amari |date=1967 |title=A theory of adaptive pattern classifier |journal=IEEE Transactions |volume=EC |issue=16 |pages=279–307}}</ref> In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned useful [[Knowledge representation|internal representations]] to classify non-linearly separable pattern classes.<ref name="DLhistory" />

In 1972, [[Shun'ichi Amari]] produced an early example of a [[self-organizing network]].<ref>{{Cite journal |last=Amari |first=S.-I. |date=November 1972 |title=Learning Patterns and Pattern Sequences by Self-Organizing Nets of Threshold Elements |url=https://ieeexplore.ieee.org/document/1672070 |journal=IEEE Transactions on Computers |volume=C-21 |issue=11 |pages=1197–1206 |doi=10.1109/T-C.1972.223477 |issn=0018-9340|url-access=subscription }}</ref>