Neural network (machine learning)
=== Deep learning breakthroughs in the 1960s and 1970s ===
Fundamental research was conducted on ANNs in the 1960s and 1970s. The first working deep learning algorithm was the [[Group method of data handling]], a method to train arbitrarily deep neural networks, published by [[Alexey Ivakhnenko]] and Lapa in the [[Soviet Union]] (1965). They regarded it as a form of polynomial regression,<ref name="ivak1965">{{cite book|first1=A. G. |last1=Ivakhnenko |first2=V. G. |last2=Lapa |title=Cybernetics and Forecasting Techniques|url={{google books |plainurl=y |id=rGFgAAAAMAAJ}}|year=1967|publisher=American Elsevier Publishing Co.|isbn=978-0-444-00020-0}}</ref> or a generalization of Rosenblatt's perceptron.<ref>{{Cite journal |last=Ivakhnenko |first=A.G. |date=March 1970 |title=Heuristic self-organization in problems of engineering cybernetics |url=https://linkinghub.elsevier.com/retrieve/pii/0005109870900920 |journal=Automatica |language=en |volume=6 |issue=2 |pages=207–219 |doi=10.1016/0005-1098(70)90092-0 |archive-date=12 August 2024 |access-date=7 August 2024 |archive-url=https://web.archive.org/web/20240812123448/https://linkinghub.elsevier.com/retrieve/pii/0005109870900920 |url-status=live }}</ref> A 1971 paper described a deep network with eight layers trained by this method,<ref name="ivak1971">{{Cite journal|last=Ivakhnenko|first=Alexey|date=1971|title=Polynomial theory of complex systems|url=http://gmdh.net/articles/history/polynomial.pdf|journal=IEEE Transactions on Systems, Man, and Cybernetics|pages=364–378|doi=10.1109/TSMC.1971.4308320|volume=SMC-1|issue=4|access-date=5 November 2019|archive-date=29 August 2017|archive-url=https://web.archive.org/web/20170829230621/http://www.gmdh.net/articles/history/polynomial.pdf|url-status=live}}</ref> which is based on layer-by-layer training through regression analysis. Superfluous hidden units are pruned using a separate validation set.
Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or "gates."<ref name="DLhistory">{{cite arXiv |eprint=2212.11279 |class=cs.NE |first=Jürgen |last=Schmidhuber |author-link=Jürgen Schmidhuber |title=Annotated History of Modern AI and Deep Learning |date=2022}}</ref> The first deep learning [[multilayer perceptron]] trained by [[stochastic gradient descent]]<ref name="robbins1951">{{Cite journal | last1 = Robbins | first1 = H. | author-link = Herbert Robbins| last2 = Monro | first2 = S. | doi = 10.1214/aoms/1177729586 | title = A Stochastic Approximation Method | journal = The Annals of Mathematical Statistics | volume = 22 | issue = 3 | pages = 400 | year = 1951 | doi-access = free }}</ref> was published in 1967 by [[Shun'ichi Amari]].<ref name="Amari1967">{{cite journal |last1=Amari |first1=Shun'ichi |author-link=Shun'ichi Amari|title=A theory of adaptive pattern classifier|journal= IEEE Transactions |date=1967 |volume=EC |issue=16 |pages=279–307}}</ref> In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned [[Knowledge representation|internal representations]] to classify non-linearly separable pattern classes.<ref name="DLhistory"/> Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique. In 1969, [[Kunihiko Fukushima]] introduced the [[rectifier (neural networks)|ReLU]] (rectified linear unit) activation function.<ref name="DLhistory" /><ref name="Fukushima1969">{{cite journal |last1=Fukushima |first1=K. 
|date=1969 |title=Visual feature extraction by a multilayered network of analog threshold elements |journal=IEEE Transactions on Systems Science and Cybernetics |volume=5 |issue=4 |pages=322–333 |doi=10.1109/TSSC.1969.300225}}</ref><ref name=sonoda17>{{cite journal | last1 = Sonoda | first1 = Sho | last2=Murata | first2=Noboru | s2cid = 12149203 | year = 2017 | title = Neural network with unbounded activation functions is universal approximator | journal = Applied and Computational Harmonic Analysis | volume = 43 | issue = 2 | pages = 233–268 | doi = 10.1016/j.acha.2015.12.005| arxiv = 1505.03654 }}</ref> The rectifier has become the most popular activation function for deep learning.<ref>{{cite arXiv |eprint=1710.05941 |class=cs.NE |first1=Prajit |last1=Ramachandran |first2=Barret |last2=Zoph |title=Searching for Activation Functions |date=16 October 2017 |last3=Le |first3=Quoc V.}}</ref> Nevertheless, research stagnated in the United States following the work of [[Marvin Minsky|Minsky]] and [[Seymour Papert|Papert]] (1969),<ref name=":132">{{cite book |last1=Minsky |first1=Marvin |url={{google books |plainurl=y |id=Ow1OAQAAIAAJ}} |title=Perceptrons: An Introduction to Computational Geometry |last2=Papert |first2=Seymour |publisher=MIT Press |year=1969 |isbn=978-0-262-63022-1}}</ref> who emphasized that basic perceptrons could not compute the exclusive-or function. This limitation was irrelevant to the deep networks of Ivakhnenko (1965) and Amari (1967). In 1976, transfer learning was introduced in neural network learning.<ref>Bozinovski S. and Fulgosi A. (1976). "The influence of pattern similarity and transfer learning on the base perceptron training" (original in Croatian) Proceedings of Symposium Informatica 3-121-5, Bled.</ref><ref>Bozinovski S. (2020) "Reminder of the first paper on transfer learning in neural networks, 1976". 
Informatica 44: 291–302.</ref> Deep learning architectures for [[convolutional neural network]]s (CNNs) with convolutional layers, downsampling layers, and weight replication began with the [[Neocognitron]], introduced by Kunihiko Fukushima in 1979, though it was not trained by backpropagation.<ref name="FUKU1979">{{cite journal |last1=Fukushima |first1=K. |year=1979 |title=Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron |journal=Trans. IECE (In Japanese)|volume= J62-A |issue=10 |pages=658–665 |doi=10.1007/bf00344251 |pmid=7370364 |s2cid=206775608}}</ref><ref name="FUKU1980">{{cite journal |last1=Fukushima |first1=K. |year=1980 |title=Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position |journal=Biol. Cybern. |volume=36 |issue=4 |pages=193–202 |doi=10.1007/bf00344251 |pmid=7370364 |s2cid=206775608}}</ref><ref name="SCHIDHUB4"/>