Neural network (machine learning)
=== Deep learning ===
Between 2009 and 2012, ANNs began winning prizes in image recognition contests, approaching human-level performance on various tasks, initially in [[pattern recognition]] and [[handwriting recognition]].<ref>[http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions 2012 Kurzweil AI Interview] {{Webarchive|url=https://web.archive.org/web/20180831075249/http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions |date=31 August 2018 }} with Juergen Schmidhuber on the eight competitions won by his Deep Learning team 2009–2012</ref><ref>{{Cite web|url=http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions|title=How bio-inspired deep learning keeps winning competitions {{!}} KurzweilAI|website=kurzweilai.net|access-date=16 June 2017|archive-url=https://web.archive.org/web/20180831075249/http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions|archive-date=31 August 2018}}</ref> In 2011, a CNN named ''DanNet<ref name=":32">{{Cite journal |last1=Cireşan |first1=Dan Claudiu |last2=Meier |first2=Ueli |last3=Gambardella |first3=Luca Maria |last4=Schmidhuber |first4=Jürgen |date=21 September 2010 |title=Deep, Big, Simple Neural Nets for Handwritten Digit Recognition |journal=Neural Computation |volume=22 |issue=12 |pages=3207–3220 |arxiv=1003.0358 |doi=10.1162/neco_a_00052 |issn=0899-7667 |pmid=20858131 |s2cid=1918673}}</ref>''<ref name=":62">{{Cite journal |last1=Ciresan |first1=D. C. |last2=Meier |first2=U. |last3=Masci |first3=J. |last4=Gambardella |first4=L.M. |last5=Schmidhuber |first5=J. 
|date=2011 |title=Flexible, High Performance Convolutional Neural Networks for Image Classification |url=http://ijcai.org/papers11/Papers/IJCAI11-210.pdf |url-status=live |journal=International Joint Conference on Artificial Intelligence |doi=10.5591/978-1-57735-516-8/ijcai11-210 |archive-url=https://web.archive.org/web/20140929094040/http://ijcai.org/papers11/Papers/IJCAI11-210.pdf |archive-date=29 September 2014 |access-date=13 June 2017}}</ref> by Dan Ciresan, Ueli Meier, Jonathan Masci, [[Luca Maria Gambardella]], and Jürgen Schmidhuber achieved for the first time superhuman performance in a visual pattern recognition contest, outperforming traditional methods by a factor of 3.<ref name="SCHIDHUB4">{{cite journal |last=Schmidhuber |first=J. |year=2015 |title=Deep Learning in Neural Networks: An Overview |journal=Neural Networks |volume=61 |pages=85–117 |arxiv=1404.7828 |doi=10.1016/j.neunet.2014.09.003 |pmid=25462637 |s2cid=11715509}}</ref> It then won more contests.<ref name=":82">{{Cite book |last1=Ciresan |first1=Dan |url=http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf |title=Advances in Neural Information Processing Systems 25 |last2=Giusti |first2=Alessandro |last3=Gambardella |first3=Luca M. |last4=Schmidhuber |first4=Jürgen |date=2012 |publisher=Curran Associates, Inc. |editor-last=Pereira |editor-first=F. |pages=2843–2851 |access-date=13 June 2017 |editor-last2=Burges |editor-first2=C. J. C. |editor-last3=Bottou |editor-first3=L. |editor-last4=Weinberger |editor-first4=K. Q. |archive-url=https://web.archive.org/web/20170809081713/http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images.pdf |archive-date=9 August 2017 |url-status=live}}</ref><ref name="ciresan2013miccai">{{Cite book |last1=Ciresan |first1=D. |title=Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013 |last2=Giusti |first2=A. 
|last3=Gambardella |first3=L.M. |last4=Schmidhuber |first4=J. |date=2013 |isbn=978-3-642-38708-1 |series=Lecture Notes in Computer Science |volume=7908 |pages=411–418 |chapter=Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks |doi=10.1007/978-3-642-40763-5_51 |pmid=24579167 |issue=Pt 2}}</ref> They also showed how [[Max pooling|max-pooling]] CNNs on GPU improved performance significantly.<ref name=":9">{{Cite book |last1=Ciresan |first1=D. |title=2012 IEEE Conference on Computer Vision and Pattern Recognition |last2=Meier |first2=U. |last3=Schmidhuber |first3=J. |year=2012 |isbn=978-1-4673-1228-8 |pages=3642–3649 |chapter=Multi-column deep neural networks for image classification |doi=10.1109/cvpr.2012.6248110 |arxiv=1202.2745 |s2cid=2161592}}</ref> In October 2012, [[AlexNet]] by [[Alex Krizhevsky]], [[Ilya Sutskever]], and Geoffrey Hinton<ref name="krizhevsky20122">{{cite journal |last1=Krizhevsky |first1=Alex |last2=Sutskever |first2=Ilya |last3=Hinton |first3=Geoffrey |date=2012 |title=ImageNet Classification with Deep Convolutional Neural Networks |url=https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf |url-status=live |journal=NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada |archive-url=https://web.archive.org/web/20170110123024/http://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf |archive-date=10 January 2017 |access-date=24 May 2017}}</ref> won the large-scale [[ImageNet competition]] by a significant margin over shallow machine learning methods. 
Further incremental improvements included the VGG-16 network by [[Karen Simonyan (scientist)|Karen Simonyan]] and [[Andrew Zisserman]]<ref name="VGG">{{cite arXiv |eprint=1409.1556 |class=cs.CV |first1=Karen |last1=Simonyan |first2=Andrew |last2=Zisserman |title=Very Deep Convolutional Networks for Large-Scale Image Recognition |year=2014}}</ref> and Google's [[Inceptionv3]].<ref name="szegedy">{{Cite journal |last=Szegedy |first=Christian |date=2015 |title=Going deeper with convolutions |url=https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf |journal=Cvpr2015 |arxiv=1409.4842 |archive-date=30 September 2024 |access-date=7 August 2024 |archive-url=https://web.archive.org/web/20240930225513/https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf |url-status=live }}</ref>

In 2012, [[Andrew Ng|Ng]] and [[Jeff Dean (computer scientist)|Dean]] created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images.<ref name="ng2012">{{cite arXiv |eprint=1112.6209 |class=cs.LG |first1=Andrew |last1=Ng |first2=Jeff |last2=Dean |title=Building High-level Features Using Large Scale Unsupervised Learning |year=2012}}</ref> Unsupervised pre-training and increased computing power from [[GPU]]s and [[distributed computing]] allowed the use of larger networks, particularly in image and visual recognition problems, which became known as "deep learning".<ref name=":4" />

[[Radial basis function network|Radial basis function]] and wavelet networks were introduced in 2013. 
These can be shown to offer best approximation properties and have been applied in [[nonlinear system identification]] and classification applications.<ref name="SAB1" />

[[Generative adversarial network]] (GAN) ([[Ian Goodfellow]] et al., 2014)<ref name="GANnips">{{cite conference |last1=Goodfellow |first1=Ian |last2=Pouget-Abadie |first2=Jean |last3=Mirza |first3=Mehdi |last4=Xu |first4=Bing |last5=Warde-Farley |first5=David |last6=Ozair |first6=Sherjil |last7=Courville |first7=Aaron |last8=Bengio |first8=Yoshua |year=2014 |title=Generative Adversarial Networks |url=https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf |conference=Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014) |pages=2672–2680 |archive-url=https://web.archive.org/web/20191122034612/http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf |archive-date=22 November 2019 |access-date=20 August 2019 |url-status=live}}</ref> became state of the art in generative modeling during the 2014–2018 period. The GAN principle was originally published in 1991 by Jürgen Schmidhuber, who called it "artificial curiosity": two neural networks contest with each other in the form of a [[zero-sum game]], where one network's gain is the other network's loss.<ref name="curiosity1991">{{cite conference| title = A possibility for implementing curiosity and boredom in model-building neural controllers | last1 = Schmidhuber | first1 = Jürgen | author-link = Jürgen Schmidhuber | date = 1991 | publisher = MIT Press/Bradford Books| book-title = Proc. 
SAB'1991| pages = 222–227}}</ref><ref name="gancurpm2020">{{Cite journal|last=Schmidhuber|first=Jürgen| author-link = Jürgen Schmidhuber |date=2020|title=Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991)|journal=Neural Networks |language=en|volume=127|pages=58–66|doi=10.1016/j.neunet.2020.04.008 |pmid=32334341 |arxiv=1906.04493 |s2cid=216056336 }}</ref> The first network is a [[generative model]] that models a [[probability distribution]] over output patterns. The second network learns by [[gradient descent]] to predict the reactions of the environment to these patterns. Excellent image quality is achieved by [[Nvidia]]'s [[StyleGAN]] (2018)<ref name="SyncedReview201822">{{Cite web |date=14 December 2018 |title=GAN 2.0: NVIDIA's Hyperrealistic Face Generator |url=https://syncedreview.com/2018/12/14/gan-2-0-nvidias-hyperrealistic-face-generator/ |access-date=3 October 2019 |website=SyncedReview.com |archive-date=12 September 2024 |archive-url=https://web.archive.org/web/20240912080503/https://syncedreview.com/2018/12/14/gan-2-0-nvidias-hyperrealistic-face-generator/ |url-status=live }}</ref> based on the Progressive GAN by Tero Karras et al.<ref name="progressiveGAN201722">{{cite arXiv |eprint=1710.10196 |class=cs.NE |first1=T. |last1=Karras |first2=T. |last2=Aila |title=Progressive Growing of GANs for Improved Quality, Stability, and Variation |date=26 February 2018 |last3=Laine |first3=S. |last4=Lehtinen |first4=J.}}</ref> Here, the GAN generator is grown from small to large scale in a pyramidal fashion. 
Image generation by GANs reached popular success and provoked discussions concerning [[Deepfake|deepfakes]].<ref>{{Cite web |title=Prepare, Don't Panic: Synthetic Media and Deepfakes |url=https://lab.witness.org/projects/synthetic-media-and-deep-fakes/ |url-status=live |archive-url=https://web.archive.org/web/20201202231744/https://lab.witness.org/projects/synthetic-media-and-deep-fakes/ |archive-date=2 December 2020 |access-date=25 November 2020 |publisher=witness.org}}</ref> [[Diffusion model|Diffusion models]] (2015)<ref>{{Cite journal |last1=Sohl-Dickstein |first1=Jascha |last2=Weiss |first2=Eric |last3=Maheswaranathan |first3=Niru |last4=Ganguli |first4=Surya |date=1 June 2015 |title=Deep Unsupervised Learning using Nonequilibrium Thermodynamics |url=http://proceedings.mlr.press/v37/sohl-dickstein15.pdf |journal=Proceedings of the 32nd International Conference on Machine Learning |language=en |publisher=PMLR |volume=37 |pages=2256–2265 |arxiv=1503.03585 |archive-date=21 September 2024 |access-date=7 August 2024 |archive-url=https://web.archive.org/web/20240921065319/http://proceedings.mlr.press/v37/sohl-dickstein15.pdf |url-status=live }}</ref> have since eclipsed GANs in generative modeling, with systems such as [[DALL·E 2]] (2022) and [[Stable Diffusion]] (2022). 
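
As a rough illustration of the zero-sum framing of GANs described above, the following Python sketch evaluates the minimax value function of the 2014 GAN formulation, V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], on hand-picked discriminator outputs. The discriminator tries to increase V and the generator tries to decrease it, so one network's gain is the other's loss. The function names and the example probabilities are hypothetical and chosen only for illustration; no networks are trained, and the 1991 artificial-curiosity setup is not reproduced here.

```python
import math


def gan_value(d_real, d_fake):
    """Minimax value V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))).

    d_real: discriminator outputs (probabilities) on real samples.
    d_fake: discriminator outputs (probabilities) on generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def payoffs(d_real, d_fake):
    """Return (discriminator payoff, generator payoff); they always sum to zero."""
    v = gan_value(d_real, d_fake)
    return v, -v


# Hypothetical discriminator outputs, for illustration only.
fooled = payoffs([0.5, 0.5], [0.5, 0.5])  # discriminator cannot tell real from fake
sharp = payoffs([0.9, 0.9], [0.1, 0.1])   # discriminator separates them well

print("fooled discriminator:", fooled)
print("sharp discriminator: ", sharp)
```

When the discriminator is fooled (all outputs 0.5), V equals −2 log 2, its value at the equilibrium of the 2014 formulation; a sharper discriminator raises V and lowers the generator's payoff by the same amount.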
In 2014, the state of the art was training "very deep neural networks" with 20 to 30 layers.<ref>{{Citation |last1=Simonyan |first1=Karen |title=Very Deep Convolutional Networks for Large-Scale Image Recognition |date=10 April 2015 |arxiv=1409.1556 |last2=Zisserman |first2=Andrew}}</ref> Stacking too many layers led to a steep reduction in [[Training, validation, and test data sets|training]] accuracy,<ref name="prelu2">{{cite arXiv |eprint=1502.01852 |class=cs.CV |first1=Kaiming |last1=He |first2=Xiangyu |last2=Zhang |title=Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |year=2016}}</ref> known as the "degradation" problem.<ref name="resnet2">{{Cite conference |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |date=10 December 2015 |title=Deep Residual Learning for Image Recognition |arxiv=1512.03385}}</ref> In 2015, two techniques were developed to train very deep networks: the [[highway network]] was published in May 2015,<ref name="highway20153">{{cite arXiv |eprint=1505.00387 |class=cs.LG |first1=Rupesh Kumar |last1=Srivastava |first2=Klaus |last2=Greff |title=Highway Networks |date=2 May 2015 |last3=Schmidhuber |first3=Jürgen}}</ref> and the residual neural network (ResNet) in December 2015.<ref name="resnet20153">{{Cite conference |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |date=2016 |title=Deep Residual Learning for Image Recognition |url=https://ieeexplore.ieee.org/document/7780459 |location=Las Vegas, NV, USA |publisher=IEEE |pages=770–778 |arxiv=1512.03385 |doi=10.1109/CVPR.2016.90 |isbn=978-1-4673-8851-1 |journal=2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |access-date=15 April 2023 |archive-date=7 October 2024 |archive-url=https://web.archive.org/web/20241007202422/https://ieeexplore.ieee.org/document/7780459 
|url-status=live }}</ref><ref>{{Cite web |last=Linn |first=Allison |date=10 December 2015 |title=Microsoft researchers win ImageNet computer vision challenge |url=https://blogs.microsoft.com/ai/microsoft-researchers-win-imagenet-computer-vision-challenge/ |access-date=29 June 2024 |website=The AI Blog |language=en-US |archive-date=21 May 2023 |archive-url=https://archive.today/20230521191955/https://blogs.microsoft.com/ai/microsoft-researchers-win-imagenet-computer-vision-challenge/ |url-status=live }}</ref> ResNet behaves like an open-gated Highway Net.

{{Main|Transformer (deep learning architecture)#History}}
During the 2010s, the [[seq2seq]] model was developed, and attention mechanisms were added. This led to the modern Transformer architecture, introduced in 2017 in ''[[Attention Is All You Need]]''.<ref name="vaswani2017">{{cite arXiv |eprint=1706.03762 |class=cs.CL |first1=Ashish |last1=Vaswani |first2=Noam |last2=Shazeer |title=Attention Is All You Need |date=12 June 2017 |last8=Polosukhin |first8=Illia |last7=Kaiser |first7=Lukasz |last6=Gomez |first6=Aidan N. |last5=Jones |first5=Llion |last4=Uszkoreit |first4=Jakob |last3=Parmar |first3=Niki}}</ref> The Transformer requires computation time that is quadratic in the size of the context window. Jürgen Schmidhuber's fast weight controller (1992)<ref name="transform19922">{{cite journal |last1=Schmidhuber |first1=Jürgen |author-link1=Jürgen Schmidhuber |date=1992 |title=Learning to control fast-weight memories: an alternative to recurrent nets. 
|url=https://archive.org/download/wikipedia-scholarly-sources-corpus/10.1162.zip/10.1162%252Fneco.1992.4.1.131.pdf |journal=Neural Computation |volume=4 |issue=1 |pages=131–139 |doi=10.1162/neco.1992.4.1.131 |s2cid=16683347}}</ref> scales linearly and was later shown to be equivalent to the unnormalized linear Transformer.<ref name="fastlinear20202">{{cite conference |last1=Katharopoulos |first1=Angelos |last2=Vyas |first2=Apoorv |last3=Pappas |first3=Nikolaos |last4=Fleuret |first4=François |date=2020 |title=Transformers are RNNs: Fast autoregressive Transformers with linear attention |url=https://paperswithcode.com/paper/a-decomposable-attention-model-for-natural |publisher=PMLR |pages=5156–5165 |book-title=ICML 2020 |access-date=21 September 2024 |archive-date=11 July 2023 |archive-url=https://web.archive.org/web/20230711021546/https://paperswithcode.com/paper/a-decomposable-attention-model-for-natural |url-status=live }}</ref><ref name="schlag20212">{{cite conference |last1=Schlag |first1=Imanol |last2=Irie |first2=Kazuki |last3=Schmidhuber |first3=Jürgen |author-link3=Juergen Schmidhuber |date=2021 |title=Linear Transformers Are Secretly Fast Weight Programmers |publisher=Springer |pages=9355–9366 |book-title=ICML 2021}}</ref><ref name="DLhistory" /> Transformers have increasingly become the model of choice for [[natural language processing]].<ref name="wolf2020">{{cite book |last1=Wolf |first1=Thomas |title=Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations |last2=Debut |first2=Lysandre |last3=Sanh |first3=Victor |last4=Chaumond |first4=Julien |last5=Delangue |first5=Clement |last6=Moi |first6=Anthony |last7=Cistac |first7=Pierric |last8=Rault |first8=Tim |last9=Louf |first9=Remi |year=2020 |pages=38–45 |chapter=Transformers: State-of-the-Art Natural Language Processing |doi=10.18653/v1/2020.emnlp-demos.6 |last10=Funtowicz |first10=Morgan |last11=Davison |first11=Joe |last12=Shleifer |first12=Sam 
|last13=von Platen |first13=Patrick |last14=Ma |first14=Clara |last15=Jernite |first15=Yacine |last16=Plu |first16=Julien |last17=Xu |first17=Canwen |last18=Le Scao |first18=Teven |last19=Gugger |first19=Sylvain |last20=Drame |first20=Mariama |last21=Lhoest |first21=Quentin |last22=Rush |first22=Alexander |s2cid=208117506}}</ref> Many modern [[large language model]]s such as [[ChatGPT]], [[GPT-4]], and [[BERT (language model)|BERT]] use this architecture.
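
The contrast between quadratic and linear scaling mentioned above can be sketched in a few lines of Python. The example below is a simplified illustration, not the method of any cited paper: it uses causal, unnormalized dot-product attention with no softmax and no feature map, and all function names and input values are hypothetical. The first function scores each query against every earlier key, so the number of scores grows quadratically with sequence length; the second keeps a running "fast weight" matrix that is updated once per step with the outer product of the value and key vectors, and produces the same outputs.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def quadratic_form(queries, keys, values):
    """Causal unnormalized attention: score each query against all earlier keys."""
    outputs, score_count = [], 0
    for i, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for j in range(i + 1):
            score = dot(keys[j], q)
            score_count += 1
            out = [o + score * v for o, v in zip(out, values[j])]
        outputs.append(out)
    return outputs, score_count


def fast_weight_form(queries, keys, values):
    """Same outputs from a running fast-weight matrix W += outer(value, key)."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs, update_count = [], 0
    for q, k, v in zip(queries, keys, values):
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += v[r] * k[c]
        update_count += 1
        outputs.append([dot(row, q) for row in W])
    return outputs, update_count


# Hypothetical toy sequence of length 3, for illustration only.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 2.0], [0.0, 1.0], [2.0, 1.0]]
values = [[1.0], [2.0], [3.0]]

print(quadratic_form(queries, keys, values))    # 6 pairwise scores for 3 tokens
print(fast_weight_form(queries, keys, values))  # 3 weight updates for 3 tokens
```

For a sequence of length n, the first form computes n(n + 1)/2 scores while the second performs n fixed-size updates. Standard softmax attention cannot be rewritten this way, which is why the equivalence is stated only for the unnormalized linear variant.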