==Models==
{{Confusing|section|date=April 2017}}{{Further|Mathematics of artificial neural networks}}[[File:Neuron3.png|thumb|right|upright=1.35|Neuron and myelinated axon, with signal flow from inputs at dendrites to outputs at axon terminals]] ANNs began as an attempt to exploit the architecture of the human brain to perform tasks that conventional algorithms had little success with. They soon reoriented towards improving empirical results, abandoning attempts to remain true to their biological precursors. ANNs can learn and model non-linear and complex relationships. This is achieved by connecting neurons in various patterns, so that the output of some neurons becomes the input of others. The network forms a [[Directed graph|directed]], [[weighted graph]].<ref name="Zell1994ch5.2">{{Cite book|title=Simulation neuronaler Netze|last=Zell |first=Andreas|date=2003|publisher=Addison-Wesley|isbn=978-3-89319-554-1|oclc=249017987|trans-title=Simulation of Neural Networks |language=de |edition=1st |chapter=chapter 5.2 }}</ref> An artificial neural network consists of simulated neurons. Each neuron is connected to other [[Vertex (graph theory)|nodes]] via [[Glossary of graph theory terms#edge|links]], analogous to the biological axon-synapse-dendrite connection. Nodes receive data over their incoming links, operate on it, and pass the results along their outgoing links. Each link has a weight that determines the strength of one node's influence on another,<ref name='Winston'>{{cite book |title=Artificial intelligence |publisher=Addison-Wesley Pub. Co |isbn=0-201-53377-4 |edition=3rd|year=1992 }}</ref> so the weights modulate the signals passed between neurons.

=== Artificial neurons ===
{{main|Artificial neuron}}
ANNs are composed of [[artificial neurons]] which are conceptually derived from biological [[neuron]]s. Each artificial neuron has inputs and produces a single output which can be sent to multiple other neurons.<ref name="Abbod2007">{{cite journal|last1=Abbod|first1=Maysam F.|year=2007|title=Application of Artificial Intelligence to the Management of Urological Cancer|journal=The Journal of Urology|volume=178|issue=4|pages=1150–1156|doi=10.1016/j.juro.2007.05.122|pmid=17698099}}</ref> The inputs can be the feature values of a sample of external data, such as images or documents, or they can be the outputs of other neurons. The outputs of the final ''output neurons'' of the neural net accomplish the task, such as recognizing an object in an image.{{citation needed|date=October 2024}} The output of a neuron is computed by taking the weighted sum of all its inputs, weighted by the ''weights'' of the ''connections'' from the inputs to the neuron, and adding a ''bias'' term to this sum.<ref name="DAWSON1998">{{cite journal|last1=Dawson|first1=Christian W.|year=1998|title=An artificial neural network approach to rainfall-runoff modelling|journal=Hydrological Sciences Journal|volume=43|issue=1|pages=47–66|doi=10.1080/02626669809492102|bibcode=1998HydSJ..43...47D |doi-access=free}}</ref> This weighted sum is sometimes called the ''activation''. It is then passed through a (usually nonlinear) [[activation function]] to produce the output.<ref>{{Cite web|url=http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn|title=The Machine Learning Dictionary|website=cse.unsw.edu.au|access-date=4 November 2009|archive-url=https://web.archive.org/web/20180826151959/http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn|archive-date=26 August 2018}}</ref>
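For illustration, the computation a single neuron performs can be sketched in a few lines of Python (an illustrative example only; the sigmoid used here is just one common choice of activation function, and all names and values are arbitrary):

<syntaxhighlight lang="python">
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias  # the "activation"
    return 1.0 / (1.0 + math.exp(-activation))  # logistic (sigmoid) activation function

# A neuron with three inputs:
print(neuron_output([0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1))
</syntaxhighlight>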
=== Organization ===
The neurons are typically organized into multiple layers, especially in deep learning. Neurons of one layer connect only to neurons of the immediately preceding and immediately following layers. The layer that receives external data is the ''input layer''. The layer that produces the ultimate result is the ''output layer''. In between them are zero or more ''hidden layers''. Single-layer and unlayered networks are also used. Between two layers, multiple connection patterns are possible. They can be ''fully connected'', with every neuron in one layer connecting to every neuron in the next layer. They can be ''pooling'', where a group of neurons in one layer connects to a single neuron in the next layer, thereby reducing the number of neurons in that layer.<ref name="flexible">{{cite journal|last=Ciresan|first=Dan|author2=Ueli Meier|author3=Jonathan Masci|author4=Luca M. Gambardella|author5=Jurgen Schmidhuber|year=2011|title=Flexible, High Performance Convolutional Neural Networks for Image Classification|url=https://people.idsia.ch/~juergen/ijcai2011.pdf|journal=Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence|volume=2|pages=1237–1242|access-date=7 July 2022|url-status=live|archive-url=https://web.archive.org/web/20220405190128/https://people.idsia.ch/~juergen/ijcai2011.pdf|archive-date=5 April 2022}}</ref> Neurons with only such connections form a [[directed acyclic graph]] and are known as [[feedforward neural network|''feedforward networks'']].<ref name="Zell1994p73">{{cite book|title=Simulation Neuronaler Netze|last=Zell|first=Andreas|publisher=Addison-Wesley|year=1994|isbn=3-89319-554-8|edition=1st|page=73|language=de|trans-title=Simulation of Neural Networks}}</ref> Alternatively, networks that allow connections between neurons in the same or previous layers are known as [[Recurrent neural network|''recurrent networks'']].<ref>{{Cite journal|last=Miljanovic|first=Milos|date=February–March 2012|title=Comparative analysis of Recurrent and Finite Impulse Response Neural Networks in Time Series Prediction|url=http://www.ijcse.com/docs/INDJCSE12-03-01-028.pdf|journal=Indian Journal of Computer Science and Engineering|volume=3|issue=1|access-date=21 August 2019|archive-date=19 May 2024|archive-url=https://web.archive.org/web/20240519081156/http://www.ijcse.com/docs/INDJCSE12-03-01-028.pdf|url-status=live}}</ref>

=== Hyperparameter ===
{{Main|Hyperparameter (machine learning)}}
A [[hyperparameter (machine learning)|hyperparameter]] is a constant [[parameter]] whose value is set before the learning process begins; in contrast, the values of ordinary parameters are derived via learning. Examples of hyperparameters include the [[learning rate]], the number of hidden layers and the batch size.{{cn|date=June 2024}} The values of some hyperparameters can depend on those of other hyperparameters. For example, the size of some layers can depend on the overall number of layers.{{citation needed|date=October 2024}}
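The two ideas can be seen together in a short sketch. The following Python fragment (illustrative only; the layer sizes, seed and activation are arbitrary choices) builds a fully connected feedforward network whose layer count and layer sizes are hyperparameters fixed before training, while the weights and biases are the parameters that training would adjust:

<syntaxhighlight lang="python">
import math
import random

layer_sizes = [4, 8, 8, 3]  # hyperparameters: input layer, two hidden layers, output layer

random.seed(0)
# Parameters (learned during training): one weight matrix and one bias vector per layer.
weights = [[[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

def forward(x):
    """Fully connected feedforward pass: every neuron of one layer
    feeds every neuron of the next layer."""
    for W, b in zip(weights, biases):
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

print(forward([1.0, 0.5, -0.5, 2.0]))  # three output values
</syntaxhighlight>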
=== Learning ===
{{No footnotes|date=August 2019|section}}{{See also|Mathematical optimization|Estimation theory|Machine learning}}
Learning is the adaptation of the network to better handle a task by considering sample observations. Learning involves adjusting the weights (and optional thresholds) of the network to improve the accuracy of the result; this is done by minimizing the observed errors. Learning is complete when examining additional observations does not usefully reduce the error rate. Even after learning, the error rate typically does not reach 0. If, after learning, the error rate is too high, the network typically must be redesigned. In practice, this is done by defining a [[Loss function|cost function]] that is evaluated periodically during learning; as long as its output continues to decline, learning continues. The cost is frequently defined as a [[statistic]] whose value can only be approximated. The outputs are actually numbers, so when the error is low, the difference between the output (say, a 0.95 probability that an image shows a cat) and the correct answer (cat) is small. Learning attempts to reduce the total of the differences across the observations. Most learning models can be viewed as a straightforward application of [[Mathematical optimization|optimization]] theory and [[statistical estimation]].<ref name="Zell1994ch5.2"/><ref>{{Cite book|last1=Kelleher|first1=John D. |last2=Mac Namee|first2=Brian|last3=D'Arcy|first3=Aoife |title=Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies|date=2020|isbn=978-0-262-36110-1 |edition=2nd|location=Cambridge, MA |publisher=The MIT Press |chapter=7-8|oclc=1162184998}}</ref>

==== Learning rate ====
{{main|Learning rate}}
The learning rate defines the size of the corrective steps that the model takes to adjust for errors in each observation.<ref>{{cite arXiv|last=Wei|first=Jiakai|date=26 April 2019|title=Forget the Learning Rate, Decay Loss|class=cs.LG|eprint=1905.00094}}</ref> A high learning rate shortens the training time, usually at the cost of lower ultimate accuracy, while a lower learning rate takes longer but can reach greater accuracy. Optimizations such as [[Quickprop]] are primarily aimed at speeding up error minimization, while other improvements mainly try to increase reliability. To avoid [[oscillation]] inside the network, such as alternating connection weights, and to improve the rate of convergence, refinements use an [[adaptive learning rate]] that increases or decreases as appropriate.<ref>{{Cite book|last1=Li|first1=Y.|last2=Fu|first2=Y.|last3=Li|first3=H.|last4=Zhang|first4=S. W.|title=2009 International Conference on Computational Intelligence and Natural Computing |chapter=The Improved Training Algorithm of Back Propagation Neural Network with Self-adaptive Learning Rate |s2cid=10557754|date=1 June 2009|isbn=978-0-7695-3645-3|volume=1|pages=73–76|doi=10.1109/CINC.2009.111}}</ref> The concept of momentum allows the balance between the gradient and the previous change to be weighted, so that the weight adjustment depends to some degree on the previous change. A momentum close to 0 emphasizes the gradient, while a value close to 1 emphasizes the last change.{{citation needed|date=October 2024}}
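A minimal sketch of the momentum update just described (illustrative Python; the hyperparameter values are arbitrary):

<syntaxhighlight lang="python">
def momentum_step(weights, gradients, velocity, learning_rate=0.01, momentum=0.9):
    """One gradient-descent step with momentum. A momentum near 0 makes the
    step follow the current gradient; a momentum near 1 makes it mostly
    repeat the previous change."""
    for i, g in enumerate(gradients):
        velocity[i] = momentum * velocity[i] - learning_rate * g
        weights[i] += velocity[i]
    return weights, velocity
</syntaxhighlight>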
==== Cost function ====
While it is possible to define a cost function [[ad hoc]], frequently the choice is determined by the function's desirable properties (such as [[Convex function|convexity]]) or because it arises naturally from the model (e.g., in a probabilistic model the model's [[posterior probability]] can be used as an inverse cost).{{citation needed|date=October 2024}}

==== Backpropagation ====
{{Main|Backpropagation}}
Backpropagation is a method used to adjust the connection weights to compensate for each error found during learning. The error amount is effectively divided among the connections. Technically, backpropagation calculates the [[gradient]] (the derivative) of the [[loss function|cost function]] associated with a given state with respect to the weights. The weight updates can be done via stochastic gradient descent or other methods, such as ''[[extreme learning machine]]s'',<ref>{{cite journal|last1=Huang|first1=Guang-Bin|last2=Zhu |first2=Qin-Yu|last3=Siew|first3=Chee-Kheong|year=2006|title=Extreme learning machine: theory and applications|journal=Neurocomputing|volume=70|issue=1 |pages=489–501|doi=10.1016/j.neucom.2005.12.126 |citeseerx=10.1.1.217.3692|s2cid=116858 }}</ref> "no-prop" networks,<ref>{{cite journal|year=2013|title=The no-prop algorithm: A new learning algorithm for multilayer neural networks |journal=Neural Networks|volume=37 |pages=182–188|doi=10.1016/j.neunet.2012.09.020|pmid=23140797|last1=Widrow|first1=Bernard|display-authors=etal}}</ref> training without backtracking,<ref>{{cite arXiv|eprint=1507.07680|first1=Yann |last1=Ollivier|first2=Guillaume|last2=Charpiat|title=Training recurrent networks without backtracking |year=2015|class=cs.NE}}</ref> "weightless" networks,<ref name="RBMTRAIN">{{Cite journal |last=Hinton |first=G. E. |date=2010 |title=A Practical Guide to Training Restricted Boltzmann Machines |url=https://www.researchgate.net/publication/221166159 |journal=Tech. Rep. UTML TR 2010-003 |access-date=27 June 2017 |archive-date=9 May 2021 |archive-url=https://web.archive.org/web/20210509123211/https://www.researchgate.net/publication/221166159_A_brief_introduction_to_Weightless_Neural_Systems |url-status=live }}</ref><ref>ESANN. 2009.{{full citation needed|date=June 2022}}</ref> and [[Holographic associative memory|non-connectionist neural networks]].{{citation needed|date=June 2022}}
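For a single linear neuron with a squared-error cost, the gradient can be written out by hand, which gives a minimal picture of what backpropagation computes (an illustrative Python sketch with arbitrary data, not a full multi-layer implementation):

<syntaxhighlight lang="python">
def backprop_step(x, target, w, b, lr=0.1):
    """One stochastic-gradient-descent update for a single linear neuron
    with squared-error cost C = (y - target)**2. The chain rule gives
    dC/dw_i = 2*(y - target)*x_i and dC/db = 2*(y - target)."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b            # forward pass
    error = y - target
    w = [wi - lr * 2 * error * xi for wi, xi in zip(w, x)]  # step down the gradient
    b -= lr * 2 * error
    return w, b, error ** 2

w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b, cost = backprop_step([1.0, 2.0], target=3.0, w=w, b=b)
print(w, b, cost)  # the cost approaches 0
</syntaxhighlight>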
=== Learning paradigms ===
{{No footnotes|date=August 2019|section}}
Machine learning is commonly separated into three main learning paradigms: [[supervised learning]],<ref>{{cite book |last1=Bernard |first1=Etienne |title=Introduction to machine learning |date=2021 |location=Champaign |publisher=Wolfram Media |isbn=978-1-57955-048-6 |page=9 |url=https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |access-date=22 March 2023 |language=en |archive-date=19 May 2024 |archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |url-status=live }}</ref> [[unsupervised learning]]<ref>{{cite book |last1=Bernard |first1=Etienne |title=Introduction to machine learning |date=2021 |location=Champaign |publisher=Wolfram Media |isbn=978-1-57955-048-6 |page=12 |url=https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |access-date=22 March 2023 |language=en |archive-date=19 May 2024 |archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |url-status=live }}</ref> and [[reinforcement learning]].<ref>{{cite book|url=https://www.wolfram.com/language/introduction-machine-learning/|title=Introduction to Machine Learning|first1=Etienne|publisher=Wolfram Media Inc|year=2021|isbn=978-1-57955-048-6|page=9|last1=Bernard|access-date=28 July 2022|archive-date=19 May 2024|archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/|url-status=live}}</ref> Each corresponds to a particular learning task.

==== Supervised learning ====
[[Supervised learning]] uses a set of paired inputs and desired outputs. The learning task is to produce the desired output for each input. In this case, the cost function is related to eliminating incorrect deductions.<ref>{{Cite journal|last1=Ojha|first1=Varun Kumar|last2=Abraham|first2=Ajith|last3=Snášel|first3=Václav|date=1 April 2017|title=Metaheuristic design of feedforward neural networks: A review of two decades of research|journal=Engineering Applications of Artificial Intelligence|volume=60|pages=97–116|doi=10.1016/j.engappai.2017.01.013|arxiv=1705.05584|bibcode=2017arXiv170505584O|s2cid=27910748}}</ref> A commonly used cost is the [[mean-squared error]], which tries to minimize the average squared error between the network's output and the desired output. Tasks suited for supervised learning are [[pattern recognition]] (also known as classification) and [[Regression analysis|regression]] (also known as function approximation). Supervised learning is also applicable to sequential data (e.g., for handwriting, speech and [[gesture recognition]]). This can be thought of as learning with a "teacher", in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.
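The mean-squared-error cost just mentioned is simple enough to state directly (an illustrative Python sketch with arbitrary numbers):

<syntaxhighlight lang="python">
def mean_squared_error(outputs, targets):
    """Average squared difference between the network's outputs
    and the desired outputs supplied by the "teacher"."""
    return sum((y - t) ** 2 for y, t in zip(outputs, targets)) / len(outputs)

print(mean_squared_error([0.9, 0.2, 0.8], [1.0, 0.0, 1.0]))  # approximately 0.03
</syntaxhighlight>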
==== Unsupervised learning ====
In [[unsupervised learning]], input data is given along with the cost function, some function of the data <math>\textstyle x</math> and the network's output. The cost function depends on the task (the model domain) and any ''[[A priori and a posteriori|a priori]]'' assumptions (the implicit properties of the model, its parameters and the observed variables). As a trivial example, consider the model <math>\textstyle f(x) = a</math> where <math>\textstyle a</math> is a constant and the cost <math>\textstyle C=E[(x - f(x))^2]</math>. Minimizing this cost produces a value of <math>\textstyle a</math> that is equal to the mean of the data (this is verified numerically in the sketch below). The cost function can be much more complicated. Its form depends on the application: for example, in [[Data compression|compression]] it could be related to the [[mutual information]] between <math>\textstyle x</math> and <math>\textstyle f(x)</math>, whereas in statistical modeling, it could be related to the [[posterior probability]] of the model given the data (note that in both of those examples, those quantities would be maximized rather than minimized). Tasks that fall within the paradigm of unsupervised learning are in general [[Approximation|estimation]] problems; the applications include [[Data clustering|clustering]], the estimation of [[statistical distributions]], [[Data compression|compression]] and [[Bayesian spam filtering|filtering]].
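A numerical check of the trivial example above (illustrative Python; the data values are arbitrary): gradient descent on <math>\textstyle a</math> recovers the sample mean.

<syntaxhighlight lang="python">
data = [2.0, 4.0, 9.0]

a = 0.0
for _ in range(200):
    grad = sum(2 * (a - x) for x in data) / len(data)  # derivative of E[(x - a)**2]
    a -= 0.1 * grad                                    # gradient-descent step

print(a, sum(data) / len(data))  # both are (approximately) 5.0
</syntaxhighlight>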
==== Reinforcement learning ====
{{main|Reinforcement learning}}{{See also|Stochastic control}}
In applications such as playing video games, an actor takes a string of actions, receiving a generally unpredictable response from the environment after each one. The goal is to win the game, i.e., to generate the most positive (lowest-cost) responses. In [[reinforcement learning]], the aim is to weight the network (devise a policy) to perform actions that minimize long-term (expected cumulative) cost. At each point in time the agent performs an action and the environment generates an observation and an instantaneous cost, according to some (usually unknown) rules. The rules and the long-term cost can usually only be estimated. At any juncture, the agent decides whether to explore new actions to uncover their costs or to exploit prior learning to proceed more quickly.

Formally, the environment is modeled as a [[Markov decision process]] (MDP) with states <math>\textstyle {s_1,...,s_n}\in S </math> and actions <math>\textstyle {a_1,...,a_m} \in A</math>. Because the state transitions are not known, probability distributions are used instead: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle P(x_t|s_t)</math> and the transition distribution <math>\textstyle P(s_{t+1}|s_t, a_t)</math>, while a policy is defined as the conditional distribution over actions given the observations. Taken together, the policy and the environment define a [[Markov chain]] (MC). The aim is to discover the lowest-cost MC. ANNs serve as the learning component in such applications.<ref>{{cite conference | author = Dominic, S. | author2 = Das, R. | author3 = Whitley, D. | author4 = Anderson, C. | date = July 1991 | title = Genetic reinforcement learning for neural networks | pages = 71–76 | conference = IJCNN-91-Seattle International Joint Conference on Neural Networks | book-title = IJCNN-91-Seattle International Joint Conference on Neural Networks | publisher = IEEE | location = Seattle, Washington, US | doi = 10.1109/IJCNN.1991.155315 | isbn = 0-7803-0164-1 | url-access = registration | url = https://archive.org/details/ijcnn91seattlein01ieee }}</ref><ref>{{cite journal |last=Hoskins |first=J.C. |author2=Himmelblau, D.M. |title=Process control via artificial neural networks and reinforcement learning |journal=Computers & Chemical Engineering |year=1992 |volume=16 |pages=241–251 |doi=10.1016/0098-1354(92)80045-B |issue=4}}</ref> [[Dynamic programming]] coupled with ANNs (giving [[Neural oscillation|neurodynamic]] programming)<ref>{{cite book|url=https://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|title=Neuro-dynamic programming|first1=D.P.|first2=J.N.|publisher=Athena Scientific|year=1996|isbn=978-1-886529-10-6|page=512|last1=Bertsekas|last2=Tsitsiklis|access-date=17 June 2017|archive-date=29 June 2017|archive-url=https://web.archive.org/web/20170629172039/http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|url-status=live}}</ref> has been applied to problems such as those involved in [[vehicle routing]],<ref>{{cite journal |last=Secomandi |first=Nicola |title=Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands |journal=Computers & Operations Research |year=2000 |volume=27 |pages=1201–1225 |doi=10.1016/S0305-0548(99)00146-X |issue=11–12|citeseerx=10.1.1.392.4034 }}</ref> video games, [[natural resource management]]<ref>{{cite conference | author = de Rigo, D. | author2 = Rizzoli, A. E. | author3 = Soncini-Sessa, R. | author4 = Weber, E. | author5 = Zenesi, P. | year = 2001 | title = Neuro-dynamic programming for the efficient management of reservoir networks | conference = MODSIM 2001, International Congress on Modelling and Simulation | url = http://www.mssanz.org.au/MODSIM01/MODSIM01.htm | book-title = Proceedings of MODSIM 2001, International Congress on Modelling and Simulation | publisher = Modelling and Simulation Society of Australia and New Zealand | location = Canberra, Australia | doi = 10.5281/zenodo.7481 | isbn = 0-86740-525-2 | access-date = 29 July 2013 | archive-date = 7 August 2013 | archive-url = https://web.archive.org/web/20130807223658/http://mssanz.org.au/MODSIM01/MODSIM01.htm | url-status = live }}</ref><ref>{{cite conference| author = Damas, M. |author2=Salmeron, M. |author3=Diaz, A. |author4=Ortega, J. |author5=Prieto, A. |author6=Olivares, G.| year = 2000 | title = Genetic algorithms and neuro-dynamic programming: application to water supply networks |volume=1 |pages=7–14 | conference = 2000 Congress on Evolutionary Computation | book-title = Proceedings of 2000 Congress on Evolutionary Computation | publisher = IEEE | location = La Jolla, California, US | doi = 10.1109/CEC.2000.870269 | isbn = 0-7803-6375-2 }}</ref> and [[medicine]]<ref>{{Cite book |last=Deng |first=Geng |author2=Ferris, M.C. |title=Optimization in Medicine |chapter=Neuro-dynamic programming for fractionated radiotherapy planning |year=2008 |volume=12 |pages=47–70 |doi=10.1007/978-0-387-73299-2_3|citeseerx=10.1.1.137.8288 |series=Springer Optimization and Its Applications |isbn=978-0-387-73298-5 }}</ref> because of ANNs' ability to mitigate losses of accuracy even when reducing the [[discretization]] grid density for numerically approximating the solution of control problems. Tasks that fall within the paradigm of reinforcement learning are control problems, [[game]]s and other sequential decision-making tasks.
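The explore-or-exploit loop can be made concrete with a tabular (non-neural) sketch that estimates long-term costs by temporal differences; this is an illustrative toy with an invented environment, not an algorithm from the cited sources:

<syntaxhighlight lang="python">
import random

random.seed(1)
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]  # estimated long-term cost per (state, action)

def environment(s, a):
    """Invented toy rules, unknown to the agent: an instantaneous cost and a next state."""
    cost = random.gauss(1.0, 0.1) if a == 0 else random.gauss(2.0, 0.1)
    return cost, (s + a + 1) % n_states

s, epsilon, lr, discount = 0, 0.1, 0.5, 0.9
for _ in range(5000):
    if random.random() < epsilon:                       # explore a new action ...
        a = random.randrange(n_actions)
    else:                                               # ... or exploit prior learning
        a = min(range(n_actions), key=lambda act: Q[s][act])
    cost, s_next = environment(s, a)
    # Temporal-difference update towards cost plus discounted future cost.
    Q[s][a] += lr * (cost + discount * min(Q[s_next]) - Q[s][a])
    s = s_next

print(Q)  # action 0 ends up with the lower estimated cost in every state
</syntaxhighlight>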
"A self-learning system using secondary reinforcement". In R. Trappl (ed.) Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland. pp. 397–402. {{ISBN|978-0-444-86488-8}}.</ref> It is a system with only one input, situation s, and only one output, action (or behavior) a. It has neither external advice input nor external reinforcement input from the environment. The CAA computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about encountered situations. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, S. (2014) "[https://core.ac.uk/download/pdf/81973924.pdf Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981] {{Webarchive|url=https://web.archive.org/web/20190323204838/https://core.ac.uk/download/pdf/81973924.pdf |date=23 March 2019 }}." Procedia Computer Science p. 255-263</ref> Given the memory matrix, W =||w(a,s)||, the crossbar self-learning algorithm in each iteration performs the following computation: In situation s perform action a; Receive consequence situation s'; Compute emotion of being in consequence situation v(s'); Update crossbar memory w'(a,s) = w(a,s) + v(s'). The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments, one is behavioral environment where it behaves, and the other is genetic environment, where from it receives initial emotions (only once) about to be encountered situations in the behavioral environment. Having received the genome vector (species vector) from the genetic environment, the CAA will learn a goal-seeking behavior, in the behavioral environment that contains both desirable and undesirable situations.<ref>{{cite journal | last1 = Bozinovski | first1 = Stevo | last2 = Bozinovska | first2 = Liljana | year = 2001 | title = Self-learning agents: A connectionist theory of emotion based on crossbar value judgment | journal = Cybernetics and Systems | volume = 32 | issue = 6| pages = 637–667 | doi = 10.1080/01969720118145 | s2cid = 8944741 }}</ref> ==== Neuroevolution ==== {{Main|Neuroevolution}} [[Neuroevolution]] can create neural network topologies and weights using [[evolutionary computation]]. It is competitive with sophisticated gradient descent approaches.<ref>{{cite arXiv |last1=Salimans |first1=Tim |title=Evolution Strategies as a Scalable Alternative to Reinforcement Learning |date=7 September 2017 |eprint=1703.03864 |last2=Ho |first2=Jonathan |last3=Chen |first3=Xi |last4=Sidor |first4=Szymon |last5=Sutskever |first5=Ilya|class=stat.ML }}</ref><ref>{{cite arXiv|last1=Such |first1=Felipe Petroski |title=Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |date=20 April 2018 |eprint=1712.06567 |last2=Madhavan |first2=Vashisht |last3=Conti |first3=Edoardo |last4=Lehman |first4=Joel |last5=Stanley |first5=Kenneth O. 
==== Neuroevolution ====
{{Main|Neuroevolution}}
[[Neuroevolution]] can create neural network topologies and weights using [[evolutionary computation]]. It is competitive with sophisticated gradient descent approaches.<ref>{{cite arXiv |last1=Salimans |first1=Tim |title=Evolution Strategies as a Scalable Alternative to Reinforcement Learning |date=7 September 2017 |eprint=1703.03864 |last2=Ho |first2=Jonathan |last3=Chen |first3=Xi |last4=Sidor |first4=Szymon |last5=Sutskever |first5=Ilya|class=stat.ML }}</ref><ref>{{cite arXiv|last1=Such |first1=Felipe Petroski |title=Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |date=20 April 2018 |eprint=1712.06567 |last2=Madhavan |first2=Vashisht |last3=Conti |first3=Edoardo |last4=Lehman |first4=Joel |last5=Stanley |first5=Kenneth O. |last6=Clune |first6=Jeff|class=cs.NE }}</ref> One advantage of neuroevolution is that it may be less prone to getting caught in "dead ends".<ref>{{cite news|date=10 January 2018|title=Artificial intelligence can 'evolve' to solve problems| work=Science {{!}} AAAS|url=https://www.science.org/content/article/artificial-intelligence-can-evolve-solve-problems|access-date=7 February 2018|archive-date=9 December 2021|archive-url=https://web.archive.org/web/20211209231714/https://www.science.org/content/article/artificial-intelligence-can-evolve-solve-problems|url-status=live}}</ref>

=== Stochastic neural network ===
'''Stochastic neural networks''' originating from [[Spin glass#Sherrington–Kirkpatrick model|Sherrington–Kirkpatrick model]]s are a type of artificial neural network built by introducing random variations into the network, either by giving the network's artificial neurons [[Stochastic process|stochastic]] transfer functions{{Citation needed|date=September 2024}} or by giving them stochastic weights. This makes them useful tools for [[Optimization (mathematics)|optimization]] problems, since the random fluctuations help the network escape from [[Maxima and minima|local minima]].<ref>{{citation|title=Stochastic Models of Neural Networks|volume=102|series=Frontiers in artificial intelligence and applications: Knowledge-based intelligent engineering systems|first=Claudio|last=Turchetti|publisher=IOS Press|year=2004|isbn=978-1-58603-388-0}}</ref> Stochastic neural networks trained using a [[Bayes' theorem|Bayesian]] approach are known as '''Bayesian neural networks'''.<ref>{{Cite magazine |last1=Jospin |first1=Laurent Valentin |last2=Laga |first2=Hamid |last3=Boussaid |first3=Farid |last4=Buntine |first4=Wray |last5=Bennamoun |first5=Mohammed |date=2022 |title=Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users |magazine=IEEE Computational Intelligence Magazine |volume=17 |issue=2 |pages=29–48 |doi=10.1109/mci.2022.3155327 |arxiv=2007.06823 |s2cid=220514248 |issn=1556-603X}}</ref>
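One common concrete form of a stochastic transfer function is a binary unit that fires with a probability given by the logistic function of its weighted input, the rule used by Boltzmann-machine-style units (an illustrative Python sketch; details vary by model):

<syntaxhighlight lang="python">
import math
import random

def stochastic_unit(inputs, weights, bias):
    """A stochastic transfer function: the unit outputs 1 with probability
    sigmoid(z) and 0 otherwise, where z is its weighted input."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if random.random() < p else 0
</syntaxhighlight>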
=== Topological deep learning ===
[[Topological deep learning]] (TDL), first introduced in 2017,<ref>{{Cite journal |last1=Cang |first1=Zixuan |last2=Wei |first2=Guo-Wei |date=27 July 2017 |title=TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions |journal=PLOS Computational Biology |language=en |volume=13 |issue=7 |pages=e1005690 |doi=10.1371/journal.pcbi.1005690 |doi-access=free |issn=1553-7358 |pmc=5549771 |pmid=28749969|arxiv=1704.00063 |bibcode=2017PLSCB..13E5690C }}</ref> is an emerging approach in [[machine learning]] that integrates topology with deep neural networks to address highly intricate and high-order data. Initially rooted in [[algebraic topology]], TDL has since evolved into a versatile framework incorporating tools from other mathematical disciplines, such as [[differential topology]] and [[geometric topology]]. As an example of mathematical deep learning, TDL continues to inspire advances in mathematical [[artificial intelligence]].

=== Other ===
In a [[Bayesian probability|Bayesian]] framework, a distribution over the set of allowed models is chosen to minimize the cost. [[Evolutionary methods]],<ref>{{cite conference |author1=de Rigo, D. |author2=Castelletti, A. |author3=Rizzoli, A. E. |author4=Soncini-Sessa, R. |author5=Weber, E. |date=January 2005 |title=A selective improvement technique for fastening Neuro-Dynamic Programming in Water Resources Network Management |conference=16th IFAC World Congress |publisher=IFAC |location=Prague, Czech Republic |conference-url=http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Index.html |book-title=Proceedings of the 16th IFAC World Congress – IFAC-PapersOnLine |editor=Pavel Zítek |volume=16 |pages=7–12 |url=http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Papers/Paper4269.html |access-date=30 December 2011 |doi=10.3182/20050703-6-CZ-1902.02172 |isbn=978-3-902661-75-3 |hdl=11311/255236 |hdl-access=free |archive-date=26 April 2012 |archive-url=https://web.archive.org/web/20120426012450/http://www.nt.ntnu.no/users/skoge/prost/proceedings/ifac2005/Papers/Paper4269.html |url-status=live }}</ref> [[gene expression programming]],<ref>{{cite book |last=Ferreira |first=C. |year=2006 |contribution=Designing Neural Networks Using Gene Expression Programming |url=http://www.gene-expression-programming.com/webpapers/Ferreira-ASCT2006.pdf |editor=A. Abraham |editor2=B. de Baets |editor3=M. Köppen |editor4=B. Nickolay |title=Applied Soft Computing Technologies: The Challenge of Complexity |pages=517–536 |publisher=Springer-Verlag |access-date=8 October 2012 |archive-date=19 December 2013 |archive-url=https://web.archive.org/web/20131219022806/http://www.gene-expression-programming.com/webpapers/Ferreira-ASCT2006.pdf |url-status=live }}</ref> [[simulated annealing]],<ref>{{cite conference |author=Da, Y. |author2=Xiurun, G. |date=July 2005 |title=An improved PSO-based ANN with simulated annealing technique |volume=63 |pages=527–533 |editor=T. Villmann |book-title=New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks |url=http://www.dice.ucl.ac.be/esann/proceedings/electronicproceedings.htm |publisher=Elsevier |doi=10.1016/j.neucom.2004.07.002 |access-date=30 December 2011 |archive-date=25 April 2012 |archive-url=https://web.archive.org/web/20120425233611/http://www.dice.ucl.ac.be/esann/proceedings/electronicproceedings.htm |url-status=dead }}</ref> [[expectation–maximization algorithm|expectation–maximization]], [[non-parametric methods]] and [[particle swarm optimization]]<ref>{{cite conference |author=Wu, J. |author2=Chen, E. |date=May 2009 |title=A Novel Nonparametric Regression Ensemble for Rainfall Forecasting Using Particle Swarm Optimization Technique Coupled with Artificial Neural Network |series=Lecture Notes in Computer Science |volume=5553 |pages=49–58 |book-title=6th International Symposium on Neural Networks, ISNN 2009 |url=http://www2.mae.cuhk.edu.hk/~isnn2009/ |editor=Wang, H. |editor2=Shen, Y. |editor3=Huang, T. |editor4=Zeng, Z. |publisher=Springer |doi=10.1007/978-3-642-01513-7_6 |isbn=978-3-642-01215-0 |access-date=1 January 2012 |archive-date=31 December 2014 |archive-url=https://web.archive.org/web/20141231221755/http://www2.mae.cuhk.edu.hk/~isnn2009/ |url-status=dead }}</ref> are other learning algorithms.
Convergent recursion is a learning algorithm for [[cerebellar model articulation controller]] (CMAC) neural networks.<ref name="Qin1">{{cite journal |author1=Ting Qin |author2=Zonghai Chen |author3=Haitao Zhang |author4=Sifu Li |author5=Wei Xiang |author6=Ming Li |url=http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_998.pdf |title=A learning algorithm of CMAC based on RLS |journal=Neural Processing Letters |volume=19 |issue=1 |date=2004 |pages=49–61 |doi=10.1023/B:NEPL.0000016847.18175.60 |s2cid=6233899 |access-date=30 January 2019 |archive-date=14 April 2021 |archive-url=https://web.archive.org/web/20210414103815/http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_998.pdf |url-status=live }}</ref><ref name="Qin2">{{cite journal |author1=Ting Qin |author2=Haitao Zhang |author3=Zonghai Chen |author4=Wei Xiang |url=http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_997.pdf |title=Continuous CMAC-QRLS and its systolic array |journal=Neural Processing Letters |volume=22 |issue=1 |date=2005 |pages=1–16 |doi=10.1007/s11063-004-2694-0 |s2cid=16095286 |access-date=30 January 2019 |archive-date=18 November 2018 |archive-url=https://web.archive.org/web/20181118122850/http://www-control.eng.cam.ac.uk/Homepage/papers/cued_control_997.pdf |url-status=live }}</ref>

==== Modes ====
{{No footnotes|date=August 2019|section}}
Two modes of learning are available: stochastic and batch. In stochastic learning, each input creates a weight adjustment. In batch learning, weights are adjusted based on a batch of inputs, accumulating errors over the batch. Stochastic learning introduces "noise" into the process, using the local gradient calculated from one data point; this reduces the chance of the network getting stuck in local minima. However, batch learning typically yields a faster, more stable descent to a local minimum, since each update is performed in the direction of the batch's average error. A common compromise is to use "mini-batches", small batches with samples in each batch selected stochastically from the entire data set.
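The mini-batch compromise can be sketched as follows (illustrative Python; the data set and batch size are arbitrary):

<syntaxhighlight lang="python">
import random

def minibatches(dataset, batch_size):
    """Partition the data set into small batches whose samples are
    selected stochastically from the entire data set."""
    indices = list(range(len(dataset)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# One weight update would then be computed from each batch's average error:
for batch in minibatches(list(range(10)), batch_size=4):
    print(batch)
</syntaxhighlight>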