Editing Neural network (machine learning) (section)

===Learning paradigms===
{{No footnotes|date=August 2019|section}}
Machine learning is commonly separated into three main learning paradigms, [[supervised learning]],<ref>{{cite book |last1=Bernard |first1=Etienne |title=Introduction to machine learning |date=2021 |location=Champaign |publisher=Wolfram Media |isbn=978-1-57955-048-6 |page=9 |url=https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |access-date=22 March 2023 |language=en |archive-date=19 May 2024 |archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |url-status=live }}</ref> [[unsupervised learning]]<ref>{{cite book |last1=Bernard |first1=Etienne |title=Introduction to machine learning |date=2021 |location=Champaign |publisher=Wolfram Media |isbn=978-1-57955-048-6 |page=12 |url=https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |access-date=22 March 2023 |language=en |archive-date=19 May 2024 |archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/machine-learning-paradigms/#p-9 |url-status=live }}</ref> and [[reinforcement learning]].<ref>{{cite book|url=https://www.wolfram.com/language/introduction-machine-learning/|title=Introduction to Machine Learning|first1=Etienne|publisher=Wolfram Media Inc|year=2021|isbn=978-1-57955-048-6|page=9|last1=Bernard|access-date=28 July 2022|archive-date=19 May 2024|archive-url=https://web.archive.org/web/20240519081126/https://www.wolfram.com/language/introduction-machine-learning/|url-status=live}}</ref> Each corresponds to a particular learning task.

==== Supervised learning ====
[[Supervised learning]] uses a set of paired inputs and desired outputs. The learning task is to produce the desired output for each input. In this case, the cost function is related to eliminating incorrect deductions.<ref>{{Cite journal|last1=Ojha|first1=Varun Kumar|last2=Abraham|first2=Ajith|last3=Snášel|first3=Václav|date=1 April 2017|title=Metaheuristic design of feedforward neural networks: A review of two decades of research|journal=Engineering Applications of Artificial Intelligence|volume=60|pages=97–116|doi=10.1016/j.engappai.2017.01.013|arxiv=1705.05584|bibcode=2017arXiv170505584O|s2cid=27910748}}</ref> A commonly used cost is the [[mean-squared error]], which tries to minimize the average squared error between the network's output and the desired output. Tasks suited for supervised learning are [[pattern recognition]] (also known as classification) and [[Regression analysis|regression]] (also known as function approximation). Supervised learning is also applicable to sequential data (e.g., for handwriting, speech and [[gesture recognition]]). This can be thought of as learning with a "teacher", in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.

====Unsupervised learning====
In [[unsupervised learning]], input data is given along with the cost function, some function of the data <math>\textstyle x</math> and the network's output. The cost function is dependent on the task (the model domain) and any ''[[A priori and a posteriori|a priori]]'' assumptions (the implicit properties of the model, its parameters and the observed variables). As a trivial example, consider the model <math>\textstyle f(x) = a</math> where <math>\textstyle a</math> is a constant and the cost <math>\textstyle C=E[(x - f(x))^2]</math>. Minimizing this cost produces a value of <math>\textstyle a</math> that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in [[Data compression|compression]] it could be related to the [[mutual information]] between <math>\textstyle x</math> and <math>\textstyle f(x)</math>, whereas in statistical modeling, it could be related to the [[posterior probability]] of the model given the data (note that in both of those examples, those quantities would be maximized rather than minimized). Tasks that fall within the paradigm of unsupervised learning are in general [[Approximation|estimation]] problems; the applications include [[Data clustering|clustering]], the estimation of [[statistical distributions]], [[Data compression|compression]] and [[Bayesian spam filtering|filtering]].

====Reinforcement learning====
{{main|Reinforcement learning}}
{{See also|Stochastic control}}

In applications such as playing video games, an actor takes a string of actions, receiving a generally unpredictable response from the environment after each one. The goal is to win the game, i.e., generate the most positive (lowest cost) responses. In [[reinforcement learning]], the aim is to weight the network (devise a policy) to perform actions that minimize long-term (expected cumulative) cost. At each point in time the agent performs an action and the environment generates an observation and an [[instant]]aneous cost, according to some (usually unknown) rules. The rules and the long-term cost usually only can be estimated. At any juncture, the agent decides whether to explore new actions to uncover their costs or to exploit prior learning to proceed more quickly.

Formally, the environment is modeled as a [[Markov decision process]] (MDP) with states <math>\textstyle {s_1,...,s_n}\in S </math> and actions <math>\textstyle {a_1,...,a_m} \in A</math>. Because the state transitions are not known, probability distributions are used instead: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle P(x_t|s_t)</math> and the transition distribution <math>\textstyle P(s_{t+1}|s_t, a_t)</math>, while a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a [[Markov chain]] (MC). The aim is to discover the lowest-cost MC.

ANNs serve as the learning component in such applications.<ref>{{cite conference | author = Dominic, S. | author2 = Das, R. | author3 = Whitley, D. | author4 = Anderson, C. | date = July 1991 | title = Genetic reinforcement learning for neural networks | pages = 71–76 | conference = IJCNN-91-Seattle International Joint Conference on Neural Networks | book-title = IJCNN-91-Seattle International Joint Conference on Neural Networks | publisher = IEEE | location = Seattle, Washington, US | doi = 10.1109/IJCNN.1991.155315 | isbn = 0-7803-0164-1 | url-access = registration | url = https://archive.org/details/ijcnn91seattlein01ieee }}</ref><ref>{{cite journal |last=Hoskins |first=J.C. |author2=Himmelblau, D.M. |title=Process control via artificial neural networks and reinforcement learning |journal=Computers & Chemical Engineering |year=1992 |volume=16 |pages=241–251 |doi=10.1016/0098-1354(92)80045-B |issue=4}}</ref> [[Dynamic programming]] coupled with ANNs (giving [[Neural oscillation|neurodynamic]] programming)<ref>{{cite book|url=https://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|title=Neuro-dynamic programming|first1=D.P.|first2=J.N.|publisher=Athena Scientific|year=1996|isbn=978-1-886529-10-6|page=512|last1=Bertsekas|last2=Tsitsiklis|access-date=17 June 2017|archive-date=29 June 2017|archive-url=https://web.archive.org/web/20170629172039/http://papers.nips.cc/paper/4741-deep-neural-networks-segment-neuronal-membranes-in-electron-microscopy-images|url-status=live}}</ref> has been applied to problems such as those involved in [[vehicle routing]],<ref>{{cite journal |last=Secomandi |first=Nicola |title=Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands |journal=Computers & Operations Research |year=2000 |volume=27 |pages=1201–1225 |doi=10.1016/S0305-0548(99)00146-X |issue=11–12|citeseerx=10.1.1.392.4034 }}</ref> video games, [[natural resource management]]<ref>{{cite conference | author = de Rigo, D. | author2 = Rizzoli, A. E. | author3 = Soncini-Sessa, R. | author4 = Weber, E. | author5 = Zenesi, P. | year = 2001 | title = Neuro-dynamic programming for the efficient management of reservoir networks | conference = MODSIM 2001, International Congress on Modelling and Simulation | url = http://www.mssanz.org.au/MODSIM01/MODSIM01.htm | book-title = Proceedings of MODSIM 2001, International Congress on Modelling and Simulation | publisher = Modelling and Simulation Society of Australia and New Zealand | location = Canberra, Australia | doi = 10.5281/zenodo.7481 | isbn = 0-86740-525-2 | access-date = 29 July 2013 | archive-date = 7 August 2013 | archive-url = https://web.archive.org/web/20130807223658/http://mssanz.org.au/MODSIM01/MODSIM01.htm | url-status = live }}</ref><ref>{{cite conference| author = Damas, M. |author2=Salmeron, M. |author3=Diaz, A. |author4=Ortega, J. |author5=Prieto, A. |author6=Olivares, G.| year = 2000 | title = Genetic algorithms and neuro-dynamic programming: application to water supply networks |volume=1 |pages=7–14 | conference = 2000 Congress on Evolutionary Computation | book-title = Proceedings of 2000 Congress on Evolutionary Computation | publisher = IEEE | location = La Jolla, California, US | doi = 10.1109/CEC.2000.870269 | isbn = 0-7803-6375-2 }}</ref> and [[medicine]]<ref>{{Cite book |last=Deng |first=Geng |author2=Ferris, M.C. |title=Optimization in Medicine |chapter=Neuro-dynamic programming for fractionated radiotherapy planning |year=2008 |volume=12 |pages=47–70 |doi=10.1007/978-0-387-73299-2_3|citeseerx=10.1.1.137.8288 |series=Springer Optimization and Its Applications |isbn=978-0-387-73298-5 }}</ref> because of ANNs ability to mitigate losses of accuracy even when reducing the [[discretization]] grid density for numerically approximating the solution of control problems. Tasks that fall within the paradigm of reinforcement learning are control problems, [[game]]s and other sequential decision making tasks.

====Self-learning====

Self-learning in neural networks was introduced in 1982 along with a neural network capable of self-learning named ''crossbar adaptive array'' (CAA).<ref>Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In R. Trappl (ed.) Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland. pp. 397–402. {{ISBN|978-0-444-86488-8}}.</ref> It is a system with only one input, situation s, and only one output, action (or behavior) a. It has neither external advice input nor external reinforcement input from the environment. The CAA computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about encountered situations. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, S. (2014) "[https://core.ac.uk/download/pdf/81973924.pdf Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981] {{Webarchive|url=https://web.archive.org/web/20190323204838/https://core.ac.uk/download/pdf/81973924.pdf |date=23 March 2019 }}." Procedia Computer Science p. 255-263</ref> Given the memory matrix, W =||w(a,s)||, the crossbar self-learning algorithm in each iteration performs the following computation:
  In situation s perform action a;
  Receive consequence situation s';
  Compute emotion of being in consequence situation v(s');
  Update crossbar memory w'(a,s) = w(a,s) + v(s').

The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments, one is behavioral environment where it behaves, and the other is genetic environment, where from it receives initial emotions (only once) about to be encountered situations in the behavioral environment. Having received the genome vector (species vector) from the genetic environment, the CAA will learn a goal-seeking behavior, in the behavioral environment that contains both desirable and undesirable situations.<ref>{{cite journal | last1 = Bozinovski | first1 = Stevo | last2 = Bozinovska | first2 = Liljana | year = 2001 | title = Self-learning agents: A connectionist theory of emotion based on crossbar value judgment | journal = Cybernetics and Systems | volume = 32 | issue = 6| pages = 637–667 | doi = 10.1080/01969720118145 | s2cid = 8944741 }}</ref>

==== Neuroevolution ====
{{Main|Neuroevolution}}

[[Neuroevolution]] can create neural network topologies and weights using [[evolutionary computation]]. It is competitive with sophisticated gradient descent approaches.<ref>{{cite arXiv |last1=Salimans |first1=Tim |title=Evolution Strategies as a Scalable Alternative to Reinforcement Learning |date=7 September 2017 |eprint=1703.03864 |last2=Ho |first2=Jonathan |last3=Chen |first3=Xi |last4=Sidor |first4=Szymon |last5=Sutskever |first5=Ilya|class=stat.ML }}</ref><ref>{{cite arXiv|last1=Such |first1=Felipe Petroski |title=Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |date=20 April 2018 |eprint=1712.06567 |last2=Madhavan |first2=Vashisht |last3=Conti |first3=Edoardo |last4=Lehman |first4=Joel |last5=Stanley |first5=Kenneth O. |last6=Clune |first6=Jeff|class=cs.NE }}</ref> One advantage of neuroevolution is that it may be less prone to get caught in "dead ends".<ref>{{cite news|date=10 January 2018|title=Artificial intelligence can 'evolve' to solve problems|
work=Science {{!}} AAAS|url=https://www.science.org/content/article/artificial-intelligence-can-evolve-solve-problems|access-date=7 February 2018|archive-date=9 December 2021|archive-url=https://web.archive.org/web/20211209231714/https://www.science.org/content/article/artificial-intelligence-can-evolve-solve-problems|url-status=live}}</ref>