====Reinforcement learning====
{{main|Reinforcement learning}}
{{See also|Stochastic control}}

In applications such as playing video games, an actor takes a string of actions, receiving a generally unpredictable response from the environment after each one. The goal is to win the game, i.e., to generate the most favorable (lowest-cost) responses. In [[reinforcement learning]], the aim is to weight the network (devise a policy) to perform actions that minimize long-term (expected cumulative) cost. At each point in time the agent performs an action and the environment generates an observation and an instantaneous cost, according to some (usually unknown) rules. The rules and the long-term cost usually can only be estimated. At any juncture, the agent decides whether to explore new actions to uncover their costs or to exploit prior learning to proceed more quickly.

Formally, the environment is modeled as a [[Markov decision process]] (MDP) with states <math>\textstyle s_1,\ldots,s_n \in S</math> and actions <math>\textstyle a_1,\ldots,a_m \in A</math>. Because the state transitions are not known, probability distributions are used instead: the instantaneous cost distribution <math>\textstyle P(c_t|s_t)</math>, the observation distribution <math>\textstyle P(x_t|s_t)</math> and the transition distribution <math>\textstyle P(s_{t+1}|s_t, a_t)</math>, while a policy is defined as the conditional distribution over actions given the observations. Taken together, the MDP and the policy define a [[Markov chain]] (MC). The aim is to discover the policy whose MC has the lowest expected cumulative cost. ANNs serve as the learning component in such applications.<ref>{{cite conference |author=Dominic, S. |author2=Das, R. |author3=Whitley, D. |author4=Anderson, C. |date=July 1991 |title=Genetic reinforcement learning for neural networks |pages=71–76 |conference=IJCNN-91-Seattle International Joint Conference on Neural Networks |book-title=IJCNN-91-Seattle International Joint Conference on Neural Networks |publisher=IEEE |location=Seattle, Washington, US |doi=10.1109/IJCNN.1991.155315 |isbn=0-7803-0164-1 |url-access=registration |url=https://archive.org/details/ijcnn91seattlein01ieee}}</ref><ref>{{cite journal |last=Hoskins |first=J.C. |author2=Himmelblau, D.M. |title=Process control via artificial neural networks and reinforcement learning |journal=Computers & Chemical Engineering |year=1992 |volume=16 |pages=241–251 |doi=10.1016/0098-1354(92)80045-B |issue=4}}</ref>

[[Dynamic programming]] coupled with ANNs (giving neurodynamic programming)<ref>{{cite book |last1=Bertsekas |first1=D.P. |last2=Tsitsiklis |first2=J.N. |title=Neuro-dynamic programming |publisher=Athena Scientific |year=1996 |isbn=978-1-886529-10-6 |page=512}}</ref> has been applied to problems such as [[vehicle routing]],<ref>{{cite journal |last=Secomandi |first=Nicola |title=Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands |journal=Computers & Operations Research |year=2000 |volume=27 |pages=1201–1225 |doi=10.1016/S0305-0548(99)00146-X |issue=11–12 |citeseerx=10.1.1.392.4034}}</ref> video games, [[natural resource management]]<ref>{{cite conference |author=de Rigo, D. |author2=Rizzoli, A. E. |author3=Soncini-Sessa, R. |author4=Weber, E. |author5=Zenesi, P. |year=2001 |title=Neuro-dynamic programming for the efficient management of reservoir networks |conference=MODSIM 2001, International Congress on Modelling and Simulation |url=http://www.mssanz.org.au/MODSIM01/MODSIM01.htm |book-title=Proceedings of MODSIM 2001, International Congress on Modelling and Simulation |publisher=Modelling and Simulation Society of Australia and New Zealand |location=Canberra, Australia |doi=10.5281/zenodo.7481 |isbn=0-86740-525-2 |access-date=29 July 2013 |archive-date=7 August 2013 |archive-url=https://web.archive.org/web/20130807223658/http://mssanz.org.au/MODSIM01/MODSIM01.htm |url-status=live}}</ref><ref>{{cite conference |author=Damas, M. |author2=Salmeron, M. |author3=Diaz, A. |author4=Ortega, J. |author5=Prieto, A. |author6=Olivares, G. |year=2000 |title=Genetic algorithms and neuro-dynamic programming: application to water supply networks |volume=1 |pages=7–14 |conference=2000 Congress on Evolutionary Computation |book-title=Proceedings of 2000 Congress on Evolutionary Computation |publisher=IEEE |location=La Jolla, California, US |doi=10.1109/CEC.2000.870269 |isbn=0-7803-6375-2}}</ref> and [[medicine]],<ref>{{cite book |last=Deng |first=Geng |author2=Ferris, M.C. |title=Optimization in Medicine |chapter=Neuro-dynamic programming for fractionated radiotherapy planning |year=2008 |volume=12 |pages=47–70 |doi=10.1007/978-0-387-73299-2_3 |citeseerx=10.1.1.137.8288 |series=Springer Optimization and Its Applications |isbn=978-0-387-73298-5}}</ref> because of ANNs' ability to mitigate losses of accuracy even when the [[discretization]] grid density is reduced for numerically approximating the solution of control problems. Tasks that fall within the paradigm of reinforcement learning are control problems, [[game]]s and other sequential decision-making tasks.
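The following is a minimal sketch of this setup, assuming a tabular toy MDP and the REINFORCE policy-gradient algorithm; neither is prescribed above, and the transition table, cost table, network sizes and hyperparameters are illustrative assumptions. A small network maps the observed state to a distribution over actions, and its weights are adjusted so that sampled episodes incur lower cumulative cost:

<syntaxhighlight lang="python">
# Sketch only: REINFORCE on a hypothetical 3-state, 2-action MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, hidden = 3, 2, 16

# Hypothetical MDP (assumed, not from the article): transition
# distribution P(s_{t+1} | s_t, a_t) and instantaneous costs c(s_t, a_t).
# In practice both are unknown and only sampled through interaction.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Policy network: one-hot state -> tanh hidden layer -> action logits.
W1 = rng.normal(0.0, 0.5, (n_states, hidden))
W2 = rng.normal(0.0, 0.5, (hidden, n_actions))

def policy(s):
    """Return pi(.|s) plus the activations needed for the gradient."""
    x = np.eye(n_states)[s]
    h = np.tanh(x @ W1)
    logits = h @ W2
    p = np.exp(logits - logits.max())     # numerically stable softmax
    return p / p.sum(), x, h

alpha, horizon = 0.05, 20
for episode in range(2000):
    s = rng.integers(n_states)
    steps, total_cost = [], 0.0
    for t in range(horizon):              # roll out one episode
        p, x, h = policy(s)
        a = rng.choice(n_actions, p=p)    # sample action from the policy
        steps.append((x, h, p, a))
        total_cost += cost[s, a]
        s = rng.choice(n_states, p=P[s, a])
    # REINFORCE: increase log pi(a|s) of the taken actions in proportion
    # to the return, here the negative average cost (low cost is good).
    ret = -total_cost / horizon
    gW1, gW2 = np.zeros_like(W1), np.zeros_like(W2)
    for x, h, p, a in steps:
        dlogits = -p
        dlogits[a] += 1.0                 # grad of log-softmax at action a
        gW2 += ret * np.outer(h, dlogits)
        gW1 += ret * np.outer(x, (W2 @ dlogits) * (1.0 - h**2))
    W1 += alpha * gW1                     # gradient ascent on expected return
    W2 += alpha * gW2
</syntaxhighlight>

Using the negative episode cost as the return keeps the sketch aligned with the cost-minimization convention above; practical systems typically add a baseline or a learned value function (as in neurodynamic programming) to reduce the variance of this gradient estimate.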