{{Short description|Mathematical model for sequential decision making under uncertainty}}
A '''Markov decision process''' ('''MDP'''), also called a [[Stochastic dynamic programming|stochastic dynamic program]] or stochastic control problem, is a model for [[sequential decision making]] when [[Outcome (probability)|outcomes]] are uncertain.<ref>{{Cite book |last=Puterman |first=Martin L. |title=Markov decision processes: discrete stochastic dynamic programming |date=1994 |publisher=Wiley |isbn=978-0-471-61977-2 |series=Wiley series in probability and mathematical statistics. Applied probability and statistics section |location=New York}}</ref> Originating in [[operations research]] in the 1950s,<ref>{{Cite book |last1=Schneider |first1=S. |last2=Wagner |first2=D. H. |chapter=Error detection in redundant systems |date=1957-02-26 |title=Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability on - IRE-AIEE-ACM '57 (Western) |chapter-url=https://dl.acm.org/doi/10.1145/1455567.1455587 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=115–121 |doi=10.1145/1455567.1455587 |isbn=978-1-4503-7861-1}}</ref><ref>{{Cite journal |last=Bellman |first=Richard |date=1958-09-01 |title=Dynamic programming and stochastic control processes |url=https://linkinghub.elsevier.com/retrieve/pii/S0019995858800030 |journal=Information and Control |volume=1 |issue=3 |pages=228–239 |doi=10.1016/S0019-9958(58)80003-0 |issn=0019-9958 |url-access=subscription}}</ref> MDPs have since gained recognition in a variety of fields, including [[ecology]], [[economics]], [[Health care|healthcare]], [[telecommunications]] and [[reinforcement learning]].<ref name=":0">{{Cite book |last1=Sutton |first1=Richard S. |title=Reinforcement learning: an introduction |last2=Barto |first2=Andrew G. |date=2018 |publisher=The MIT Press |isbn=978-0-262-03924-6 |edition=2nd |series=Adaptive computation and machine learning series |location=Cambridge, Massachusetts}}</ref> Reinforcement learning uses the MDP framework to model the interaction between a learning agent and its environment; this interaction is characterized by states, actions, and rewards. The MDP framework is designed to provide a simplified representation of key elements of [[artificial intelligence]] challenges: the understanding of [[Causality|cause and effect]], the management of uncertainty and nondeterminism, and the pursuit of explicit goals.<ref name=":0" /> The name comes from its connection to [[Markov chain|Markov chains]], a concept developed by the Russian mathematician [[Andrey Markov]]. The "Markov" in "Markov decision process" refers to the underlying [[Transition system|state transitions]], which still satisfy the [[Markov property]]. The process is called a "decision process" because it involves making decisions that influence these state transitions, extending the Markov chain into the realm of decision-making under uncertainty.
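For illustration, in one common formalization (a sketch using standard notation such as Puterman's, not fixed by the text above), an MDP consists of a state space <math>S</math>, an action space <math>A</math>, transition probabilities <math>P(s' \mid s, a)</math>, and a reward function <math>R(s, a)</math>. The Markov property referred to above states that the distribution of the next state depends only on the current state and the action taken, not on the earlier history:

<math display="block">\Pr(s_{t+1} = s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = \Pr(s_{t+1} = s' \mid s_t, a_t).</math>

Fixing the rule for choosing actions (for example, always taking the same action in each state) reduces the process to an ordinary Markov chain, which is the sense in which the decision process extends the chain.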