{{Short description|Field of machine learning}}
{{For|reinforcement learning in psychology|Reinforcement|Operant conditioning}}
[[File:Reinforcement learning diagram.svg|thumb|right|The typical framing of a reinforcement learning (RL) scenario: an agent takes actions in an environment, which is interpreted into a reward and a state representation, which are fed back to the agent.]]
{{Machine learning|Reinforcement learning}}

'''Reinforcement learning''' ('''RL''') is an interdisciplinary area of [[machine learning]] and [[optimal control]] concerned with how an [[intelligent agent]] should [[Action selection|take actions]] in a dynamic environment in order to [[Reward-based selection|maximize a reward]] signal. Reinforcement learning is one of the [[Machine learning#Approaches|three basic machine learning paradigms]], alongside [[supervised learning]] and [[unsupervised learning]].

Reinforcement learning differs from supervised learning in not needing labelled input-output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge), with the goal of maximizing the cumulative reward (the feedback of which might be incomplete or delayed).<ref name="kaelbling">{{cite journal|last1=Kaelbling|first1=Leslie P.|last2=Littman|first2=Michael L.|author-link2=Michael L. Littman|last3=Moore|first3=Andrew W.|author-link3=Andrew W. Moore|year=1996|title=Reinforcement Learning: A Survey|url=http://www.cs.washington.edu/research/jair/abstracts/kaelbling96a.html|url-status=dead|journal=Journal of Artificial Intelligence Research|volume=4|pages=237–285|doi=10.1613/jair.301|archive-url=http://webarchive.loc.gov/all/20011120234539/http://www.cs.washington.edu/research/jair/abstracts/kaelbling96a.html|archive-date=2001-11-20 |author-link1=Leslie P. Kaelbling|arxiv=cs/9605103|s2cid=1708582}}</ref> The search for this balance is known as the [[exploration–exploitation dilemma]].

The environment is typically stated in the form of a [[Markov decision process]] (MDP), as many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|author1=van Otterlo, M.|author2=Wiering, M.|title=Reinforcement Learning |chapter=Reinforcement Learning and Markov Decision Processes |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the Markov decision process, and they target large MDPs where exact methods become infeasible.<ref name="Li-2023">{{cite book |last1=Li |first1=Shengbo |title=Reinforcement Learning for Sequential Decision and Optimal Control |date=2023 |location=Springer Verlag, Singapore |isbn=978-9-811-97783-1 |pages=1–460 |doi=10.1007/978-981-19-7784-8 |s2cid=257928563 |edition=First |url=https://link.springer.com/book/10.1007/978-981-19-7784-8}}</ref>

{{toclimit|3}}
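These ideas can be illustrated with a minimal tabular [[Q-learning]] sketch: the agent samples transitions from an environment whose dynamics it does not model, balances exploration and exploitation with an ε-greedy rule, and incrementally improves its action-value estimates from delayed reward. The chain environment, reward scheme, and hyperparameters below are illustrative assumptions chosen for brevity, not drawn from the references above.

<syntaxhighlight lang="python">
import random

# Illustrative 5-state chain MDP (hypothetical example): the agent starts in
# state 0 and can move left (action 0) or right (action 1). Reaching state 4
# yields reward +1 and ends the episode; every other step yields 0.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1  # discount, learning rate, exploration rate

def step(state, action):
    """Environment dynamics: unknown to the agent, accessed only by sampling."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular action-value estimates Q(s, a), initialised to zero.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy selection: explore with probability EPSILON,
        # otherwise exploit the current value estimates.
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state,
        # requiring no knowledge of the MDP's transition probabilities.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

print(Q)  # values grow toward the rewarding right end of the chain
</syntaxhighlight>

In this sketch the reward is delayed (only the final transition pays off), yet the update rule propagates value backwards along the chain over repeated episodes, which is the behaviour the exploration–exploitation discussion above describes.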