=== Model-based algorithms ===
Finally, all of the above methods can be combined with algorithms that first learn a model of the [[Markov decision process]], i.e. the probability of each next state given an action taken from an existing state. For instance, the Dyna algorithm learns a model from experience and uses it to supply additional modelled transitions for updating a value function, alongside the real transitions.<ref>{{Cite conference |last1=Sutton |first1=Richard |title=Integrated Architectures for Learning, Planning and Reacting based on Dynamic Programming |year=1990 |book-title=Machine Learning: Proceedings of the Seventh International Workshop}}</ref> Such methods can sometimes be extended to the use of non-parametric models, such as when the transitions are simply stored and "replayed" to the learning algorithm.<ref>{{Cite conference |first1=Long-Ji |last1=Lin |title=Self-improving reactive agents based on reinforcement learning, planning and teaching |book-title=Machine Learning, volume 8 |year=1992 |doi=10.1007/BF00992699 |url=https://link.springer.com/content/pdf/10.1007/BF00992699.pdf}}</ref> Model-based methods can be more computationally intensive than model-free approaches, and their utility can be limited by the extent to which the Markov decision process can be learnt.<ref>{{Citation |last=Zou |first=Lan |title=Chapter 7 - Meta-reinforcement learning |date=2023-01-01 |url=https://www.sciencedirect.com/science/article/pii/B9780323899314000110 |work=Meta-Learning |pages=267–297 |editor-last=Zou |editor-first=Lan |access-date=2023-11-08 |publisher=Academic Press |doi=10.1016/b978-0-323-89931-4.00011-0 |isbn=978-0-323-89931-4}}</ref> Models can also be used in ways other than updating a value function.<ref>{{Cite conference |last1=van Hasselt |first1=Hado |last2=Hessel |first2=Matteo |last3=Aslanides |first3=John |title=When to use parametric models in reinforcement learning? |year=2019 |book-title=Advances in Neural Information Processing Systems 32 |url=https://proceedings.neurips.cc/paper/2019/file/1b742ae215adf18b75449c6e272fd92d-Paper.pdf}}</ref> For instance, in [[model predictive control]] the model is used to update the behavior directly.
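The following is a minimal sketch of a Dyna-style agent (tabular Dyna-Q) illustrating how each real transition both updates the value function and populates a learned model, which is then sampled for additional simulated "planning" updates. The environment interface (<code>reset</code>/<code>step</code>), the hyperparameter values, and all function names here are illustrative assumptions, not part of any particular library or of the original Dyna publication.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

def dyna_q(env, actions, episodes=500, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    """Tabular Dyna-Q sketch: each real transition updates the value
    function directly and is also stored in a learned model, which is
    then sampled for extra simulated ("planning") updates.
    Assumes env.reset() -> state and env.step(a) -> (state', reward, done)."""
    Q = defaultdict(float)   # Q[(state, action)] -> estimated action value
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            action = random.choice(actions) if random.random() < epsilon else greedy(state)
            next_state, reward, done = env.step(action)

            # direct reinforcement learning update from the real transition
            target = reward + (0.0 if done else gamma * Q[(next_state, greedy(next_state))])
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            # model learning: remember what this state-action pair led to
            model[(state, action)] = (reward, next_state, done)

            # planning: replay simulated transitions drawn from the learned model
            for _ in range(planning_steps):
                s, a = random.choice(list(model.keys()))
                r, s2, d = model[(s, a)]
                t = r + (0.0 if d else gamma * Q[(s2, greedy(s2))])
                Q[(s, a)] += alpha * (t - Q[(s, a)])

            state = next_state
    return Q
</syntaxhighlight>

In this sketch the model is deterministic (it stores only the most recent outcome of each state-action pair); richer model-based methods instead learn a full transition distribution or, as in model predictive control, use the model to plan actions directly rather than to update a value function.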