== Research ==
{{More citations needed section|date=October 2022}}
Research topics include:
* actor-critic architecture (an illustrative sketch appears after this list)<ref>{{Cite journal |last1=Grondman |first1=Ivo |last2=Vaandrager |first2=Maarten |last3=Busoniu |first3=Lucian |last4=Babuska |first4=Robert |last5=Schuitema |first5=Erik |date=2012-06-01 |title=Efficient Model Learning Methods for Actor–Critic Control |url=https://dl.acm.org/doi/10.1109/TSMCB.2011.2170565 |journal=IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) |volume=42 |issue=3 |pages=591–602 |doi=10.1109/TSMCB.2011.2170565 |pmid=22156998 |issn=1083-4419}}</ref>
* actor-critic-scenery architecture<ref name="Li-2023" />
* adaptive methods that work with fewer (or no) parameters under a large number of conditions
* bug detection in software projects<ref>{{Cite web |title=On the Use of Reinforcement Learning for Testing Game Mechanics : ACM - Computers in Entertainment |url=https://cie.acm.org/articles/use-reinforcements-learning-testing-game-mechanics/ |access-date=2018-11-27 |website=cie.acm.org |language=en}}</ref>
* continuous learning
* combinations with logic-based frameworks<ref>{{Cite journal|last1=Riveret|first1=Regis|last2=Gao|first2=Yang|date=2019|title=A probabilistic argumentation framework for reinforcement learning agents|journal=Autonomous Agents and Multi-Agent Systems|language=en|volume=33|issue=1–2|pages=216–274|doi=10.1007/s10458-019-09404-2|s2cid=71147890}}</ref>
* exploration in large Markov decision processes
* entity-based reinforcement learning<ref>{{cite arXiv |title=Entity-Centric Reinforcement Learning for Object Manipulation from Pixels |author1=Haramati, Dan |author2=Daniel, Tal |author3=Tamar, Aviv |eprint=2404.01220 |year=2024 |class=cs.RO}}</ref><ref>{{cite conference |last1=Thompson |first1=Isaac Symes |last2=Caron |first2=Alberto |last3=Hicks |first3=Chris |last4=Mavroudis |first4=Vasilios |title=Entity-based Reinforcement Learning for Autonomous Cyber Defence |book-title=Proceedings of the Workshop on Autonomous Cybersecurity (AutonomousCyber '24) |pages=56–67 |date=2024-11-07 |doi=10.1145/3689933.3690835 |publisher=ACM |arxiv=2410.17647}}</ref><ref>{{cite web |last=Winter |first=Clemens |title=Entity-Based Reinforcement Learning |url=https://clemenswinter.com/2023/04/14/entity-based-reinforcement-learning/ |date=2023-04-14 |website=Clemens Winter's Blog}}</ref>
* [[reinforcement learning from human feedback|human feedback]]<ref>{{cite arXiv |last1=Yamagata |first1=Taku |last2=McConville |first2=Ryan |last3=Santos-Rodriguez |first3=Raul |date=2021-11-16 |title=Reinforcement Learning with Feedback from Multiple Humans with Diverse Skills |class=cs.LG |eprint=2111.08596}}</ref>
* interaction between implicit and explicit learning in skill acquisition
* [[Intrinsic motivation (artificial intelligence)|intrinsic motivation]], which differentiates information-seeking, curiosity-type behaviours from task-dependent, goal-directed behaviours
* large-scale empirical evaluations
* large (or continuous) action spaces
* modular and hierarchical reinforcement learning<ref>{{Cite journal|last1=Kulkarni|first1=Tejas D.|last2=Narasimhan|first2=Karthik R.|last3=Saeedi|first3=Ardavan|last4=Tenenbaum|first4=Joshua B.|date=2016|title=Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation|url=http://dl.acm.org/citation.cfm?id=3157382.3157509|journal=Proceedings of the 30th International Conference on Neural Information Processing Systems|series=NIPS'16|location=USA|publisher=Curran Associates Inc.|pages=3682–3690|isbn=978-1-5108-3881-9|bibcode=2016arXiv160406057K|arxiv=1604.06057}}</ref>
* multiagent/distributed reinforcement learning, a topic of growing interest with expanding applications<ref>{{Cite web |title=Reinforcement Learning / Successes of Reinforcement Learning |url=http://umichrl.pbworks.com/Successes-of-Reinforcement-Learning/ |access-date=2017-08-06 |website=umichrl.pbworks.com}}</ref>
* occupant-centric control
* optimization of computing resources<ref>{{Cite book |last1=Dey |first1=Somdip |last2=Singh |first2=Amit Kumar |last3=Wang |first3=Xiaohang |last4=McDonald-Maier |first4=Klaus |title=2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) |chapter=User Interaction Aware Reinforcement Learning for Power and Thermal Efficiency of CPU-GPU Mobile MPSoCs |date=March 2020 |chapter-url=https://ieeexplore.ieee.org/document/9116294 |pages=1728–1733 |doi=10.23919/DATE48585.2020.9116294 |isbn=978-3-9819263-4-7 |s2cid=219858480 |url=http://repository.essex.ac.uk/27546/1/User%20Interaction%20Aware%20Reinforcement%20Learning.pdf}}</ref><ref>{{Cite web |last=Quested |first=Tony |title=Smartphones get smarter with Essex innovation |work=Business Weekly |url=https://www.businessweekly.co.uk/news/academia-research/smartphones-get-smarter-essex-innovation |access-date=2021-06-17}}</ref><ref>{{Cite web |last=Williams |first=Rhiannon |date=2020-07-21 |title=Future smartphones 'will prolong their own battery life by monitoring owners' behaviour' |url=https://inews.co.uk/news/technology/future-smartphones-prolong-battery-life-monitoring-behaviour-558689 |access-date=2021-06-17 |website=[[i (British newspaper)|i]] |language=en}}</ref>
* [[Partially observable Markov decision process|partial information]] (e.g., using [[predictive state representation]])
* reward functions based on maximising novel information<ref name="kaplan2004">{{cite book |last1=Kaplan |first1=F. |last2=Oudeyer |first2=P. |chapter=Maximizing Learning Progress: An Internal Reward System for Development |title=Embodied Artificial Intelligence |publisher=Springer |year=2004 |isbn=978-3-540-22484-6 |editor-last=Iida |editor-first=F. |editor2-last=Pfeifer |editor2-first=R. |editor3-last=Steels |editor3-first=L. |editor4-last=Kuniyoshi |editor4-first=Y. |series=Lecture Notes in Computer Science |volume=3139 |location=Berlin; Heidelberg |pages=259–270 |doi=10.1007/978-3-540-27833-7_19 |s2cid=9781221}}</ref><ref name="klyubin2008">{{cite journal |last1=Klyubin |first1=A. |last2=Polani |first2=D. |last3=Nehaniv |first3=C. |year=2008 |title=Keep your options open: an information-based driving principle for sensorimotor systems |journal=PLOS ONE |volume=3 |issue=12 |pages=e4018 |bibcode=2008PLoSO...3.4018K |doi=10.1371/journal.pone.0004018 |pmc=2607028 |pmid=19107219 |doi-access=free}}</ref><ref name="barto2013">{{cite book |last=Barto |first=A. G. |chapter=Intrinsic motivation and reinforcement learning |title=Intrinsically Motivated Learning in Natural and Artificial Systems |url=https://people.cs.umass.edu/~barto/IMCleVer-chapter-totypeset2.pdf |publisher=Springer |year=2013 |location=Berlin; Heidelberg |pages=17–47}}</ref>
* sample-based planning (e.g., based on [[Monte Carlo tree search]])
* securities trading<ref>{{cite journal |last1=Dabérius |first1=Kevin |last2=Granat |first2=Elvin |last3=Karlsson |first3=Patrik |date=2020 |title=Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks |ssrn=3374766 |journal=The Journal of Machine Learning in Finance |volume=1}}</ref>
* [[transfer learning]]<ref>{{Cite journal|last1=George Karimpanal|first1=Thommen|last2=Bouffanais|first2=Roland|date=2019|title=Self-organizing maps for storage and transfer of knowledge in reinforcement learning|journal=Adaptive Behavior|language=en|volume=27|issue=2|pages=111–126|doi=10.1177/1059712318818568|issn=1059-7123|arxiv=1811.08318|s2cid=53774629}}</ref>
* TD learning as a model of [[dopamine]]-based learning in the brain, in which [[dopaminergic]] projections from the [[substantia nigra]] to the [[basal ganglia]] function as the prediction error
* value-function and policy search methods
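As a minimal, purely illustrative sketch (not taken from any of the cited works), the following tabular one-step actor-critic agent shows how the two components above interact: the critic learns state values by TD learning, and the same TD (prediction) error also scales the actor's policy update. The environment interface (<code>reset</code>/<code>step</code>), the tables <code>theta</code> and <code>V</code>, and all hyperparameters are assumptions made for the example.

<syntaxhighlight lang="python">
import math
import random

def softmax(preferences):
    """Convert action preferences into a probability distribution."""
    m = max(preferences)
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_episode(env, theta, V, alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
    """Run one episode of tabular one-step actor-critic (illustrative sketch).

    Assumed interface: env.reset() -> state, env.step(action) -> (next_state, reward, done).
    theta[state] is a list of action preferences (the actor); V[state] is the
    critic's value estimate. Both are updated in place.
    """
    state = env.reset()
    done = False
    while not done:
        probs = softmax(theta[state])                      # actor: softmax policy
        action = random.choices(range(len(probs)), probs)[0]
        next_state, reward, done = env.step(action)

        # Critic: one-step TD (prediction) error, delta = r + gamma * V(s') - V(s).
        target = reward + (0.0 if done else gamma * V[next_state])
        td_error = target - V[state]
        V[state] += alpha_critic * td_error

        # Actor: policy-gradient step on the softmax preferences, weighted by the same TD error.
        for a, p in enumerate(probs):
            grad_log_pi = (1.0 if a == action else 0.0) - p
            theta[state][a] += alpha_actor * td_error * grad_log_pi

        state = next_state
</syntaxhighlight>

The single scalar TD error drives both updates; it is this quantity that the TD models of dopamine-based learning mentioned above treat as the prediction-error signal.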