== Further reading ==
* {{cite journal |last1=Annaswamy |first1=Anuradha M. |title=Adaptive Control and Intersections with Reinforcement Learning |journal=Annual Review of Control, Robotics, and Autonomous Systems |date=3 May 2023 |volume=6 |issue=1 |pages=65–93 |doi=10.1146/annurev-control-062922-090153 |s2cid=255702873 |language=en |issn=2573-5144|doi-access=free }}
* {{cite journal|last1 = Auer|first1 = Peter|last2 = Jaksch|first2 = Thomas |last3 = Ortner|first3 = Ronald |year = 2010|title = Near-optimal regret bounds for reinforcement learning|url = http://jmlr.csail.mit.edu/papers/v11/jaksch10a.html|journal = Journal of Machine Learning Research|volume = 11|pages = 1563–1600 |author-link1 = Peter Auer}}
* {{cite book|url=http://www.mit.edu/~dimitrib/RLbook.html|last1=Bertsekas |first1=Dimitri P.|title=Reinforcement Learning and Optimal Control |date=2023 |publisher=Athena Scientific |orig-date=2019 |isbn=978-1-886-52939-7|edition=1st}}
* {{cite book|url = http://www.dcsc.tudelft.nl/rlbook/|title = Reinforcement Learning and Dynamic Programming using Function Approximators|last1 = Busoniu|first1 = Lucian|last2 = Babuska|first2 = Robert|last3 = De Schutter|first3 = Bart|last4 = Ernst|first4 = Damien|publisher = Taylor & Francis CRC Press|year = 2010|isbn = 978-1-4398-2108-4|author-link3 = Bart De Schutter }}
* {{cite journal
| doi = 10.1561/2200000071
| last1 = François-Lavet | first1 = Vincent
| last2 = Henderson | first2 = Peter
| last3 = Islam | first3 = Riashat
| last4 = Bellemare | first4 = Marc G.
| last5 = Pineau | first5 = Joelle
| title = An Introduction to Deep Reinforcement Learning
| journal = Foundations and Trends in Machine Learning
| volume = 11
| issue = 3–4 | pages = 219–354
| year = 2018
| arxiv = 1811.12560 | bibcode = 2018arXiv181112560F | s2cid = 54434537 }}
* {{cite book|url=https://link.springer.com/book/10.1007/978-981-19-7784-8|title=Reinforcement Learning for Sequential Decision and Optimal Control|last1=Li |first1=Shengbo Eben |publisher= Springer Verlag, Singapore |year=2023 |doi=10.1007/978-981-19-7784-8 |isbn=978-9-811-97783-1 |edition=1st }}
* {{cite book |last=Powell |first=Warren |title=Approximate Dynamic Programming: Solving the Curses of Dimensionality |year=2011 |publisher=Wiley-Interscience |url=http://www.castlelab.princeton.edu/adp.htm |access-date=2010-09-08 |archive-date=2016-07-31 |archive-url=https://web.archive.org/web/20160731230325/http://castlelab.princeton.edu/adp.htm |url-status=dead }}
* {{cite journal
| doi = 10.1007/BF00115009
| last = Sutton
| first = Richard S.
| author-link = Richard S. Sutton
| title = Learning to predict by the method of temporal differences
| journal = Machine Learning
| volume = 3
| pages = 9–44
| year = 1988
| doi-access = free
}}
* {{cite book|url=http://incompleteideas.net/sutton/book/the-book.html|title=Reinforcement Learning: An Introduction|last1=Sutton|first1=Richard S.|last2=Barto|first2=Andrew G.|publisher=MIT Press|year=2018 |orig-date=1998 |isbn=978-0-262-03924-6|edition=2nd |author-link1=Richard S. Sutton|author-link2=Andrew Barto}}
* {{cite conference|last1 = Szita|first1 = Istvan|last2 = Szepesvari|first2 = Csaba |year = 2010|title = Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds|url = http://www.icml2010.org/papers/546.pdf|publisher = Omnipress|pages = 1031–1038 |book-title = ICML 2010|url-status = dead|archive-url = https://web.archive.org/web/20100714095438/http://www.icml2010.org/papers/546.pdf|archive-date = 2010-07-14}}