== History ==
''Q''-learning was introduced by Chris Watkins in 1989.<ref>{{cite thesis|type=Ph.D. thesis|last=Watkins|first=C.J.C.H.|year=1989|title=Learning from Delayed Rewards|publisher=[[University of Cambridge]]|url=http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf|id={{EThOS|uk.bl.ethos.330022}}}}</ref> A convergence proof was presented by Watkins and [[Peter Dayan]] in 1992.<ref>{{cite journal |last1=Watkins |first1=Chris |last2=Dayan |first2=Peter |year=1992 |title=Q-learning |journal=Machine Learning |volume=8 |issue=3–4 |pages=279–292 |doi=10.1007/BF00992698 |doi-access=free |hdl=21.11116/0000-0002-D738-D |hdl-access=free }}</ref>

Watkins was addressing "Learning from delayed rewards", the title of his PhD thesis. Eight years earlier, in 1981, the same problem, under the name "delayed reinforcement learning", was solved by Bozinovski's Crossbar Adaptive Array (CAA).<ref name="DobnikarSteele1999">{{cite book|editor-last1=Dobnikar|editor-first1=Andrej|editor-last2=Steele|editor-first2=Nigel C.|editor-last3=Pearson|editor-first3=David W.|editor-first4=Rudolf F. |editor-last4=Albrecht|title=Artificial Neural Nets and Genetic Algorithms: Proceedings of the International Conference in Portorož, Slovenia, 1999|chapter-url={{google books |plainurl=y |id=clKwynlfZYkC|page=320-325}}|date=15 July 1999|publisher=Springer Science & Business Media|isbn=978-3-211-83364-3 |first=S. |last=Bozinovski |chapter=Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem|pages=320–325}}</ref><ref name="Trappl1982">{{cite book|editor-last=Trappl|editor-first=Robert|title=Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research|chapter-url={{google books |plainurl=y |id=mGtQAAAAMAAJ|page=397}}|year=1982|publisher=North Holland|isbn=978-0-444-86488-8|first=S. |last=Bozinovski |chapter=A self learning system using secondary reinforcement|pages=397–402}}</ref> Its memory matrix <math>W = \|w(a,s)\|</math> was the same as the Q-table introduced eight years later in Q-learning. The architecture introduced the term "state evaluation" in reinforcement learning. The crossbar learning algorithm, written in mathematical [[pseudocode]] in the paper, performs the following computation in each iteration:
* In state {{mvar|s}} perform action {{mvar|a}};
* Receive consequence state {{mvar|s'}};
* Compute state evaluation {{tmath|v(s')}};
* Update crossbar value <math>w'(a,s) = w(a,s) + v(s')</math>.

The term "secondary reinforcement" is borrowed from animal learning theory to model state values via [[backpropagation]]: the state value {{tmath|v(s')}} of the consequence situation is backpropagated to the previously encountered situations. CAA computes state values vertically and actions horizontally (the "crossbar"). Demonstration graphs showing delayed reinforcement learning contained desirable, undesirable, and neutral states, whose values were computed by the state evaluation function. This learning system was a forerunner of the Q-learning algorithm.<ref name="OmidvarElliott1997">{{cite book|editor-last1=Omidvar|editor-first1=Omid|editor-last2=Elliott|editor-first2=David L.|title=Neural Systems for Control|chapter-url={{google books |plainurl=y |id=oLcAiySCow0C}}|date=24 February 1997|publisher=Elsevier|isbn=978-0-08-053739-9|first=A. |last=Barto |chapter=Reinforcement learning}}</ref>

In 2014, [[Google DeepMind]] patented<ref>{{cite web|url=https://patentimages.storage.googleapis.com/71/91/4a/c5cf4ffa56f705/US20150100530A1.pdf|title=Methods and Apparatus for Reinforcement Learning, US Patent #20150100530A1|publisher=US Patent Office|date=9 April 2015|access-date=28 July 2018}}</ref> an application of Q-learning to [[deep learning]], titled "deep reinforcement learning" or "deep Q-learning", that can play [[Atari 2600]] games at expert human level.
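A minimal sketch in Python of one iteration of the crossbar update listed above, assuming a tabular memory matrix; the names (<code>state_value</code>, <code>env_step</code>) and the action-selection rule are illustrative assumptions, not taken from the cited papers:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch only: variable names and the argmax action selection
# are assumptions; the cited papers give the update in mathematical pseudocode.
n_states, n_actions = 5, 3
W = np.zeros((n_actions, n_states))   # crossbar memory matrix w(a, s)
state_value = np.zeros(n_states)      # v(s): e.g. +1 desirable, -1 undesirable, 0 neutral

def crossbar_step(s, env_step):
    """Perform one iteration of the crossbar update from state s."""
    a = int(np.argmax(W[:, s]))       # in state s perform action a (one possible selection rule)
    s_next = env_step(s, a)           # receive consequence state s'
    v_next = state_value[s_next]      # compute state evaluation v(s')
    W[a, s] += v_next                 # update crossbar value: w'(a,s) = w(a,s) + v(s')
    return s_next
</syntaxhighlight>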