=== Deep Q-learning ===
The DeepMind system used a deep [[convolutional neural network]], with layers of tiled [[convolution]]al filters to mimic the effects of receptive fields. Reinforcement learning is unstable or divergent when a nonlinear function approximator such as a neural network is used to represent Q. This instability comes from the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy of the agent and the data distribution, and the correlations between Q and the target values. The method can be used for stochastic search in various domains and applications.<ref name="Li-2023"/><ref name="MBK">{{Cite journal |author1 = Matzliach B. |author2 = Ben-Gal I. |author3 = Kagan E. |title = Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities| journal=Entropy | year=2022 | volume=24 | issue=8 | page=1168 |url = http://www.eng.tau.ac.il/~bengal/DeepQ_MBK_2023.pdf | doi=10.3390/e24081168 | pmid=36010832 | pmc=9407070 | bibcode=2022Entrp..24.1168M | doi-access=free }}</ref>

The technique used ''experience replay'', a biologically inspired mechanism that uses a random sample of prior actions instead of the most recent action to proceed.<ref name=":0" /> This removes correlations in the observation sequence and smooths changes in the data distribution. Iterative updates adjust Q towards target values that are only periodically updated, further reducing correlations with the target.<ref name="DQN">{{Cite journal |last1=Mnih |first1=Volodymyr |last2=Kavukcuoglu |first2=Koray |last3=Silver |first3=David |last4=Rusu |first4=Andrei A. |last5=Veness |first5=Joel |last6=Bellemare |first6=Marc G. |last7=Graves |first7=Alex |last8=Riedmiller |first8=Martin |last9=Fidjeland |first9=Andreas K. |date=Feb 2015 |title=Human-level control through deep reinforcement learning |journal=Nature |language=en |volume=518 |issue=7540 |pages=529–533 |doi=10.1038/nature14236 |pmid=25719670 |bibcode=2015Natur.518..529M |s2cid=205242740 |issn=0028-0836}}</ref>
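The two stabilisation mechanisms above — uniform random sampling from a replay buffer, and bootstrapping from a periodically synchronised target copy of Q — can be sketched as follows. This is an illustrative toy (a tabular Q over a hypothetical 4-state chain environment, with made-up hyperparameters), not DeepMind's network-based implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlations
        # present in the raw observation sequence.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def train(num_states=4, num_actions=2, episodes=200, batch_size=16,
          gamma=0.9, alpha=0.1, target_sync=25, seed=0):
    """Tabular Q-learning with experience replay and a periodically synced target table."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    target_q = [row[:] for row in q]  # frozen copy used for bootstrap targets
    buffer = ReplayBuffer(capacity=1000, seed=seed)
    step = 0
    for _ in range(episodes):
        s = 0
        while s < num_states - 1:  # state num_states-1 is terminal
            a = rng.randrange(num_actions)  # random exploration policy
            s_next = s + 1 if a == 1 else s  # action 1 advances, action 0 stays
            r = 1.0 if s_next == num_states - 1 else 0.0
            done = s_next == num_states - 1
            buffer.push((s, a, r, s_next, done))
            s = s_next
            step += 1
            if len(buffer) >= batch_size:
                for (bs, ba, br, bs2, bd) in buffer.sample(batch_size):
                    # Bootstrap from the *target* table, not the table being updated,
                    # so the regression target only moves when target_q is re-synced.
                    target = br if bd else br + gamma * max(target_q[bs2])
                    q[bs][ba] += alpha * (target - q[bs][ba])
            if step % target_sync == 0:
                target_q = [row[:] for row in q]  # periodic target update
    return q
```

After training, the learned table prefers the forward action (for example, `q[2][1]` exceeds `q[2][0]`), since only advancing reaches the reward. In the DQN setting the table is replaced by a convolutional network and the batch update by a gradient step, but the replay-sampling and target-sync logic is the same in shape.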