=== Deep Q-learning ===
The DeepMind system used a deep [[convolutional neural network]], with layers of tiled [[convolution]]al filters to mimic the effects of receptive fields. Reinforcement learning is unstable or divergent when a nonlinear function approximator such as a neural network is used to represent Q. This instability comes from the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy of the agent and the data distribution, and the correlations between Q and the target values. The method can be used for stochastic search in various domains and applications.<ref name="Li-2023"/><ref name="MBK">{{Cite journal |author1 = Matzliach B. |author2 = Ben-Gal I. |author3 = Kagan E. |title = Detection of Static and Mobile Targets by an Autonomous Agent with Deep Q-Learning Abilities| journal=Entropy | year=2022 | volume=24 | issue=8 | page=1168 |url = http://www.eng.tau.ac.il/~bengal/DeepQ_MBK_2023.pdf | doi=10.3390/e24081168 | pmid=36010832 | pmc=9407070 | bibcode=2022Entrp..24.1168M | doi-access=free }}</ref>

The technique used ''experience replay'', a biologically inspired mechanism that uses a random sample of prior actions instead of the most recent action to proceed.<ref name=":0" /> This removes correlations in the observation sequence and smooths changes in the data distribution. Iterative updates adjust Q towards target values that are only periodically updated, further reducing correlations with the target.<ref name="DQN">{{Cite journal |last1=Mnih |first1=Volodymyr |last2=Kavukcuoglu |first2=Koray |last3=Silver |first3=David |last4=Rusu |first4=Andrei A. |last5=Veness |first5=Joel |last6=Bellemare |first6=Marc G. |last7=Graves |first7=Alex |last8=Riedmiller |first8=Martin |last9=Fidjeland |first9=Andreas K. |date=Feb 2015 |title=Human-level control through deep reinforcement learning |journal=Nature |language=en |volume=518 |issue=7540 |pages=529–533 |doi=10.1038/nature14236 |pmid=25719670 |bibcode=2015Natur.518..529M |s2cid=205242740 |issn=0028-0836}}</ref>
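The two stabilisation mechanisms above — uniform random sampling from a replay buffer, and bootstrapping from a periodically synchronised target copy of Q — can be sketched as follows. This is an illustrative toy (a tabular Q over a hypothetical 4-state chain environment, with made-up hyperparameters), not DeepMind's network-based implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlations
        # present in the raw observation sequence.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def train(num_states=4, num_actions=2, episodes=200, batch_size=16,
          gamma=0.9, alpha=0.1, target_sync=25, seed=0):
    """Tabular Q-learning with experience replay and a periodically synced target table."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    target_q = [row[:] for row in q]  # frozen copy used for bootstrap targets
    buffer = ReplayBuffer(capacity=1000, seed=seed)
    step = 0
    for _ in range(episodes):
        s = 0
        while s < num_states - 1:  # state num_states-1 is terminal
            a = rng.randrange(num_actions)  # random exploration policy
            s_next = s + 1 if a == 1 else s  # action 1 advances, action 0 stays
            r = 1.0 if s_next == num_states - 1 else 0.0
            done = s_next == num_states - 1
            buffer.push((s, a, r, s_next, done))
            s = s_next
            step += 1
            if len(buffer) >= batch_size:
                for (bs, ba, br, bs2, bd) in buffer.sample(batch_size):
                    # Bootstrap from the *target* table, not the table being updated,
                    # so the regression target only moves when target_q is re-synced.
                    target = br if bd else br + gamma * max(target_q[bs2])
                    q[bs][ba] += alpha * (target - q[bs][ba])
            if step % target_sync == 0:
                target_q = [row[:] for row in q]  # periodic target update
    return q
```

After training, the learned table prefers the forward action (for example, `q[2][1]` exceeds `q[2][0]`), since only advancing reaches the reward. In the DQN setting the table is replaced by a convolutional network and the batch update by a gradient step, but the replay-sampling and target-sync logic is the same in shape.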