{{Short description|Model-free reinforcement learning algorithm}}
{{Machine learning|Reinforcement learning}}
'''''Q''-learning''' is a [[reinforcement learning]] algorithm that trains an [[Intelligent agent|agent]] to assign values to its possible actions based on its current [[State (computer science)|state]], without requiring a model of the environment ([[Model-free (reinforcement learning)|model-free]]). It can handle problems with [[Stochastic matrix|stochastic transitions]] and rewards without requiring adaptations.<ref name="Li-2023">{{cite book |last1=Li |first1=Shengbo |title=Reinforcement Learning for Sequential Decision and Optimal Control |date=2023 |location=Springer Verlag, Singapore |isbn=978-9-811-97783-1 |pages=1–460 |doi=10.1007/978-981-19-7784-8 |s2cid=257928563 |edition=First |url=https://link.springer.com/book/10.1007/978-981-19-7784-8}}</ref> For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, ''Q''-learning might assign a higher value to moving right than to moving left if going right reaches the exit faster, refining this choice by trying both directions over time.

For any finite [[Markov decision process]], ''Q''-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state.<ref name="auto">{{Cite web |last=Melo |first=Francisco S. |title=Convergence of Q-learning: a simple proof |url=http://users.isr.ist.utl.pt/~mtjspaan/readingGroup/ProofQlearning.pdf}}</ref> ''Q''-learning can identify an optimal [[action selection|action-selection]] policy for any given finite Markov decision process, given infinite exploration time and a partly random policy.<ref name="auto" /> "Q" refers to the function that the algorithm computes: the expected reward—that is, the ''quality''—of an action taken in a given state.<ref name=":0">{{Cite web |url=http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/ |title=Demystifying Deep Reinforcement Learning |last=Matiisen |first=Tambet |date=December 19, 2015 |website=neuro.cs.ut.ee |publisher=Computational Neuroscience Lab |language=en-US |access-date=2018-04-06}}</ref>
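The following is a minimal sketch of tabular ''Q''-learning applied to a toy, one-dimensional version of the grid-maze example above. The corridor environment, the exit reward of 10 points, and all hyperparameters (learning rate, discount factor, exploration rate) are illustrative assumptions chosen for this sketch, not values prescribed by the algorithm itself.

<syntaxhighlight lang="python">
# Minimal tabular Q-learning sketch on a toy corridor maze.
# The environment and hyperparameters below are illustrative assumptions.
import random

N_STATES = 5                 # states 0..4 along a corridor
EXIT_STATE = 4               # reaching the exit is worth 10 points
ACTIONS = [-1, +1]           # move left or move right
ALPHA = 0.5                  # learning rate
GAMMA = 0.9                  # discount factor
EPSILON = 0.1                # exploration rate (partly random policy)

# Q-table: Q[state][action_index] = estimated quality of that action in that state.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action):
    """Apply an action; reaching the exit gives reward 10 and ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == EXIT_STATE:
        return next_state, 10.0, True
    return next_state, 0.0, False

for episode in range(500):
    state = 0
    done = False
    while not done:
        # Epsilon-greedy selection: mostly take the best-known action, sometimes explore.
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q[next_state])
        Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])
        state = next_state

# After training, moving right (toward the exit) is valued higher than moving left.
print(Q)
</syntaxhighlight>

In this sketch the learned values for "move right" exceed those for "move left" at every state, reflecting how repeated exploration lets the agent improve its choice at each junction over time.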