=== Learning rate ===
The [[learning rate]] or ''step size'' determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities). In fully [[Deterministic system|deterministic]] environments, a learning rate of <math>\alpha_t = 1</math> is optimal. When the problem is [[Stochastic systems|stochastic]], the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice, often a constant learning rate is used, such as <math>\alpha_t = 0.1</math> for all <math>t</math>.<ref>{{Cite book |url=http://incompleteideas.net/sutton/book/ebook/the-book.html |title=Reinforcement Learning: An Introduction |last1=Sutton |first1=Richard |last2=Barto |first2=Andrew |date=1998 |publisher=MIT Press}}</ref>
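The effect of the learning rate can be seen directly in the standard tabular Q-learning update. Below is a minimal sketch in Python; the function name <code>q_update</code>, the discount factor value, and the toy numbers are illustrative assumptions, not taken from the article.

<syntaxhighlight lang="python">
def q_update(q_old, reward, q_next_max, alpha, gamma=0.9):
    """One Q-learning step: move the old estimate toward the TD target
    reward + gamma * max_a' Q(s', a') by a fraction alpha."""
    target = reward + gamma * q_next_max
    return q_old + alpha * (target - q_old)

# alpha = 0: the new information is ignored; the old value is kept.
print(q_update(5.0, 1.0, 2.0, alpha=0.0))

# alpha = 1: only the most recent information (the target) is kept.
print(q_update(5.0, 1.0, 2.0, alpha=1.0))

# A typical constant rate such as alpha = 0.1 moves only part-way
# toward the target, averaging over stochastic rewards.
print(q_update(5.0, 1.0, 2.0, alpha=0.1))
</syntaxhighlight>

With <math>\alpha = 0.1</math> the estimate moves 10% of the distance toward the target on each step, which is why a small constant rate smooths out noise in stochastic environments but never fully forgets old estimates.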