=== Discount factor ===
The discount factor {{tmath|\gamma}} determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, i.e. <math>r_t</math> (in the update rule above), while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the action values may diverge. For {{tmath|\gamma {{=}} 1}}, without a terminal state, or if the agent never reaches one, all environment histories become infinitely long, and utilities with additive, undiscounted rewards generally become infinite.<ref>{{Cite book |title=Artificial Intelligence: A Modern Approach |last1=Russell |first1=Stuart J. |last2=Norvig |first2=Peter |date=2010 |publisher=[[Prentice Hall]] |isbn=978-0136042594 |edition=Third |page=649 |author-link=Stuart J. Russell |author-link2=Peter Norvig}}</ref> Even with a discount factor only slightly lower than 1, ''Q''-function learning leads to propagation of errors and instabilities when the value function is approximated with an [[artificial neural network]].<ref>{{cite journal|first=Leemon |last=Baird |title=Residual algorithms: Reinforcement learning with function approximation |url=http://www.leemon.com/papers/1995b.pdf |journal=ICML |pages= 30–37 |year=1995}}</ref> In that case, starting with a lower discount factor and increasing it towards its final value accelerates learning.<ref>{{cite arXiv|last1=François-Lavet|first1=Vincent|last2=Fonteneau|first2=Raphael|last3=Ernst|first3=Damien|date=2015-12-07|title=How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies|eprint=1512.02011 |class=cs.LG}}</ref>
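
As a rough illustrative sketch (not part of the article's text), the role of {{tmath|\gamma}} in the tabular ''Q''-learning update can be seen directly in code. All names below (<code>n_states</code>, <code>n_actions</code>, <code>alpha</code>, the environment shape) are hypothetical placeholders chosen for the example.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative tabular Q-learning update showing where the discount factor enters.
# n_states, n_actions, alpha, and gamma are assumed values for this sketch.
n_states, n_actions = 10, 4
alpha = 0.1   # learning rate
gamma = 0.9   # discount factor: 0 -> myopic, values near 1 -> far-sighted
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])  # gamma weights the estimated future return
    Q[s, a] += alpha * (td_target - Q[s, a])

# With gamma = 0 the target collapses to the immediate reward r alone ("myopic"),
# while gamma close to 1 makes distant rewards nearly as influential as immediate ones.
q_update(s=0, a=1, r=1.0, s_next=3)
</syntaxhighlight>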