=== Learning rate ===
The [[learning rate]] or ''step size'' determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities). In fully [[Deterministic system|deterministic]] environments, a learning rate of <math>\alpha_t = 1</math> is optimal. When the problem is [[Stochastic systems|stochastic]], the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice, often a constant learning rate is used, such as <math>\alpha_t = 0.1</math> for all <math>t</math>.<ref>{{Cite book |url=http://incompleteideas.net/sutton/book/ebook/the-book.html |title=Reinforcement Learning: An Introduction |last1=Sutton |first1=Richard |last2=Barto |first2=Andrew |date=1998 |publisher=MIT Press}}</ref>
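The effect of the learning rate can be seen directly in the standard tabular Q-learning update. Below is a minimal sketch in Python; the function name <code>q_update</code>, the discount factor value, and the toy numbers are illustrative assumptions, not taken from the article.

<syntaxhighlight lang="python">
def q_update(q_old, reward, q_next_max, alpha, gamma=0.9):
    """One Q-learning step: move the old estimate toward the TD target
    reward + gamma * max_a' Q(s', a') by a fraction alpha."""
    target = reward + gamma * q_next_max
    return q_old + alpha * (target - q_old)

# alpha = 0: the new information is ignored; the old value is kept.
print(q_update(5.0, 1.0, 2.0, alpha=0.0))

# alpha = 1: only the most recent information (the target) is kept.
print(q_update(5.0, 1.0, 2.0, alpha=1.0))

# A typical constant rate such as alpha = 0.1 moves only part-way
# toward the target, averaging over stochastic rewards.
print(q_update(5.0, 1.0, 2.0, alpha=0.1))
</syntaxhighlight>

With <math>\alpha = 0.1</math> the estimate moves 10% of the distance toward the target on each step, which is why a small constant rate smooths out noise in stochastic environments but never fully forgets old estimates.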