===RMSProp===
''RMSProp'' (for Root Mean Square Propagation) is a method invented in 2012 by James Martens and [[Ilya Sutskever]], at the time both PhD students in Geoffrey Hinton's group, in which the [[learning rate]] is, as in Adagrad, adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight.<ref name=rmsprop>{{Cite web|url=http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf|title=Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude|last=Hinton|first=Geoffrey|author-link=Geoffrey Hinton|pages=26|access-date=19 March 2020}}</ref> Unusually, it was not published in an article but merely described in a [[Coursera]] lecture.<ref>{{Cite web|url=https://deepai.org/machine-learning-glossary-and-terms/rmsprop#:~:text=The%20RMSProp%20algorithm%20was%20introduced,its%20effectiveness%20in%20various%20applications.|title=RMSProp|website=DeepAI Machine Learning Glossary}}</ref><ref>[https://www.youtube.com/watch?v=-eyhCTvrEtE&t=36m37s Video], at 36:37.</ref>

First, the running average is calculated in terms of the mean square,
<math display="block">v(w,t):=\gamma v(w,t-1) + \left(1-\gamma\right) \left(\nabla Q_i(w)\right)^2</math>
where <math>\gamma</math> is the forgetting factor. The concept of storing the historical gradient as a sum of squares is borrowed from Adagrad, but "forgetting" is introduced to solve Adagrad's diminishing learning rates in non-convex problems by gradually decreasing the influence of old data.{{cn|date=June 2024}}

The parameters are then updated as
<math display="block">w:=w-\frac{\eta}{\sqrt{v(w,t)}}\nabla Q_i(w).</math>

RMSProp has shown good adaptation of the learning rate in different applications. RMSProp can be seen as a generalization of [[Rprop]] and is capable of working with mini-batches as well, as opposed to only full batches.<ref name="rmsprop" />
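As an illustration, the two update rules above can be written as a short routine. The following is a minimal NumPy sketch, not taken from the cited lecture: the small constant <code>eps</code> is a common numerical-stability addition that does not appear in the formulas, and the toy quadratic objective in the usage example is chosen here purely for demonstration.

<syntaxhighlight lang="python">
import numpy as np

def rmsprop_update(w, v, grad, eta=0.001, gamma=0.9, eps=1e-8):
    """One RMSProp step for parameters w given the gradient grad.

    v is the running average of squared gradients, gamma the forgetting
    factor, and eta the learning rate, matching the formulas above.
    eps is a small constant commonly added for numerical stability
    (not part of the formulas as written).
    """
    v = gamma * v + (1.0 - gamma) * grad ** 2
    w = w - eta / (np.sqrt(v) + eps) * grad
    return w, v

# Usage on a toy objective Q(w) = ||w||^2 / 2, whose gradient is simply w
# (standing in for the stochastic gradient \nabla Q_i(w)).
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(1000):
    grad = w                      # gradient of the toy objective
    w, v = rmsprop_update(w, v, grad, eta=0.01)
print(w)                          # ends close to the minimizer [0, 0]
</syntaxhighlight>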