== Approximations in continuous time ==

For a small learning rate <math display="inline">\eta</math>, stochastic gradient descent <math display="inline">(w_n)_{n \in \N_0}</math> can be viewed as a discretization of the [[gradient flow]] ODE
<math display="block">\frac{d}{dt} W_t = -\nabla Q(W_t)</math>
subject to additional stochastic noise. This approximation is only valid on a finite time horizon, in the following sense: assume that all the coefficients <math display="inline">Q_i</math> are sufficiently smooth. Let <math display="inline">T > 0</math> and let <math display="inline">g: \R^d \to \R</math> be a sufficiently smooth test function. Then there exists a constant <math display="inline">C > 0</math> such that for all <math display="inline">\eta > 0</math>
<math display="block">\max_{k=0, \dots, \lfloor T/\eta \rfloor} \left|\mathbb E[g(w_k)] - g(W_{k \eta})\right| \le C \eta,</math>
where <math display="inline">\mathbb E</math> denotes the expectation with respect to the random choice of indices in the stochastic gradient descent scheme.

Since this approximation does not capture the random fluctuations around the mean behavior of stochastic gradient descent, solutions to [[stochastic differential equations]] (SDEs) have been proposed as limiting objects.<ref>{{Cite journal |last1=Li |first1=Qianxiao |last2=Tai |first2=Cheng |last3=E |first3=Weinan |date=2019 |title=Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations |url=http://jmlr.org/papers/v20/17-526.html |journal=Journal of Machine Learning Research |volume=20 |issue=40 |pages=1–47 |arxiv=1811.01558 |issn=1533-7928}}</ref> More precisely, the solution to the SDE
<math display="block">d W_t = -\nabla \left(Q(W_t) + \tfrac{1}{4} \eta \left|\nabla Q(W_t)\right|^2\right) dt + \sqrt{\eta}\, \Sigma(W_t)^{1/2} \, dB_t,</math>
with the covariance matrix
<math display="block">\Sigma(w) = \frac{1}{n} \sum_{i=1}^n \left(\nabla Q_i(w) - \nabla Q(w)\right) \left(\nabla Q_i(w) - \nabla Q(w)\right)^T,</math>
where <math display="inline">dB_t</math> denotes the [[Ito integral|Itô integral]] with respect to a [[Brownian motion]], is a more precise approximation in the sense that there exists a constant <math display="inline">C > 0</math> such that
<math display="block">\max_{k=0, \dots, \lfloor T/\eta \rfloor} \left|\mathbb E[g(w_k)] - \mathbb E[g(W_{k \eta})]\right| \le C \eta^2.</math>
However, this SDE only approximates the one-point motion of stochastic gradient descent. For an approximation of the [[Flow (mathematics)|stochastic flow]], one has to consider SDEs with infinite-dimensional noise.<ref>{{Cite arXiv |eprint=2302.07125 |class=math.PR |first1=Benjamin |last1=Gess |first2=Sebastian |last2=Kassing |first3=Vitalii |last3=Konarovskyi |title=Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent |date=14 February 2023}}</ref>
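The finite-horizon bound above can be illustrated numerically. The following sketch (not from the cited references; the least-squares objective, the test function <math display="inline">g(w) = |w|^2</math>, and all parameter values are assumptions chosen for illustration) averages many SGD runs to estimate <math display="inline">\mathbb E[g(w_k)]</math> and compares it against an explicit Euler discretization of the gradient flow ODE with the same step size, so that iterate <math display="inline">k</math> corresponds to time <math display="inline">t = k\eta</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative toy problem (an assumption, not from the article):
# Q(w) = (1/n) sum_i Q_i(w) with Q_i(w) = 0.5 * (x_i . w - y_i)^2.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def grad_Qi(w, i):
    """Gradient of the single-sample loss Q_i."""
    return (X[i] @ w - y[i]) * X[i]

def grad_Q(w):
    """Gradient of the full objective Q = (1/n) sum_i Q_i."""
    return X.T @ (X @ w - y) / n

eta, T = 1e-3, 1.0          # small learning rate, finite horizon
steps = int(T / eta)
w0 = np.zeros(d)

# Estimate E[g(w_k)] for g(w) = |w|^2 by averaging independent SGD runs.
runs = 200
g_sgd = np.zeros(steps + 1)
for _ in range(runs):
    w = w0.copy()
    traj = [w @ w]
    for _ in range(steps):
        i = rng.integers(n)          # uniform random index, as in the scheme
        w = w - eta * grad_Qi(w, i)  # one stochastic gradient step
        traj.append(w @ w)
    g_sgd += np.array(traj)
g_sgd /= runs

# Explicit Euler on the gradient flow ODE dW/dt = -grad Q(W) with the same
# step size, so iterate k corresponds to time t = k * eta.
W = w0.copy()
g_flow = [W @ W]
for _ in range(steps):
    W = W - eta * grad_Q(W)
    g_flow.append(W @ W)

print("max_k |E[g(w_k)] - g(W_(k eta))| ~", np.abs(g_sgd - np.array(g_flow)).max())
</syntaxhighlight>

Rerunning the sketch with a smaller <math display="inline">\eta</math> (and correspondingly more steps) should shrink the printed discrepancy roughly in proportion to <math display="inline">\eta</math>, consistent with the stated <math display="inline">O(\eta)</math> weak-error bound on the fixed horizon <math display="inline">[0, T]</math>.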