{{short description|Optimization algorithm}}
{{Machine learning}}
'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[Iterative method|iterative]] method for optimizing an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subderivative|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in [[high-dimensional]] optimization problems, this reduces the very high [[Computational complexity|computational burden]], achieving faster iterations in exchange for a lower [[Rate of convergence|convergence rate]].<ref>{{cite book |first1=Léon |last1=Bottou |author-link=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapter-url=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
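
The substitution of a subset-based estimate for the full gradient can be illustrated with a minimal sketch. It assumes a one-parameter least-squares objective; the function names, synthetic data, learning rate, batch size and step count are all hypothetical choices made for illustration and do not come from the cited sources.

```python
import random


def mean_gradient(w, samples):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)**2) w.r.t. w."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def sgd(data, w0=0.0, learning_rate=0.05, batch_size=4, steps=2000, seed=0):
    """Minimise the mean squared error over `data` using mini-batch SGD."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w = w0
    for _ in range(steps):
        # Randomly selected subset of the data (drawn without replacement).
        batch = rng.sample(data, batch_size)
        # Same gradient formula, evaluated on the subset only: this is the
        # stochastic estimate that replaces the full-data gradient.
        grad_estimate = mean_gradient(w, batch)
        w -= learning_rate * grad_estimate
    return w


# Illustrative, noise-free data on the line y = 3x, so the minimiser is w = 3.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 21)]
w_hat = sgd(data)
```

Each iteration here evaluates the gradient on 4 of the 20 points rather than on all of them, which is where the cheaper iterations come from. Because the example data contain no noise, every subset agrees on the minimiser and the iterates settle at it; with noisy data and a constant learning rate the iterates would generally keep fluctuating around the minimiser.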

The basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s. Today, stochastic gradient descent has become an important optimization method in [[machine learning]].<ref name="Bottou 1998">{{Cite book
  |last=Bottou
  |first=Léon
  |author-link=Léon Bottou
  |contribution=Online Algorithms and Stochastic Approximations
  |year=1998
  |title=Online Learning and Neural Networks
  |publisher=Cambridge University Press
  |url=https://archive.org/details/onlinelearningin0000unse
  |isbn=978-0-521-65263-6
  |url-access=registration
  }}</ref>