==Notable applications==
Stochastic gradient descent is a popular algorithm for training a wide range of models in [[machine learning]], including (linear) [[support vector machine]]s, [[logistic regression]] (see, e.g., [[Vowpal Wabbit]]) and [[graphical model]]s.<ref>Jenny Rose Finkel, Alex Kleeman, Christopher D. Manning (2008). [http://www.aclweb.org/anthology/P08-1109 Efficient, Feature-based, Conditional Random Field Parsing]. Proc. Annual Meeting of the ACL.</ref> When combined with the [[backpropagation|back propagation]] algorithm, it is the ''de facto'' standard algorithm for training [[artificial neural network]]s.<ref>[http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf LeCun, Yann A., et al. "Efficient backprop." Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012. 9-48]</ref> Its use has also been reported in the [[geophysics]] community, specifically in applications of full waveform inversion (FWI).<ref>[https://library.seg.org/doi/abs/10.1190/1.3230502 Jerome R. Krebs, John E. Anderson, David Hinkley, Ramesh Neelamani, Sunwoong Lee, Anatoly Baumstein, and Martin-Daniel Lacasse, (2009), "Fast full-wavefield seismic inversion using encoded sources," GEOPHYSICS 74: WCC177-WCC188.]</ref>

Stochastic gradient descent competes with the [[limited-memory BFGS|L-BFGS]] algorithm,{{Citation needed|date=July 2015}} which is also widely used. Stochastic gradient descent has been used since at least 1960 for training [[linear regression]] models, originally under the name [[ADALINE]].<ref>{{cite web |author=Avi Pfeffer |title=CS181 Lecture 5 — Perceptrons |url=http://www.seas.harvard.edu/courses/cs181/files/lecture05-notes.pdf |publisher=Harvard University }}{{Dead link|date=June 2018 |bot=InternetArchiveBot |fix-attempted=no }}</ref>

Another stochastic gradient descent algorithm is the [[Least mean squares filter|least mean squares (LMS)]] adaptive filter.
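
As an illustration of why the LMS filter counts as a stochastic gradient method, the following is a minimal sketch of the standard LMS weight update, in which each new sample triggers one gradient step on the instantaneous squared error. The function name <code>lms_filter</code>, its parameters, and the two-tap demonstration system are hypothetical choices made for this example and are not taken from any cited source.

```python
import random


def lms_filter(inputs, desired, num_taps, step_size):
    """Adapt FIR filter weights one sample at a time with the LMS rule.

    All names here are illustrative assumptions, not a library API.
    """
    weights = [0.0] * num_taps
    errors = []
    for n in range(num_taps - 1, len(inputs)):
        # Most recent num_taps input samples, newest first.
        window = [inputs[n - k] for k in range(num_taps)]
        output = sum(w * x for w, x in zip(weights, window))
        error = desired[n] - output
        # One stochastic gradient step on the instantaneous squared error
        # (the constant factor of 2 is absorbed into step_size).
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors


# Hypothetical demo: identify an unknown 2-tap system from noiseless data.
rng = random.Random(0)
true_weights = [0.5, -0.3]
signal = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
target = [0.0] + [
    true_weights[0] * signal[n] + true_weights[1] * signal[n - 1]
    for n in range(1, len(signal))
]
estimated, errors = lms_filter(signal, target, num_taps=2, step_size=0.1)
```

In this sketch the estimated weights should approach the hypothetical true weights because the data are noiseless and the step size is small relative to the input power; with noisy data the weights would instead fluctuate around the optimum.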