===Averaging===
''Averaged stochastic gradient descent'', invented independently by Ruppert and Polyak in the late 1980s, is ordinary stochastic gradient descent that records an average of its parameter vector over time. That is, the update is the same as for ordinary stochastic gradient descent, but the algorithm also keeps track of<ref>{{cite journal |last1=Polyak |first1=Boris T. |first2=Anatoli B. |last2=Juditsky |title=Acceleration of stochastic approximation by averaging |journal=SIAM J. Control Optim. |volume=30 |issue=4 |year=1992 |pages=838–855 |url=http://www.meyn.ece.ufl.edu/archive/spm_files/Courses/ECE555-2011/555media/poljud92.pdf |doi=10.1137/0330046 |s2cid=3548228 |access-date=2018-02-14 |archive-date=2016-01-12 |archive-url=https://web.archive.org/web/20160112091615/http://www.meyn.ece.ufl.edu/archive/spm_files/Courses/ECE555-2011/555media/poljud92.pdf |url-status=dead }}</ref>

<math display="block">\bar{w} = \frac{1}{t} \sum_{i=0}^{t-1} w_i.</math>
When optimization is done, this averaged parameter vector takes the place of {{mvar|w}}.
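
The averaging rule can be illustrated with a minimal Python sketch. It is not taken from the cited paper: the names <code>averaged_sgd</code>, <code>squared_error_grad</code>, <code>lr</code>, <code>epochs</code> and <code>seed</code> are hypothetical, and the least-squares problem is only an assumed example. The sketch keeps a running mean of the iterates, which equals the sum above after {{mvar|t}} updates without storing every past parameter vector.

```python
import random


def averaged_sgd(grad, w0, data, lr=0.05, epochs=20, seed=0):
    """Plain SGD that also tracks the mean of its iterates.

    Returns (w, w_bar): the last iterate and the average of w_0 .. w_{t-1},
    where t is the total number of updates performed.
    """
    rng = random.Random(seed)
    w = list(w0)
    w_bar = [0.0] * len(w)
    t = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)  # visit the samples in a fresh random order
        for sample in samples:
            # Fold the current iterate into the running mean:
            # w_bar_new = w_bar + (w - w_bar) / t, so no history is stored.
            t += 1
            w_bar = [m + (wi - m) / t for m, wi in zip(w_bar, w)]
            # Ordinary SGD step; the average never feeds back into it.
            g = grad(w, sample)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, w_bar


def squared_error_grad(w, sample):
    """Gradient of 0.5 * (a*x + b - y)**2 with respect to w = [a, b]."""
    x, y = sample
    residual = w[0] * x + w[1] - y
    return [residual * x, residual]


if __name__ == "__main__":
    # Assumed toy data: y = 2x + 1 plus Gaussian noise.
    noise = random.Random(1)
    data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0 + noise.gauss(0.0, 0.1))
            for x in range(-10, 11)]
    w, w_bar = averaged_sgd(squared_error_grad, [0.0, 0.0], data)
    print("last iterate:", w)
    print("averaged    :", w_bar)
```

In this sketch the average is purely an observer: the update of <code>w</code> is identical to ordinary stochastic gradient descent, and <code>w_bar</code> is only read out at the end.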