===Learning===
{{No footnotes|date=August 2019|section}}{{See also|Mathematical optimization|Estimation theory|Machine learning}}

Learning is the adaptation of the network to better handle a task by considering sample observations. It involves adjusting the weights (and optional thresholds) of the network to improve the accuracy of the result, which is done by minimizing the observed errors. In practice, a [[Loss function|cost function]] is defined and evaluated periodically during learning; as long as its output continues to decline, learning continues. The cost is frequently defined as a [[statistic]] whose value can only be approximated. Learning is complete when examining additional observations does not usefully reduce the error rate. Even after learning, the error rate typically does not reach 0. If the error rate remains too high after learning, the network typically must be redesigned. Because the outputs are numbers, a low error means that the difference between the output (for example, "almost certainly a cat") and the correct answer ("cat") is small. Learning attempts to reduce the total of these differences across the observations. Most learning models can be viewed as a straightforward application of [[Mathematical optimization|optimization]] theory and [[statistical estimation]].<ref name="Zell1994ch5.2"/><ref>{{Cite book|last1=Kelleher|first1=John D. |last2=Mac Namee|first2=Brian|last3=D'Arcy|first3=Aoife |title=Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies|date=2020|isbn=978-0-262-36110-1 |edition=2nd|location=Cambridge, MA |publisher=The MIT Press |chapter=7-8|oclc=1162184998}}</ref>
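
The loop described above (adjust the weights, re-evaluate the cost, stop once it no longer declines usefully) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the one-weight linear model, the mean squared error cost, and the names <code>train</code>, <code>tolerance</code> and <code>max_epochs</code> are assumptions chosen for the example and do not come from the cited sources.

```python
def mean_squared_error(w, b, samples):
    """Cost: average squared difference between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train(samples, learning_rate=0.05, tolerance=1e-9, max_epochs=10_000):
    """Fit y = w*x + b by gradient descent on the mean squared error.

    Training stops once one more pass over the observations no longer
    reduces the cost by at least `tolerance` (an illustrative criterion).
    """
    w, b = 0.0, 0.0
    n = len(samples)
    previous_cost = mean_squared_error(w, b, samples)
    cost = previous_cost
    for _ in range(max_epochs):
        # Gradient of the cost with respect to each adjustable parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Corrective step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        cost = mean_squared_error(w, b, samples)
        if previous_cost - cost < tolerance:
            break  # the cost has stopped declining usefully
        previous_cost = cost
    return w, b, cost


# Hypothetical observations generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, final_cost = train(samples)
```

In this sketch the final cost is small but not exactly 0, which mirrors the remark that the error rate typically does not reach 0 even after learning.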

==== Learning rate ====
{{main|Learning rate}}
The learning rate defines the size of the corrective steps that the model takes to adjust for errors in each observation.<ref>{{cite arXiv|last=Wei|first=Jiakai|date=26 April 2019|title=Forget the Learning Rate, Decay Loss|class=cs.LG|eprint=1905.00094}}</ref> A high learning rate shortens the training time but tends to yield lower ultimate accuracy, while a lower learning rate takes longer but has the potential for greater accuracy. Optimizations such as [[Quickprop]] are primarily aimed at speeding up error minimization, while other improvements mainly try to increase reliability. To avoid [[oscillation]] inside the network, such as alternating connection weights, and to improve the rate of convergence, refinements use an [[adaptive learning rate]] that increases or decreases as appropriate.<ref>{{Cite book|last1=Li|first1=Y.|last2=Fu|first2=Y.|last3=Li|first3=H.|last4=Zhang|first4=S. W.|title=2009 International Conference on Computational Intelligence and Natural Computing |chapter=The Improved Training Algorithm of Back Propagation Neural Network with Self-adaptive Learning Rate |s2cid=10557754|date=1 June 2009|isbn=978-0-7695-3645-3|volume=1|pages=73–76|doi=10.1109/CINC.2009.111}}</ref> Momentum makes each weight adjustment depend to some degree on the previous change, with a weighting factor setting the balance between the gradient and that previous change. A momentum close to 0 emphasizes the gradient, while a value close to 1 emphasizes the last change.{{citation needed|date=October 2024}}
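
The effect of the learning rate and of momentum can be illustrated with a short sketch. It assumes one common formulation of the momentum update, in which the new change is the scaled negative gradient plus the momentum times the previous change; other weightings exist. The function names and the quadratic example cost <code>(w - 3)**2</code> are hypothetical and chosen only for illustration.

```python
def momentum_step(weight, gradient, previous_change, learning_rate, momentum):
    """Return the updated weight and the change that was applied.

    With momentum = 0 only the gradient term remains; as momentum
    approaches 1 the previous change dominates the update.
    """
    change = -learning_rate * gradient + momentum * previous_change
    return weight + change, change


def minimise(gradient_fn, weight, learning_rate, momentum, steps):
    """Apply `steps` momentum updates starting from `weight`."""
    change = 0.0
    for _ in range(steps):
        weight, change = momentum_step(
            weight, gradient_fn(weight), change, learning_rate, momentum
        )
    return weight


def example_gradient(w):
    """Gradient of the example cost (w - 3)**2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


plain = minimise(example_gradient, 0.0, learning_rate=0.1, momentum=0.0, steps=50)
with_momentum = minimise(example_gradient, 0.0, learning_rate=0.1, momentum=0.5, steps=50)
# A learning rate that is too high for this cost makes the weight
# alternate between two values instead of converging.
oscillating = minimise(example_gradient, 0.0, learning_rate=1.0, momentum=0.0, steps=51)
```

In this example both of the first two runs approach the minimum at 3, while the third keeps jumping from one side of the minimum to the other, which is the kind of oscillation an adaptive learning rate is meant to avoid.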

====Cost function====
While it is possible to define a cost function [[ad hoc]], frequently the choice is determined by the function's desirable properties (such as [[Convex function|convexity]]) or because it arises from the model (e.g. in a probabilistic model, the model's [[posterior probability]] can be used as an inverse cost).{{citation needed|date=October 2024}}
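
Two widely used cost functions make the distinction concrete. The sketch below is an illustration under stated assumptions: the squared error stands in for a cost chosen for its properties (it is convex in the predictions), and the negative log-likelihood stands in for a cost that arises from a probabilistic model, where a higher probability assigned to the correct answer means a lower cost. Neither function is named in the text above, and the function names are hypothetical.

```python
import math


def squared_error_cost(predictions, targets):
    """Mean squared error; convex in the predictions."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def negative_log_likelihood(probabilities, labels):
    """Average negative log-probability assigned to the correct labels.

    `probabilities` holds the model's estimate of P(label = 1) and is
    assumed to lie strictly between 0 and 1; `labels` are 0 or 1.
    """
    total = 0.0
    for p, label in zip(probabilities, labels):
        probability_of_truth = p if label == 1 else 1.0 - p
        total -= math.log(probability_of_truth)
    return total / len(labels)


labels = [1, 0]
confident_cost = negative_log_likelihood([0.9, 0.1], labels)
unsure_cost = negative_log_likelihood([0.6, 0.4], labels)
```

Here <code>confident_cost</code> is lower than <code>unsure_cost</code>, because the first set of predictions assigns more probability to the correct labels.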

====Backpropagation====
{{Main|Backpropagation}}
Backpropagation is a method used to adjust the connection weights to compensate for each error found during learning. The error amount is effectively divided among the connections. Technically, backpropagation calculates the [[gradient]] (the vector of partial derivatives) of the [[loss function|cost function]] associated with a given state with respect to the weights. The weight updates can be done via stochastic gradient descent or by other methods, such as ''[[extreme learning machine]]s'',<ref>{{cite journal|last1=Huang|first1=Guang-Bin|last2=Zhu |first2=Qin-Yu|last3=Siew|first3=Chee-Kheong|year=2006|title=Extreme learning machine: theory and applications|journal=Neurocomputing|volume=70|issue=1 |pages=489–501|doi=10.1016/j.neucom.2005.12.126 |citeseerx=10.1.1.217.3692|s2cid=116858 }}</ref> "no-prop" networks,<ref>{{cite journal|year=2013|title=The no-prop algorithm: A new learning algorithm for multilayer neural networks |journal=Neural Networks|volume=37 |pages=182–188|doi=10.1016/j.neunet.2012.09.020|pmid=23140797|last1=Widrow|first1=Bernard|display-authors=etal}}</ref> training without backtracking,<ref>{{cite arXiv|eprint=1507.07680|first1=Yann |last1=Ollivier|first2=Guillaume|last2=Charpiat|title=Training recurrent networks without backtracking |year=2015|class=cs.NE}}</ref> "weightless" networks,<ref name="RBMTRAIN">{{Cite journal |last=Hinton |first=G. E. |date=2010 |title=A Practical Guide to Training Restricted Boltzmann Machines |url=https://www.researchgate.net/publication/221166159 |journal=Tech. Rep. UTML TR 2010-003 |access-date=27 June 2017 |archive-date=9 May 2021 |archive-url=https://web.archive.org/web/20210509123211/https://www.researchgate.net/publication/221166159_A_brief_introduction_to_Weightless_Neural_Systems |url-status=live }}</ref><ref>ESANN. 2009.{{full citation needed|date=June 2022}}</ref> and [[Holographic associative memory|non-connectionist neural networks]].{{citation needed|date=June 2022}}
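
The gradient calculation can be sketched for a deliberately tiny network. The example assumes one input, one sigmoid hidden unit, one linear output and a squared error cost; these choices and all function names are hypothetical and serve only to show the chain rule passing the error back through the connections, followed by one gradient descent update on a single observation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> sigmoid hidden unit -> linear output."""
    hidden = sigmoid(w1 * x)
    output = w2 * hidden
    return hidden, output


def cost(x, target, w1, w2):
    """Squared error cost for a single observation."""
    _, output = forward(x, w1, w2)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w1, w2):
    """Gradient of the cost with respect to (w1, w2) via the chain rule."""
    hidden, output = forward(x, w1, w2)
    error_at_output = output - target            # d cost / d output
    grad_w2 = error_at_output * hidden           # share for the output weight
    error_at_hidden = error_at_output * w2       # error passed back one layer
    # The derivative of the sigmoid is hidden * (1 - hidden).
    grad_w1 = error_at_hidden * hidden * (1.0 - hidden) * x
    return grad_w1, grad_w2


def sgd_step(x, target, w1, w2, learning_rate):
    """One gradient descent update computed from a single observation."""
    grad_w1, grad_w2 = backprop(x, target, w1, w2)
    return w1 - learning_rate * grad_w1, w2 - learning_rate * grad_w2


x, target, w1, w2 = 1.5, 0.8, 0.4, -0.3
cost_before = cost(x, target, w1, w2)
new_w1, new_w2 = sgd_step(x, target, w1, w2, learning_rate=0.1)
cost_after = cost(x, target, new_w1, new_w2)
```

A common sanity check for such code is to compare the backpropagated gradient with a finite-difference estimate obtained by nudging each weight slightly and re-evaluating the cost.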