== Improved versions ==
With the Gauss–Newton method the sum of squares of the residuals ''S'' may not decrease at every iteration. However, since Δ is a descent direction, unless <math>S\left(\boldsymbol \beta^s\right)</math> is a stationary point, it holds that <math>S\left(\boldsymbol \beta^s + \alpha\Delta\right) < S\left(\boldsymbol \beta^s\right)</math> for all sufficiently small <math>\alpha>0</math>. Thus, if divergence occurs, one solution is to employ a fraction <math>\alpha</math> of the increment vector Δ in the updating formula:
<math display="block"> \boldsymbol \beta^{s+1} = \boldsymbol \beta^s + \alpha \Delta.</math>
In other words, the increment vector is too long, but it still points "downhill", so going just a part of the way will decrease the objective function ''S''. An optimal value for <math>\alpha</math> can be found by using a [[line search]] algorithm, that is, the magnitude of <math>\alpha</math> is determined by finding the value that minimizes ''S'', usually using a [[line search|direct search method]] in the interval <math>0 < \alpha < 1</math> or a [[backtracking line search]] such as Armijo line search. Typically, <math>\alpha</math> should be chosen such that it satisfies the [[Wolfe conditions]] or the [[Goldstein conditions]].<ref>{{Cite book |title=Numerical Optimization |last1=Nocedal |first1=Jorge |last2=Wright |first2=Stephen J. |date=1999 |publisher=Springer |location=New York |isbn=0387227423 |oclc=54849297}}</ref>

In cases where the direction of the shift vector is such that the optimal fraction α is close to zero, an alternative method for handling divergence is the use of the [[Levenberg–Marquardt algorithm]], a [[trust region]] method.<ref name="ab"/> The normal equations are modified in such a way that the increment vector is rotated towards the direction of [[steepest descent]],
<math display="block">\left(\mathbf{J^\operatorname{T} J + \lambda D}\right) \Delta = -\mathbf{J}^\operatorname{T} \mathbf{r},</math>
where '''D''' is a positive diagonal matrix. Note that when '''D''' is the identity matrix '''I''' and <math>\lambda \to +\infty</math>, then <math>\lambda \Delta = \lambda \left(\mathbf{J^\operatorname{T} J} + \lambda \mathbf{I}\right)^{-1} \left(-\mathbf{J}^\operatorname{T} \mathbf{r}\right) = \left(\mathbf{I} - \mathbf{J^\operatorname{T} J} / \lambda + \cdots \right) \left(-\mathbf{J}^\operatorname{T} \mathbf{r}\right) \to -\mathbf{J}^\operatorname{T} \mathbf{r}</math>; therefore the [[Direction (geometry, geography)|direction]] of Δ approaches the direction of the negative gradient <math>-\mathbf{J}^\operatorname{T} \mathbf{r}</math>.

The so-called Marquardt parameter <math>\lambda</math> may also be optimized by a line search, but this is inefficient, as the shift vector must be recalculated every time <math>\lambda</math> is changed. A more efficient strategy is this: when divergence occurs, increase the Marquardt parameter until there is a decrease in ''S''. Then retain the value from one iteration to the next, but decrease it if possible until a cut-off value is reached, at which point the Marquardt parameter can be set to zero; the minimization of ''S'' then becomes a standard Gauss–Newton minimization.
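For concreteness, the following is a minimal NumPy sketch of the two devices described above: a damped Gauss–Newton update whose fraction <math>\alpha</math> is chosen by an Armijo backtracking line search, and the Levenberg–Marquardt modification of the normal equations. The function names, Armijo constants, tolerances, and the strategy for varying the Marquardt parameter are illustrative assumptions, not a standard or reference implementation.

<syntaxhighlight lang="python">
import numpy as np

def gauss_newton_damped(r, J, beta0, c=1e-4, tau=0.5, tol=1e-8, max_iter=100):
    """Damped Gauss-Newton: beta <- beta + alpha*Delta, with alpha found by
    an Armijo (backtracking) line search so that S = sum of squared residuals
    decreases.  r(beta) returns the residual vector (m,), J(beta) its Jacobian (m, n).
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        res = r(beta)
        jac = J(beta)
        grad = jac.T @ res                           # (1/2) * gradient of S
        delta = np.linalg.solve(jac.T @ jac, -grad)  # Gauss-Newton increment
        S = res @ res
        alpha = 1.0
        while True:
            res_trial = r(beta + alpha * delta)
            # Armijo sufficient decrease; 2*(grad @ delta) is the (negative)
            # directional derivative of S along delta.
            if res_trial @ res_trial <= S + c * alpha * 2.0 * (grad @ delta):
                break
            alpha *= tau
            if alpha < 1e-12:        # stop shrinking; accept the tiny step
                break
        step = alpha * delta
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def levenberg_marquardt_increment(jac, res, lam, D=None):
    """Solve the modified normal equations (J^T J + lam*D) Delta = -J^T r.
    With D = I, Delta turns toward the steepest-descent direction as lam grows."""
    if D is None:
        D = np.eye(jac.shape[1])                     # D = I, the classic choice
    return np.linalg.solve(jac.T @ jac + lam * D, -jac.T @ res)
</syntaxhighlight>

In a Levenberg–Marquardt iteration built on the second helper, the parameter <code>lam</code> would be increased after a step that fails to reduce ''S'' and decreased after a successful one, following the strategy described in the last paragraph above.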