==Convergence properties==

The Gauss–Newton iteration is guaranteed to converge toward a local minimum point <math>\hat{\beta}</math> under four conditions:<ref name = DenSch>{{cite book |last1=Dennis |first1=J. E., Jr. |last2=Schnabel |first2=R. B. |title=Numerical Methods for Unconstrained Optimization and Nonlinear Equations |date=1983 |publisher=SIAM 1996 reproduction of Prentice-Hall 1983 edition |page=222}}</ref> the functions <math>r_1,\ldots,r_m</math> are twice continuously differentiable in an open convex set <math>D\ni\hat{\beta}</math>, the Jacobian <math>\mathbf{J}_\mathbf{r}(\hat{\beta})</math> has full column rank, the initial iterate <math>\beta^{(0)}</math> is near <math>\hat{\beta}</math>, and the local minimum value <math>|S(\hat{\beta})|</math> is small. The convergence is quadratic if <math>|S(\hat{\beta})| = 0</math>.

It can be shown<ref>Björck (1996), p. 260.</ref> that the increment Δ is a [[descent direction]] for {{math|''S''}} and that, if the algorithm converges, the limit is a [[stationary point]] of {{math|''S''}}. If the minimum value <math>|S(\hat{\beta})|</math> is large, however, convergence is not guaranteed, not even [[local convergence]] as in [[Newton's method in optimization|Newton's method]], or convergence under the usual Wolfe conditions.<ref>{{citation |title=The divergence of the BFGS and Gauss Newton Methods |last1=Mascarenhas |journal=Mathematical Programming |date=2013 |volume=147 |issue=1 |pages=253–276 |doi=10.1007/s10107-013-0720-6 |arxiv=1309.7922 |s2cid=14700106}}</ref>

The rate of convergence of the Gauss–Newton algorithm can approach [[rate of convergence|quadratic]].<ref>Björck (1996), pp. 341–342.</ref> The algorithm may converge slowly or not at all if the initial guess is far from the minimum or if the matrix <math>\mathbf{J_r^\operatorname{T} J_r}</math> is [[ill-conditioned]]. For example, consider the problem with <math>m = 2</math> equations and <math>n = 1</math> variable, given by
<math display="block">\begin{align}
  r_1(\beta) &= \beta + 1, \\
  r_2(\beta) &= \lambda \beta^2 + \beta - 1.
\end{align}</math>
For <math>\lambda < 1</math>, <math>\beta = 0</math> is a local optimum. If <math>\lambda = 0</math>, the problem is in fact linear and the method finds the optimum in one iteration. If <math>|\lambda| < 1</math>, the method converges linearly and the error decreases asymptotically by a factor of <math>|\lambda|</math> at every iteration. However, if <math>|\lambda| > 1</math>, the method does not even converge locally.<ref>Fletcher (1987), p. 113.</ref>
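The behavior of this example can be reproduced with a short numerical sketch of the Gauss–Newton step applied to the two residuals above (the Python code and the name <code>gauss_newton_example</code> below are purely illustrative, not part of any standard library):

<syntaxhighlight lang="python">
import numpy as np

def gauss_newton_example(lmbda, beta0=0.1, iterations=10):
    """Apply Gauss-Newton steps to r1 = beta + 1, r2 = lmbda*beta**2 + beta - 1."""
    betas = [beta0]
    for _ in range(iterations):
        beta = betas[-1]
        r = np.array([beta + 1.0, lmbda * beta**2 + beta - 1.0])  # residual vector
        J = np.array([[1.0], [2.0 * lmbda * beta + 1.0]])         # 2x1 Jacobian
        # Normal-equations step: solve (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -(J.T @ r))
        betas.append(beta + delta.item())
    return betas

# lambda = 0.5: the iterates approach the local minimum beta = 0, the error
# shrinking asymptotically by a factor |lambda| per step.
print(gauss_newton_example(0.5)[-1])
# lambda = -2: beta = 0 is still a local minimum of S, but the iterates oscillate
# away from it, illustrating the lack of local convergence for |lambda| > 1.
print(gauss_newton_example(-2.0)[:5])
</syntaxhighlight>

With <math>\lambda = 0.5</math> the iterates settle near <math>\beta = 0</math>, while with <math>\lambda = -2</math> they move away from it despite starting nearby, consistent with the factor-<math>|\lambda|</math> behavior described above.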