==Large-scale optimization==
For large-scale optimization, the Gauss–Newton method is of special interest because it is often (though certainly not always) true that the matrix <math>\mathbf{J}_\mathbf{r}</math> is more [[Sparse matrix|sparse]] than the approximate Hessian <math>\mathbf{J}_\mathbf{r}^\operatorname{T} \mathbf{J}_\mathbf{r}</math>. In such cases, the step calculation itself will typically need to be done with an approximate iterative method appropriate for large and sparse problems, such as the [[conjugate gradient method]].

In order to make this kind of approach work, one needs at least an efficient method for computing the product
<math display="block">{\mathbf{J}_\mathbf{r}}^\operatorname{T} \mathbf{J}_\mathbf{r} \mathbf{p}</math>
for some vector '''p'''. With [[sparse matrix]] storage, it is in general practical to store the rows of <math>\mathbf{J}_\mathbf{r}</math> in a compressed form (e.g., without zero entries), making a direct computation of the above product tricky due to the transposition. However, if one defines '''c'''<sub>''i''</sub> as row ''i'' of the matrix <math>\mathbf{J}_\mathbf{r}</math>, the following simple relation holds:
<math display="block">{\mathbf{J}_\mathbf{r}}^\operatorname{T} \mathbf{J}_\mathbf{r} \mathbf{p} = \sum_i \mathbf{c}_i \left(\mathbf{c}_i \cdot \mathbf{p}\right),</math>
so that every row contributes additively and independently to the product. In addition to respecting a practical sparse storage structure, this expression is well suited for [[Parallel computing|parallel computations]]. Note that every row '''c'''<sub>''i''</sub> is the gradient of the corresponding residual ''r''<sub>''i''</sub>; with this in mind, the formula above emphasizes the fact that residuals contribute to the problem independently of each other.
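A minimal sketch of this row-wise accumulation, assuming the Jacobian is held in compressed sparse row (CSR) form (SciPy's <code>csr_matrix</code> and the function name <code>jtj_times_p</code> are used here purely for illustration):

<syntaxhighlight lang="python">
import numpy as np
from scipy.sparse import csr_matrix

def jtj_times_p(J, p):
    """Compute J_r^T J_r p as sum_i c_i (c_i . p), one compressed row at a time,
    without ever forming the approximate Hessian J_r^T J_r explicitly."""
    result = np.zeros(J.shape[1])
    for i in range(J.shape[0]):
        start, end = J.indptr[i], J.indptr[i + 1]
        cols = J.indices[start:end]      # column indices of row c_i
        vals = J.data[start:end]         # nonzero entries of c_i
        ci_dot_p = vals @ p[cols]        # c_i . p uses only the stored entries
        result[cols] += vals * ci_dot_p  # add the contribution c_i (c_i . p)
    return result

# Small example: compare against the dense product (J^T J) p
J = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 5.0]]))
p = np.array([1.0, -1.0, 2.0])
print(jtj_times_p(J, p))                 # [61. -9. 80.], same as (J.T @ J) @ p
</syntaxhighlight>

Because each row's contribution touches only the columns where '''c'''<sub>''i''</sub> is nonzero, the loop can also be distributed across rows, mirroring the independence of the residuals noted above.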