===Non-linear least squares===
{{main|Non-linear least squares}}

There is, in some cases, a [[closed-form solution]] to a non-linear least squares problem – but in general there is not. In the case of no closed-form solution, numerical algorithms are used to find the value of the parameters <math>\beta</math> that minimizes the objective. Most algorithms involve choosing initial values for the parameters. Then, the parameters are refined iteratively; that is, the values are obtained by successive approximation:
<math display="block">{\beta_j}^{k+1} = {\beta_j}^k+\Delta \beta_j,</math>
where a superscript ''k'' is an iteration number, and the vector of increments <math>\Delta \beta_j</math> is called the shift vector. In some commonly used algorithms, at each iteration the model may be linearized by approximation to a first-order [[Taylor series]] expansion about <math>\boldsymbol \beta^k</math>:
<math display="block">\begin{align} f(x_i,\boldsymbol \beta) &= f^k(x_i,\boldsymbol \beta) +\sum_j \frac{\partial f(x_i,\boldsymbol \beta)}{\partial \beta_j} \left(\beta_j-{\beta_j}^k \right) \\[1ex] &= f^k(x_i,\boldsymbol \beta) +\sum_j J_{ij} \,\Delta\beta_j. \end{align}</math>
The [[Jacobian matrix and determinant|Jacobian]] '''J''' is a function of constants, the independent variable ''and'' the parameters, so it changes from one iteration to the next. The residuals are given by
<math display="block">r_i = y_i - f^k(x_i, \boldsymbol \beta) - \sum_{j=1}^{m} J_{ij}\,\Delta\beta_j = \Delta y_i - \sum_{j=1}^m J_{ij}\,\Delta\beta_j.</math>
To minimize the sum of squares of <math>r_i</math>, the gradient equation is set to zero and solved for <math>\Delta \beta_j</math>:
<math display="block">-2\sum_{i=1}^n J_{ij} \left( \Delta y_i-\sum_{k=1}^m J_{ik} \, \Delta \beta_k \right) = 0,</math>
which, on rearrangement, become ''m'' simultaneous linear equations, the '''normal equations''':
<math display="block">\sum_{i=1}^{n}\sum_{k=1}^m J_{ij} J_{ik} \, \Delta \beta_k=\sum_{i=1}^n J_{ij} \, \Delta y_i \qquad (j=1,\ldots,m).</math>
The normal equations are written in matrix notation as
<math display="block">\left(\mathbf{J}^\mathsf{T} \mathbf{J}\right) \Delta \boldsymbol \beta = \mathbf{J}^\mathsf{T}\Delta \mathbf{y}.</math>
<!-- or <math display="block">\mathbf{\left(J^TWJ\right) \, \Delta \boldsymbol \beta=J^TW \, \Delta y}</math> if weights are used. -->
These are the defining equations of the [[Gauss–Newton algorithm]].
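
To make the iteration concrete, the following is a minimal sketch in Python (with NumPy) of the Gauss–Newton update described above, applied to a hypothetical exponential-decay model <math>f(x,\boldsymbol\beta) = \beta_0 e^{-\beta_1 x}</math>. The model, the synthetic data, and the names <code>model</code>, <code>jacobian</code>, and <code>gauss_newton</code> are illustrative assumptions for this sketch, not part of the algorithm's definition.

<syntaxhighlight lang="python">
# Minimal Gauss–Newton sketch (illustrative example): fit the assumed model
# f(x, beta) = beta[0] * exp(-beta[1] * x) by repeatedly solving the
# normal equations (J^T J) dbeta = J^T dy and updating beta <- beta + dbeta.
import numpy as np

def model(x, beta):
    return beta[0] * np.exp(-beta[1] * x)

def jacobian(x, beta):
    # Partial derivatives of the model with respect to beta[0] and beta[1],
    # evaluated at the current iterate (the matrix J in the text).
    J = np.empty((x.size, 2))
    J[:, 0] = np.exp(-beta[1] * x)
    J[:, 1] = -beta[0] * x * np.exp(-beta[1] * x)
    return J

def gauss_newton(x, y, beta_init, n_iter=20, tol=1e-10):
    beta = np.array(beta_init, dtype=float)
    for _ in range(n_iter):
        dy = y - model(x, beta)           # residuals Δy_i at the current iterate
        J = jacobian(x, beta)
        # Normal equations: (J^T J) Δβ = J^T Δy
        dbeta = np.linalg.solve(J.T @ J, J.T @ dy)
        beta += dbeta                     # β^{k+1} = β^k + Δβ
        if np.linalg.norm(dbeta) < tol:   # stop once the shift vector is negligible
            break
    return beta

# Synthetic data from known parameters plus noise (for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.02 * rng.standard_normal(x.size)
print(gauss_newton(x, y, beta_init=[1.0, 1.0]))   # approximately [2.5, 1.3]
</syntaxhighlight>

Solving the normal equations with <code>np.linalg.solve</code> mirrors the matrix form <math>\left(\mathbf{J}^\mathsf{T} \mathbf{J}\right) \Delta \boldsymbol \beta = \mathbf{J}^\mathsf{T}\Delta \mathbf{y}</math>; in practice, a QR-based least-squares solve of <math>\mathbf{J}\,\Delta\boldsymbol\beta \approx \Delta\mathbf{y}</math> is often preferred for numerical stability when <math>\mathbf{J}^\mathsf{T}\mathbf{J}</math> is ill-conditioned.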