===Non-linear optimization===
[[Non-linear least squares]] are a particular case of non-linear optimization. Let <math display=inline>\mathbf{f}(\mathbf{x})=\mathbf{l}</math> be an over-determined system of equations with a non-linear function <math>\mathbf{f}</math> returning vector results. The aim is to minimize the squared norm of the residuals <math display=inline>\mathbf{v}=\mathbf{f}(\mathbf{x})-\mathbf{l}</math>. An approximate [[Newton's method]] solution is obtained by expanding <math>\mathbf f</math> into a truncated Taylor series <math>\mathbf f(\mathbf x_0+\delta\mathbf x)\approx\mathbf f(\mathbf x_0)+(\partial\mathbf f/\partial\mathbf x)\,\delta\mathbf x</math>, yielding a linear least squares problem for <math>\delta\mathbf x</math>:
: <math>(\partial\mathbf f/\partial\mathbf x)\,\delta\mathbf x=\mathbf l-\mathbf f(\mathbf x_0),\qquad\min_{\delta\mathbf x}\|\mathbf v\|^2,</math>
where <math>\mathbf v</math> is now the residual of the linearized system. Because the higher Taylor terms are neglected, such a solution is only approximate, if it exists at all. One can then update the expansion point to <math>\mathbf x_{n+1}=\mathbf x_n+\delta\mathbf x</math> and repeat the whole procedure, hoping that (i) the iterations converge to a solution and (ii) that solution is the one sought. Neither is guaranteed, and both must be verified.

[[Non-linear least squares]] may also be applied to the linear least squares problem by setting <math>\mathbf x_0=\mathbf 0</math> and <math>\mathbf f(\mathbf x)=\mathbf{Ax}</math>. This may be useful if the Cholesky decomposition yields an inaccurate inverse <math>\mathbf R^{-1}</math> of the triangular matrix satisfying <math>\mathbf R^{\rm T}\mathbf R=\mathbf N</math>, because of rounding errors. Such a procedure is called a ''differential correction'' of the solution. As long as the iterations converge, by virtue of the [[Banach fixed-point theorem]] they yield the solution with a precision limited only by the precision of the calculated residuals <math>\mathbf v=\mathbf{Ax}-\mathbf l</math>; that precision is independent of rounding errors in <math>\mathbf R^{-1}</math>. A poor <math>\mathbf R^{-1}</math> may, however, shrink the region of initial points <math>\mathbf x_0</math> for which the iterations converge, or prevent convergence altogether. Convergence is usually slower, e.g. linear, so that <math>\|\delta\mathbf x_{n+1}\|\approx\alpha\|\delta\mathbf x_n\|</math> with a constant <math>\alpha<1</math>; such slow convergence may be accelerated by ''Aitken's <math>\delta^2</math>'' method. If the calculation of <math>\mathbf R^{-1}</math> is very costly, it may be reused from previous iterations as long as convergence is maintained. Such a Cholesky procedure may work even for Hilbert matrices, which are notoriously difficult to invert.<ref>{{cite journal | last1 = Schwarzenberg-Czerny | first1 = A. | journal = Astronomy and Astrophysics Supplement | pages = 405–410 | title = On matrix factorization and efficient least squares solution | volume = 110 | year = 1995 | bibcode = 1995A&AS..110..405S }}</ref>
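A minimal sketch of one pass of the iteration described above (essentially a Gauss–Newton step), assuming NumPy and SciPy; the test model, the helper name <code>gauss_newton</code> and the stopping rule are illustrative, not part of any standard library. Each iteration linearizes <math>\mathbf f</math> about the current point, forms the normal equations <math>\mathbf N=\mathbf J^{\rm T}\mathbf J</math> and obtains <math>\delta\mathbf x</math> from the factorization <math>\mathbf R^{\rm T}\mathbf R=\mathbf N</math>:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import solve_triangular

def gauss_newton(f, jac, l, x0, tol=1e-12, max_iter=50):
    # Minimize ||f(x) - l||^2 by repeated linearization about x_n.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = l - f(x)                      # right-hand side  l - f(x_n)
        J = jac(x)                        # Jacobian  df/dx  at x_n
        N = J.T @ J                       # normal-equations matrix
        R = np.linalg.cholesky(N).T       # upper-triangular factor, R^T R = N
        y = solve_triangular(R.T, J.T @ r, lower=True)
        dx = solve_triangular(R, y, lower=False)
        x = x + dx                        # update the expansion point x_{n+1} = x_n + dx
        if np.linalg.norm(dx) < tol:      # convergence itself still has to be verified
            break
    return x

# Illustrative test problem: fit l = a*exp(b*t) for the parameters x = (a, b).
t = np.linspace(0.0, 1.0, 20)
l = 2.0 * np.exp(-1.5 * t)
f = lambda x: x[0] * np.exp(x[1] * t)
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(gauss_newton(f, jac, l, x0=[1.0, -1.0]))   # approaches (2, -1.5)
</syntaxhighlight>

The same loop with <math>\mathbf f(\mathbf x)=\mathbf{Ax}</math> and a fixed, possibly inaccurate factor <math>\mathbf R</math> corresponds to the ''differential correction'' scheme described above.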
Non-linear multi-variate functions may be minimized over their parameters using variants of [[Newton's method]] called ''quasi-Newton'' methods. At iteration k, the search steps in a direction <math display=inline>p_k</math> defined by solving <math display=inline>B_k p_k = -g_k</math> for <math display=inline>p_k</math>, where <math display=inline>g_k</math> is the [[gradient]] and <math display=inline>B_k</math> is an approximation to the [[Hessian matrix]] formed by repeated rank-1 updates at each iteration. Two well-known update formulas are called [[Davidon–Fletcher–Powell]] (DFP) and [[BFGS method|Broyden–Fletcher–Goldfarb–Shanno]] (BFGS). Loss of the positive-definite condition through round-off error is avoided if, rather than updating an approximation to the inverse of the Hessian, one updates the Cholesky decomposition of an approximation of the Hessian matrix itself.<ref>{{Cite book |last=Arora |first=Jasbir Singh |url=https://books.google.com/books?id=9FbwVe577xwC&pg=PA327 |title=Introduction to Optimum Design |date=2004-06-02 |publisher=Elsevier |isbn=978-0-08-047025-2 |language=en}}</ref>
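For illustration, a sketch of the factor-based approach, assuming <math>B_k</math> is stored as a lower-triangular Cholesky factor <math>L</math> with <math>B_k = LL^{\rm T}</math> (the function names below are illustrative, not a standard API). The search direction follows from two triangular solves, and a symmetric rank-one change to <math>B_k</math> is applied directly to the factor instead of to an explicit inverse:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import solve_triangular

def chol_rank1_update(L, v):
    # Return a lower-triangular L' with L' L'^T = L L^T + v v^T
    # (standard rank-one update by a sequence of plane rotations).
    L, v = L.copy(), v.astype(float).copy()
    for k in range(v.size):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] + s * v[k+1:]) / c
        v[k+1:] = c * v[k+1:] - s * L[k+1:, k]
    return L

def quasi_newton_direction(L, g):
    # Solve B_k p_k = -g_k given B_k = L L^T: one forward and one back substitution.
    y = solve_triangular(L, -g, lower=True)
    return solve_triangular(L.T, y, lower=False)

# Keeping the factor preserves positive definiteness: L L^T is positive
# definite whenever the diagonal of L stays nonzero.
B0 = np.array([[4.0, 1.0],
               [1.0, 3.0]])
L = np.linalg.cholesky(B0)                      # initial factor
p = quasi_newton_direction(L, np.array([1.0, -2.0]))
L = chol_rank1_update(L, np.array([0.5, 0.2]))  # factor of B0 + v v^T, no refactorization
</syntaxhighlight>

A full BFGS step also subtracts a rank-one term from <math>B_k</math>; the corresponding ''downdate'' of the factor proceeds analogously.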