==Multidimensional formulations==

===Systems of equations===

===={{mvar|k}} variables, {{mvar|k}} functions{{anchor|multidimensional}}====
One may also use Newton's method to solve systems of {{mvar|k}} equations, which amounts to finding the (simultaneous) zeroes of {{mvar|k}} continuously differentiable functions <math>f:\R^k\to \R.</math> This is equivalent to finding the zeroes of a single vector-valued function <math>F:\R^k\to \R^k.</math> In the formulation given above, the scalars {{mvar|x<sub>n</sub>}} are replaced by vectors {{math|'''x'''{{sub|{{var|n}}}}}}, and instead of dividing the function {{math|{{var|f}}({{var|x}}{{sub|{{var|n}}}})}} by its derivative {{math|{{var|{{prime|f}}}}({{var|x}}{{sub|{{var|n}}}})}} one left-multiplies the function {{math|{{var|F}}('''x'''{{sub|{{var|n}}}})}} by the inverse of its {{math|{{var|k}} × {{var|k}}}} [[Jacobian matrix]] {{math|{{var|J}}{{sub|{{var|F}}}}('''x'''{{sub|{{var|n}}}})}}.<ref name=":3">{{Cite book |last1=Burden |first1=Richard L. |url=https://archive.org/details/numericalanaly00burd/ |title=Numerical Analysis |last2=Faires |first2=J. Douglas |last3=Reynolds |first3=Albert C. |date=July 1981 |publisher=Prindle, Weber & Schmidt |isbn=0-87150-314-X |edition=2nd |location=Boston, MA, United States |oclc=1036752194 |pages=448–452 |language=en}}</ref><ref>{{Cite book |last=Evans |first=Gwynne A. |url-access=registration |url=https://archive.org/details/practicalnumeric0000evan/ |title=Practical Numerical Analysis |date=1995 |publisher=John Wiley & Sons |isbn=0471955353 |location=Chichester |publication-date=1995 |pages=30–33 |language=en |oclc=1319419671}}</ref><ref>{{Cite book |last1=Demidovich |first1=Boris Pavlovich |url=https://archive.org/details/computational-mathematics/mode/2up |title=Computational Mathematics |last2=Maron |first2=Isaak Abramovich |date=1981 |publisher=MIR Publishers |isbn=9780828507042 |edition=Third |location=Moscow |pages=460–478 |language=en}}</ref> This results in the expression
<math display="block">\mathbf{x}_{n+1} = \mathbf{x}_{n} - J_F(\mathbf{x}_n)^{-1} F(\mathbf{x}_n) .</math>
In practice, rather than computing the inverse of the Jacobian matrix explicitly, one solves the [[system of linear equations]]
<math display="block">J_F(\mathbf{x}_n) (\mathbf{x}_{n+1} - \mathbf{x}_n) = -F(\mathbf{x}_n)</math>
for the unknown {{math|'''x'''{{sub|{{var|n}} + 1}} − '''x'''{{sub|{{var|n}}}}}}.<ref>{{cite book |last1=Kiusalaas |first1=Jaan |title=Numerical Methods in Engineering with Python 3 |date=March 2013 |publisher=Cambridge University Press |location=New York |isbn=978-1-107-03385-6 |pages=175–176 |edition=3rd |url=https://www.cambridge.org/9781107033856}}</ref>

===={{mvar|k}} variables, {{mvar|m}} equations, with {{math|{{var|m}} > {{var|k}}}}====
The {{mvar|k}}-dimensional variant of Newton's method can also be used to solve systems of more than {{mvar|k}} (nonlinear) equations if the algorithm uses the [[generalized inverse]] of the non-square [[Jacobian matrix and determinant|Jacobian]] matrix {{math|{{var|J}}{{isup|+}} {{=}} ({{var|J}}{{isup|T}}{{var|J}}){{sup|−1}}{{var|J}}{{isup|T}}}} instead of the inverse of {{mvar|J}}. If the [[system of nonlinear equations|nonlinear system]] has no solution, the method attempts to find a solution in the [[non-linear least squares]] sense. See [[Gauss–Newton algorithm]] for more information.
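The iteration above is straightforward to implement. The following Python sketch is an illustration only (the function name <code>newton_system</code> and its parameters are chosen here, not taken from the cited references); it solves the linear system at each step instead of forming the inverse Jacobian:

<syntaxhighlight lang="python">
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Find a zero of a vector-valued F : R^k -> R^k by Newton's method.

    F  -- callable returning the residual vector F(x)
    J  -- callable returning the k-by-k Jacobian matrix J_F(x)
    x0 -- starting vector
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve J_F(x_n) (x_{n+1} - x_n) = -F(x_n) for the Newton step.
        step = np.linalg.solve(J(x), -F(x))
        x = x + step
        if np.linalg.norm(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")
</syntaxhighlight>

If there are more equations than unknowns ({{math|{{var|m}} > {{var|k}}}}), replacing <code>np.linalg.solve</code> with a least-squares solve such as <code>np.linalg.lstsq</code> corresponds to the generalized-inverse step described above.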
==== Example ====
For example, the following set of equations needs to be solved for the vector of unknowns <math>\ [\ x_1, x_2\ ]\ ,</math> given the vector of known values <math>\ [\ 2, 3\ ] ~.</math>{{refn | This example is similar to one in the reference,<ref name=":3" /> pages 451 and 452, but simplified to two equations instead of three.}}
<math display="block">\begin{align}
5\ x_1^2 + x_1\ x_2^2 + \sin^2( 2\ x_2 ) &= 2 \\
e^{ 2\ x_1 - x_2 } + 4\ x_2 &= 3
\end{align}</math>
The function vector <math>\ F(X_k)\ </math> and Jacobian matrix <math>\ J(X_k)\ </math> for iteration {{mvar|k}}, and the vector of known values <math>\ Y\ ,</math> are defined below.
<math display="block">\begin{align}
F(X_k) &= \begin{bmatrix} f_1(X_k) \\ f_2(X_k) \end{bmatrix}
        = \begin{bmatrix} 5\, x_1^2 + x_1 x_2^2 + \sin^2(2 x_2) \\ e^{2 x_1 - x_2} + 4 x_2 \end{bmatrix}_k \\[4pt]
J(X_k) &= \begin{bmatrix} \dfrac{\partial f_1(X)}{\partial x_1} & \dfrac{\partial f_1(X)}{\partial x_2} \\[6pt] \dfrac{\partial f_2(X)}{\partial x_1} & \dfrac{\partial f_2(X)}{\partial x_2} \end{bmatrix}_k
        = \begin{bmatrix} 10\, x_1 + x_2^2 & 2 x_1 x_2 + 4 \sin(2 x_2) \cos(2 x_2) \\ 2\, e^{2 x_1 - x_2} & -e^{2 x_1 - x_2} + 4 \end{bmatrix}_k \\[4pt]
Y &= \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\end{align}</math>
Note that <math>\ F(X_k)\ </math> could have been rewritten to absorb <math>\ Y\ ,</math> and thus eliminate <math>Y</math> from the equations. The equations to solve at each iteration are
<math display="block">\begin{bmatrix} 10\, x_1 + x_2^2 & 2 x_1 x_2 + 4 \sin(2 x_2) \cos(2 x_2) \\ 2\, e^{2 x_1 - x_2} & -e^{2 x_1 - x_2} + 4 \end{bmatrix}_k \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}_{k+1} = \begin{bmatrix} 5\, x_1^2 + x_1 x_2^2 + \sin^2(2 x_2) - 2 \\ e^{2 x_1 - x_2} + 4 x_2 - 3 \end{bmatrix}_k</math>
and
<math display="block">X_{k+1} ~=~ X_k - C_{k+1} ~.</math>
The iterations should be repeated until <math>\ \sum_{i=1}^{2} \bigl| f_i(X_k) - y_i \bigr| < E\ ,</math> where <math>\ E\ </math> is a tolerance small enough to meet the accuracy requirements of the application. If the initial vector <math>\ X_0\ </math> is chosen to be <math>\ \begin{bmatrix}~ 1 ~&~ 1 ~\end{bmatrix}\ ,</math> that is, <math>\ x_1 = 1\ </math> and <math>\ x_2 = 1\ ,</math> and {{mvar|E}} is chosen to be {{10^|−3}}, then the example converges after four iterations to a value of <math>\ X_4 = \left[~ 0.567297,\ -0.309442 ~\right] ~.</math>
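The worked example can be checked numerically. The following Python sketch is an illustration, not taken from the cited reference; it runs the four iterations tabulated below:

<syntaxhighlight lang="python">
import numpy as np

Y = np.array([2.0, 3.0])  # vector of known values

def F(x):
    """Function vector F(X_k): left-hand sides of the two equations."""
    x1, x2 = x
    return np.array([
        5*x1**2 + x1*x2**2 + np.sin(2*x2)**2,
        np.exp(2*x1 - x2) + 4*x2,
    ])

def J(x):
    """Jacobian matrix J(X_k) of the function vector."""
    x1, x2 = x
    return np.array([
        [10*x1 + x2**2,       2*x1*x2 + 4*np.sin(2*x2)*np.cos(2*x2)],
        [2*np.exp(2*x1 - x2), -np.exp(2*x1 - x2) + 4],
    ])

x = np.array([1.0, 1.0])                 # X_0 = [1, 1]
for k in range(4):                       # the four iterations tabulated below
    c = np.linalg.solve(J(x), F(x) - Y)  # J(X_k) C_{k+1} = F(X_k) - Y
    x = x - c                            # X_{k+1} = X_k - C_{k+1}
    print(k + 1, x, F(x))
# x converges to approximately [0.567297, -0.309442], where F(x) = [2, 3].
</syntaxhighlight>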
==== Iterations ====
The following iterations were made during the course of the solution.

:{| class="wikitable"
|+ Converging iteration sequence
|- style="vertical-align:bottom;"
! Step
! Variable
! Value
|-
| rowspan="2" style="text-align:center;" | {{math| 0 }}
| style="text-align:right;" | {{mvar| x {{=}} }}
| <math> \begin{bmatrix}\ 1\ , & 1 \end{bmatrix} </math>
|-
| style="text-align:right;" | {{math| ''f''(''x'') {{=}} }}
| <math> \begin{bmatrix}\ 6.82682\ , & 6.71828\ \end{bmatrix} </math>
|-
| colspan="3" style="background:white;" |
|-
| rowspan="4" style="text-align:center;" | {{math| 1 }}
| style="text-align:right;" | {{mvar| J {{=}} }}
| <math> \begin{bmatrix}\ 11 ~, & \quad 0.486395 \\ \ 5.43656\ , & 1.28172 \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| c {{=}} }}
| <math>\begin{bmatrix}\ 0.382211\ , & 1.27982\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| x {{=}} }}
| <math>\begin{bmatrix}\ 0.617789\ , & -0.279818\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{math| ''f''(''x'') {{=}} }}
| <math>\begin{bmatrix}\ 2.23852\ , & 3.43195\ \end{bmatrix} </math>
|-
| colspan="3" style="background:white;" |
|-
| rowspan="4" style="text-align:center;" | {{math| 2 }}
| style="text-align:right;" | {{mvar| J {{=}} }}
| <math>\begin{bmatrix}\ 6.25618\ , & -2.1453 \\ \ 9.10244\ , &\quad -0.551218 \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| c {{=}} }}
| <math>\begin{bmatrix} 0.0494549\ , & 0.0330411\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| x {{=}} }}
| <math>\begin{bmatrix}\ 0.568334\ , & -0.312859\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{math| ''f''(''x'') {{=}} }}
| <math>\begin{bmatrix}\ 2.01366\ , & 3.00966\ \end{bmatrix} </math>
|-
| colspan="3" style="background:white;" |
|-
| rowspan="4" style="text-align:center;" | {{math| 3 }}
| style="text-align:right;" | {{mvar| J {{=}} }}
| <math>\begin{bmatrix}\ 5.78122\ , & -2.25449 \\ \ 8.52219\ , &\quad -0.261095\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| c {{=}} }}
| <math>\begin{bmatrix} 0.00102862\ , & -0.00342339\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| x {{=}} }}
| <math>\begin{bmatrix}\ 0.567305\ , & -0.309435\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{math| ''f''(''x'') {{=}} }}
| <math>\begin{bmatrix}\ 2.00003\ , & 3.00006\ \end{bmatrix} </math>
|-
| colspan="3" style="background:white;" |
|-
| rowspan="4" style="text-align:center;" | {{math| 4 }}
| style="text-align:right;" | {{mvar| J {{=}} }}
| <math>\begin{bmatrix}\ 5.7688~ , & ~ -2.24118 \\ \ 8.47561\ , &\quad -0.237805 \end{bmatrix}\ </math>
|-
| style="text-align:right;" | {{mvar| c {{=}} }}
| <math>\begin{bmatrix}\ 7.73132\!\times\!10^{-6} ~, & ~ 6.93265\!\times\!10^{-6}\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{mvar| x {{=}} }}
| <math>\begin{bmatrix}\ 0.567297\ , & -0.309442\ \end{bmatrix} </math>
|-
| style="text-align:right;" | {{math| ''f''(''x'') {{=}} }}
| <math>\begin{bmatrix} ~ 2\ ,~ & ~ 3 ~ \end{bmatrix} </math>
|}

===Complex functions===
{{main|Newton fractal}}
[[Image:newtroot 1 0 0 0 0 m1.png|thumb|Basins of attraction for {{math|{{var|x}}{{sup|5}} − 1 {{=}} 0}}; darker means more iterations to converge.]]
When dealing with [[complex analysis|complex functions]], Newton's method can be directly applied to find their zeroes.<ref>{{cite book |last=Henrici |author-link=Peter Henrici (mathematician) |first=Peter |title=Applied and Computational Complex Analysis |volume=1 |date=1974 |publisher=Wiley |isbn=9780471598923}}</ref> Each zero has a [[basin of attraction]] in the complex plane, the set of all starting values that cause the method to converge to that particular zero. These sets can be mapped as in the image shown. For many complex functions, the boundaries of the basins of attraction are [[fractal]]s. In some cases there are regions in the complex plane which are not in any of these basins of attraction, meaning the iterates do not converge.
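A basin map such as the one in the image can be produced by running Newton's method from a grid of complex starting points and recording which root, if any, each starting point converges to. The following Python sketch does this for {{math|{{var|x}}{{sup|5}} − 1 {{=}} 0}}; the window, grid resolution, iteration cap, and tolerance are arbitrary choices made here for illustration:

<syntaxhighlight lang="python">
import numpy as np

# The five zeroes of x^5 - 1 are the fifth roots of unity.
roots = np.exp(2j * np.pi * np.arange(5) / 5)

def basin(z, max_iter=60, tol=1e-8):
    """Return (index of root reached, iterations used), or (-1, max_iter)
    if the iterates starting from z do not settle on any root."""
    for n in range(max_iter):
        z = z - (z**5 - 1) / (5 * z**4)  # Newton step for f(x) = x^5 - 1
        near = np.abs(roots - z) < tol
        if near.any():
            return int(np.argmax(near)), n
    return -1, max_iter

# Classify a grid of starting points; the root index colors each basin and
# the iteration count gives the shading ("darker means more iterations").
ys, xs = np.mgrid[-2:2:300j, -2:2:300j]
starts = xs + 1j * ys
basins = np.array([[basin(z) for z in row] for row in starts])
</syntaxhighlight>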
For example,<ref>{{cite journal |last=Strang |first=Gilbert |title=A chaotic search for {{mvar|i}} |journal=[[The College Mathematics Journal]] |volume=22 |date=January 1991 |issue=1 |pages=3–12 |doi=10.2307/2686733 |jstor=2686733}}</ref> if one uses a real initial condition to seek a root of {{math|{{var|x}}{{sup|2}} + 1}}, all subsequent iterates will be real numbers, so the iterations cannot converge to either root, since both roots are non-real. In this case [[almost all]] real initial conditions lead to [[chaos theory|chaotic behavior]], while some initial conditions iterate either to infinity or to repeating cycles of any finite length.
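The behavior described here is easy to observe numerically. In the short sketch below (an illustration, not taken from the cited article), Newton's method is applied to {{math|{{var|f}}({{var|x}}) {{=}} {{var|x}}{{sup|2}} + 1}} from a real starting value; every iterate remains real, so the sequence can never approach the non-real roots {{math|±{{var|i}}}} and instead wanders without settling:

<syntaxhighlight lang="python">
# Newton's method for f(x) = x^2 + 1 from a real starting value:
# x_{n+1} = x_n - (x_n**2 + 1) / (2*x_n) = (x_n - 1/x_n) / 2
x = 0.5                # any nonzero real starting value
for n in range(12):
    x = (x - 1/x) / 2  # real in, real out: the iterates stay on the real
    print(n + 1, x)    # line, exhibiting the chaotic wandering described above
</syntaxhighlight>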
Curt McMullen has shown that for any possible purely iterative algorithm similar to Newton's method, the algorithm will diverge on some open regions of the complex plane when applied to some polynomial of degree 4 or higher. However, McMullen gave a generally convergent algorithm for polynomials of degree 3.<ref>{{cite journal |last=McMullen |first=Curt |title=Families of rational maps and iterative root-finding algorithms |journal=Annals of Mathematics |series=Second Series |volume=125 |date=1987 |issue=3 |pages=467–493 |doi=10.2307/1971408 |jstor=1971408 |url=https://dash.harvard.edu/bitstream/handle/1/9876064/McMullen_FamiliesRationalMap.pdf?sequence=1}}</ref> Also, for any polynomial, Hubbard, Schleicher, and Sutherland gave a method for selecting a set of initial points such that Newton's method is guaranteed to converge from at least one of them.<ref>{{Cite journal |last1=Hubbard |first1=John |last2=Schleicher |first2=Dierk |last3=Sutherland |first3=Scott |date=October 2001 |title=How to find all roots of complex polynomials by Newton's method |url=http://dx.doi.org/10.1007/s002220100149 |journal=Inventiones Mathematicae |volume=146 |issue=1 |pages=1–33 |doi=10.1007/s002220100149 |bibcode=2001InMat.146....1H |s2cid=12603806 |issn=0020-9910}}</ref>

===In a Banach space===
Another generalization is Newton's method for finding a root of a [[Functional (mathematics)|functional]] {{mvar|F}} defined in a [[Banach space]]. In this case the formulation is
<math display="block">X_{n+1}=X_n-\bigl(F'(X_n)\bigr)^{-1}F(X_n),</math>
where {{math|{{var|{{prime|F}}}}({{var|X}}{{sub|{{var|n}}}})}} is the [[Fréchet derivative]] computed at {{math|{{var|X}}{{sub|{{var|n}}}}}}. One needs the Fréchet derivative to be boundedly invertible at each {{math|{{var|X}}{{sub|{{var|n}}}}}} in order for the method to be applicable. A condition for existence of and convergence to a root is given by the [[Kantorovich theorem|Newton–Kantorovich theorem]].<ref>{{cite book |first=Tetsuro |last=Yamamoto |chapter=Historical Developments in Convergence Analysis for Newton's and Newton-like Methods |pages=241–263 |editor-first=C. |editor-last=Brezinski |editor2-first=L. |editor2-last=Wuytack |title=Numerical Analysis: Historical Developments in the 20th Century |publisher=North-Holland |year=2001 |isbn=0-444-50617-9}}</ref>

====Nash–Moser iteration====
{{details|Nash–Moser theorem}}
In the 1950s, [[John Forbes Nash Jr.|John Nash]] developed a version of Newton's method to apply to the problem of constructing [[isometric embedding]]s of general [[Riemannian manifold]]s in [[Euclidean space]]. The ''loss of derivatives'' problem, present in this context, made the standard Newton iteration inapplicable, since it could not be continued indefinitely (much less converge). Nash's solution involved the introduction of [[smoothing]] operators into the iteration. He was able to prove the convergence of his smoothed Newton method, for the purpose of proving an [[implicit function theorem]] for isometric embeddings. In the 1960s, [[Jürgen Moser]] showed that Nash's methods were flexible enough to apply to problems beyond isometric embedding, particularly in [[celestial mechanics]]. Since then, a number of mathematicians, including [[Mikhael Gromov (mathematician)|Mikhael Gromov]] and [[Richard S. Hamilton|Richard Hamilton]], have found generalized abstract versions of the Nash–Moser theory.<ref>{{cite journal |first=Richard S. |last=Hamilton |mr=0656198 |title=The inverse function theorem of Nash and Moser |journal=[[Bulletin of the American Mathematical Society]] |series=New Series |volume=7 |year=1982 |issue=1 |pages=65–222 |doi-access=free |doi=10.1090/s0273-0979-1982-15004-2 |zbl=0499.58003 |author-link1=Richard S. Hamilton}}</ref><ref>{{cite book |last1=Gromov |first1=Mikhael |title=Partial differential relations |series=Ergebnisse der Mathematik und ihrer Grenzgebiete (3) |volume=9 |publisher=[[Springer-Verlag]] |location=Berlin |year=1986 |isbn=3-540-12177-3 |mr=0864505 |author-link1=Mikhael Gromov (mathematician) |doi=10.1007/978-3-662-02267-2}}</ref> In Hamilton's formulation, the Nash–Moser theorem forms a generalization of the Banach space Newton method which takes place in certain [[Fréchet space]]s.