==Methods of proof==
As an important result, the inverse function theorem has been given numerous proofs. The proof most commonly seen in textbooks relies on the [[contraction mapping]] principle, also known as the [[Banach fixed-point theorem]] (which can also be used as the key step in the proof of [[Picard–Lindelöf theorem|existence and uniqueness]] of solutions to [[ordinary differential equations]]).<ref>{{cite book |first=Robert C. |last=McOwen |title=Partial Differential Equations: Methods and Applications |location=Upper Saddle River, NJ |publisher=Prentice Hall |year=1996 |isbn=0-13-121880-8 |pages=218–224 |chapter=Calculus of Maps between Banach Spaces |chapter-url=https://books.google.com/books?id=TuNHsNC1Yf0C&pg=PA218 }}</ref><ref>{{Cite web |url=https://terrytao.wordpress.com/2011/09/12/the-inverse-function-theorem-for-everywhere-differentiable-maps/ |first=Terence |last=Tao |author-link=Terence Tao |title=The inverse function theorem for everywhere differentiable maps |date=September 12, 2011 |access-date=2019-07-26 }}</ref> Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem<ref>{{Cite web|url=https://r-grande.github.io/Expository/Inverse%20Function%20Theorem.pdf |title=Inverse Function Theorem|last=Jaffe|first=Ethan}}</ref> (see [[Inverse function theorem#Generalizations|Generalizations]] below).

An alternate proof in finite dimensions hinges on the [[extreme value theorem]] for functions on a [[compact set]].<ref name="spivak_manifolds">{{harvnb|Spivak|1965|loc=pages 31–35 }}</ref> This approach has the advantage that the proof generalizes to a situation where there is no Cauchy completeness (see {{section link||Over_a_real_closed_field}}). Yet another proof uses [[Newton's method]], which has the advantage of providing an [[effective method|effective version]] of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.<ref name="hubbard_hubbard">{{cite book |first1=John H. |last1=Hubbard |author-link=John H. Hubbard |first2=Barbara Burke |last2=Hubbard|author2-link=Barbara Burke Hubbard |title=Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach |edition=Matrix |year=2001 }}</ref>

=== Proof for single-variable functions ===
We want to prove the following: ''Let <math>D \subseteq \R</math> be an open set with <math>x_0 \in D</math>, let <math>f: D \to \R</math> be a continuously differentiable function defined on <math>D</math>, and suppose that <math>f'(x_0) \ne 0</math>. Then there exists an open interval <math>I</math> with <math>x_0 \in I</math> such that <math>f</math> maps <math>I</math> bijectively onto the open interval <math>J = f(I)</math>, and such that the inverse function <math>f^{-1} : J \to I</math> is continuously differentiable, and for any <math>y \in J</math>, if <math>x \in I</math> is such that <math>f(x) = y</math>, then <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math>.''
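As a concrete numerical illustration of the statement (not part of the proof), the sketch below checks <math>(f^{-1})'(y) = 1/f'(x)</math> for the example function <math>f(x) = x^3 + x</math> at <math>x_0 = 1</math>; the example function, the bisection-based inversion, and the step size are illustrative choices.

<syntaxhighlight lang="python">
# Illustrative check of (f^{-1})'(y) = 1/f'(x) for the example f(x) = x^3 + x,
# which is strictly increasing (f'(x) = 3x^2 + 1 > 0), hence globally invertible.
def f(x):
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, tol=1e-13):
    # Invert f by bisection on a bracket guaranteed to contain f^{-1}(y).
    lo, hi = -abs(y) - 1.0, abs(y) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x0 = 1.0
y0 = f(x0)   # = 2.0
h = 1e-6
numeric = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2 * h)  # central difference for (f^{-1})'(y0)
print(numeric, 1 / f_prime(x0))  # both are approximately 0.25
</syntaxhighlight>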
We may without loss of generality assume that <math>f'(x_0) > 0</math>. Given that <math>D</math> is an open set and <math>f'</math> is continuous at <math>x_0</math>, there exists <math>r > 0</math> such that <math>(x_0 - r, x_0 + r) \subseteq D</math> and
<math display="block">|f'(x) - f'(x_0)| < \dfrac{f'(x_0)}{2} \qquad \text{for all } |x - x_0| < r.</math>
In particular,
<math display="block">f'(x) > \dfrac{f'(x_0)}{2} >0 \qquad \text{for all } |x - x_0| < r.</math>
This shows that <math>f</math> is strictly increasing on <math>(x_0 - r, x_0 + r)</math>. Let <math>\delta > 0</math> be such that <math>\delta < r</math>. Then <math>[x_0 - \delta, x_0 + \delta] \subseteq (x_0 - r, x_0 + r)</math>. By the intermediate value theorem and strict monotonicity, <math>f</math> maps the interval <math>[x_0 - \delta, x_0 + \delta]</math> bijectively onto <math>[f(x_0 - \delta), f(x_0 + \delta)]</math>. Denote <math>I = (x_0-\delta, x_0+\delta)</math> and <math>J = (f(x_0 - \delta),f(x_0 + \delta))</math>. Then <math>f: I \to J</math> is a bijection and the inverse <math>f^{-1}: J \to I</math> exists. The fact that <math>f^{-1}: J \to I</math> is differentiable follows from the differentiability of <math>f</math>. In particular, the result follows from the fact that if <math>f: I \to \R</math> is a strictly monotonic and continuous function that is differentiable at <math>x \in I</math> with <math>f'(x) \ne 0</math>, then <math>f^{-1}: f(I) \to \R</math> is differentiable at <math>y = f(x)</math> with <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math> (a standard result in analysis). Finally, since <math>(f^{-1})'(y) = \dfrac{1}{f'(f^{-1}(y))}</math> for every <math>y \in J</math>, and <math>f'</math> and <math>f^{-1}</math> are continuous with <math>f' \ne 0</math> on <math>I</math>, the derivative <math>(f^{-1})'</math> is continuous on <math>J</math>. This completes the proof.

=== A proof using successive approximation ===
To prove existence, it can be assumed after an affine transformation that <math>f(0)=0</math> and <math>f^\prime(0)=I</math>, so that <math>a=b=0</math>.

By the [[Mean value theorem#Mean value theorem for vector-valued functions|mean value theorem for vector-valued functions]], for a differentiable function <math>u:[0,1]\to\mathbb R^m</math>, <math display="inline">\|u(1)-u(0)\|\le \sup_{0\le t\le 1} \|u^\prime(t)\|</math>. Setting <math>u(t)=f(x+t(x^\prime -x)) - x-t(x^\prime-x)</math>, it follows that
:<math>\|f(x) - f(x^\prime) - x + x^\prime\| \le \|x -x^\prime\|\,\sup_{0\le t \le 1} \|f^\prime(x+t(x^\prime -x))-I\|.</math>
Now choose <math>\delta>0</math> so that <math display="inline">\|f'(x) - I\| < {1\over 2}</math> for <math>\|x\|< \delta</math>. Suppose that <math>\|y\|<\delta/2</math> and define <math>x_n</math> inductively by <math>x_0=0</math> and <math>x_{n+1}=x_n + y - f(x_n)</math>. The assumptions show that if <math>\|x\|, \,\, \|x^\prime\| < \delta</math> then
:<math>\|f(x)-f(x^\prime) - x + x^\prime\| \le \|x-x^\prime\|/2.</math>
In particular <math>f(x)=f(x^\prime)</math> implies <math>x=x^\prime</math>. In the inductive scheme <math>\|x_n\| <\delta</math> and <math>\|x_{n+1} - x_n\| < \delta/2^n</math>. Thus <math>(x_n)</math> is a [[Cauchy sequence]] tending to a limit <math>x</math>. By construction <math>f(x)=y</math> as required.

To check that <math>g=f^{-1}</math> is C<sup>1</sup>, write <math>g(y+k) = x+h</math> so that <math>f(x+h)=f(x)+k</math>. By the inequalities above, <math>\|h-k\| <\|h\|/2</math> so that <math>\|h\|/2<\|k\| < 2\|h\|</math>. On the other hand, if <math>A=f^\prime(x)</math>, then <math>\|A-I\|<1/2</math>. Using the [[geometric series]] for <math>B=I-A</math>, it follows that <math>\|A^{-1}\| < 2</math>. But then
:<math> {\|g(y+k) -g(y) - f^\prime(g(y))^{-1}k \| \over \|k\|} = {\|h -f^\prime(x)^{-1}[f(x+h)-f(x)]\| \over \|k\|} \le 4 {\|f(x+h) - f(x) -f^\prime(x)h\|\over \|h\|} </math>
tends to 0 as <math>k</math> and <math>h</math> tend to 0, proving that <math>g</math> is C<sup>1</sup> with <math>g^\prime(y)=f^\prime(g(y))^{-1}</math> (continuity of <math>g^\prime</math> follows since <math>f^\prime</math>, <math>g</math> and the inversion map <math>A \mapsto A^{-1}</math> are continuous).

The proof above is presented for a finite-dimensional space, but applies equally well for [[Banach space]]s. If an invertible function <math>f</math> is C<sup>k</sup> with <math>k>1</math>, then so too is its inverse. This follows by induction using the fact that the map <math>F(A)=A^{-1}</math> on operators is C<sup>k</sup> for any <math>k</math> (in the finite-dimensional case this is an elementary fact because the inverse of a matrix is given as the [[adjugate matrix]] divided by its [[determinant]]).<ref name="Hörmander" /><ref>{{cite book|title=Calcul Differentiel|language=fr|first=Henri|last= Cartan|author-link= Henri Cartan|publisher=[[Éditions Hermann|Hermann]]|year= 1971|isbn=978-0-395-12033-0 |pages=55–61}}</ref> The method of proof here can be found in the books of [[Henri Cartan]], [[Jean Dieudonné]], [[Serge Lang]], [[Roger Godement]] and [[Lars Hörmander]].
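The successive-approximation scheme above can be run directly. The following is a minimal numerical sketch of the iteration <math>x_{n+1}=x_n + y - f(x_n)</math>; the particular map (chosen so that <math>f(0)=0</math> and <math>f'(0)=I</math>), the target <math>y</math>, and the number of steps are illustrative choices, not part of the proof.

<syntaxhighlight lang="python">
# Illustrative run of the successive-approximation scheme x_{n+1} = x_n + y - f(x_n)
# for an example map on R^2 with f(0) = 0 and f'(0) = I.
def f(x):
    x1, x2 = x
    return (x1 + 0.2 * x2**2, x2 + 0.2 * x1**2)

def approximate_inverse(y, steps=50):
    x = (0.0, 0.0)                        # x_0 = 0
    for _ in range(steps):
        fx = f(x)
        x = (x[0] + y[0] - fx[0],         # x_{n+1} = x_n + y - f(x_n)
             x[1] + y[1] - fx[1])
    return x

y = (0.3, -0.2)                           # a small target vector y
x = approximate_inverse(y)
print(x)                                  # the limit x with f(x) = y
print(f(x))                               # reproduces y up to rounding error
</syntaxhighlight>

Because <math>\|f'(x)-I\|</math> stays well below <math>1/2</math> on the ball used here, the iterates converge geometrically, in line with the bound <math>\|x_{n+1}-x_n\| < \delta/2^n</math> above.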
=== A proof using the contraction mapping principle ===
Here is a proof based on the [[contraction mapping theorem]]. Specifically, following T. Tao,<ref>Theorem 17.7.2 in {{cite book|mr=3310023|last1=Tao|first1=Terence|title=Analysis. II|edition=Third edition of 2006 original|series=Texts and Readings in Mathematics|volume=38|publisher=Hindustan Book Agency|location=New Delhi|year=2014|isbn=978-93-80250-65-6|zbl=1300.26003}}</ref> it uses the following consequence of the contraction mapping theorem.

{{math_theorem|name=Lemma|math_statement=Let <math>B(0, r)</math> denote an open ball of radius ''r'' in <math>\mathbb{R}^n</math> with center 0 and <math>g : B(0, r) \to \mathbb{R}^n</math> a map with a constant <math>0 < c < 1</math> such that
:<math>|g(y) - g(x)| \le c|y-x|</math>
for all <math>x, y</math> in <math>B(0, r)</math>. Then for <math>f = I + g</math> on <math>B(0, r)</math>, we have
:<math>(1-c)|x - y| \le |f(x) - f(y)|;</math>
in particular, ''f'' is injective. If, moreover, <math>g(0) = 0</math>, then
:<math>B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r).</math>
More generally, the statement remains true if <math>\mathbb{R}^n</math> is replaced by a Banach space. Also, the first part of the lemma is true for any normed space.}}

Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when <math>a = 0, b = f(a) = 0</math> and <math>f'(0) = I</math>. Let <math>g = f - I</math>. The [[mean value inequality]] applied to <math>t \mapsto g(x + t(y - x))</math> says:
:<math>|g(y) - g(x)| \le |y-x|\sup_{0 < t < 1} |g'(x + t(y - x))|.</math>
Since <math>g'(0) = I - I = 0</math> and <math>g'</math> is continuous, we can find an <math>r > 0</math> such that
:<math>|g(y) - g(x)| \le 2^{-1}|y-x|</math>
for all <math>x, y</math> in <math>B(0, r)</math>. Then the lemma above says that <math>f = g + I</math> is injective on <math>B(0, r)</math> and <math>B(0, r/2) \subset f(B(0, r))</math>. Then
:<math>f : U = B(0, r) \cap f^{-1}(B(0, r/2)) \to V = B(0, r/2)</math>
is bijective and thus has an inverse.
Next, we show the inverse <math>f^{-1}</math> is continuously differentiable (this part of the argument is the same as that in the previous proof). This time, let <math>g = f^{-1}</math> denote the inverse of <math>f</math> and <math>A = f'(x)</math>. For <math>x = g(y)</math>, we write <math>g(y + k) = x + h</math> or <math>y + k = f(x+h)</math>. Now, by the estimate above, we have
:<math>|h - k| = |f(x+h) - f(x) - h| \le |h|/2</math>
and so <math>|h|/2 \le |k|</math>. Writing <math>\| \cdot \|</math> for the operator norm,
:<math>|g(y+k) - g(y) - A^{-1} k| = |h - A^{-1}(f(x + h) - f(x))| \le \|A^{-1}\||Ah - f(x+h) + f(x)|.</math>
As <math>k \to 0</math>, we have <math>h \to 0</math> and <math>|h|/|k|</math> is bounded; dividing by <math>|k|</math> and using the differentiability of <math>f</math> at <math>x</math>, it follows that <math>g</math> is differentiable at <math>y</math> with the derivative <math>g'(y) = f'(g(y))^{-1}</math>. Also, <math>g'</math> is the same as the composition <math>\iota \circ f' \circ g</math> where <math>\iota : T \mapsto T^{-1}</math>; so <math>g'</math> is continuous.

It remains to show the lemma. First, we have:
:<math>|x - y| - |f(x) - f(y)| \le |g(x) - g(y)| \le c|x - y|,</math>
which is to say
:<math>(1 - c)|x - y| \le |f(x) - f(y)|.</math>
This proves the first part. Next, we show <math>f(B(0, r)) \supset B(0, (1-c)r)</math>. Given a point <math>y</math> in <math>B(0, (1-c) r)</math>, this amounts to finding a fixed point of the map
:<math>F : \overline{B}(0, r') \to \overline{B}(0, r'), \, x \mapsto y - g(x),</math>
where <math>0 < r' < r</math> is chosen so that <math>|y| \le (1-c)r'</math> and the bar means a closed ball; indeed, <math>F(x) = x</math> means exactly <math>f(x) = x + g(x) = y</math>. To find a fixed point, we use the contraction mapping theorem; checking that <math>F</math> is a well-defined strict-contraction mapping is straightforward. Finally, we have <math>f(B(0, r)) \subset B(0, (1+c)r)</math> since
:<math>|f(x)| = |x + g(x) - g(0)| \le (1+c)|x|. \square</math>
As might be clear, this proof is not substantially different from the previous one, as the proof of the contraction mapping theorem is by successive approximation.
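The derivative formula <math>g'(y)=f'(g(y))^{-1}</math> obtained in both proofs can likewise be checked numerically. In the sketch below, a finite-difference Jacobian of the inverse is compared with <math>f'(g(y))^{-1}</math>; the map <math>f</math>, the point <math>y</math>, the step size, and the use of NumPy are illustrative choices, not part of the argument.

<syntaxhighlight lang="python">
# Illustrative check of g'(y) = f'(g(y))^{-1} for the example map
# f(x1, x2) = (x1 + 0.2*x2^2, x2 + 0.2*x1^2), which has f(0) = 0 and f'(0) = I.
import numpy as np

def f(x):
    return np.array([x[0] + 0.2 * x[1]**2, x[1] + 0.2 * x[0]**2])

def f_jacobian(x):
    return np.array([[1.0, 0.4 * x[1]], [0.4 * x[0], 1.0]])

def g(y, steps=60):
    # Invert f near the origin by the fixed-point iteration x -> x + y - f(x).
    x = np.zeros(2)
    for _ in range(steps):
        x = x + y - f(x)
    return x

y = np.array([0.3, -0.2])
h = 1e-6
# Finite-difference Jacobian of the inverse g at y, built column by column.
numeric = np.column_stack([(g(y + h * e) - g(y - h * e)) / (2 * h) for e in np.eye(2)])
print(numeric)
print(np.linalg.inv(f_jacobian(g(y))))   # agrees with the finite-difference result to several digits
</syntaxhighlight>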