Inverse function theorem
{{Short description|Theorem in mathematics}} {{Use dmy dates|date=December 2023}} {{Calculus}} In [[mathematics]], the '''inverse function theorem''' is a [[theorem]] that asserts that, if a [[real function]] ''f'' has a [[continuously differentiable function|continuous derivative]] near a point where its derivative is nonzero, then, near this point, ''f'' has an [[inverse function]]. The inverse function is also [[differentiable function|differentiable]], and the ''[[inverse function rule]]'' expresses its derivative as the [[multiplicative inverse]] of the derivative of ''f''. The theorem applies verbatim to [[complex-valued function]]s of a [[complex number|complex variable]]. It generalizes to functions from ''n''-[[tuples]] (of real or complex numbers) to ''n''-tuples, and to functions between [[vector space]]s of the same finite dimension, by replacing "derivative" with "[[Jacobian matrix]]" and "nonzero derivative" with "nonzero [[Jacobian determinant]]". If the function of the theorem belongs to a higher [[differentiability class]], the same is true for the inverse function. There are also versions of the inverse function theorem for [[holomorphic function]]s, for differentiable maps between [[manifold]]s, for differentiable functions between [[Banach space]]s, and so forth. The theorem was first established by [[Émile Picard|Picard]] and [[Édouard Goursat|Goursat]] using an iterative scheme: the basic idea is to prove a [[fixed point theorem]] using the [[contraction mapping theorem]]. 
==Statements== For functions of a single [[Variable (mathematics)|variable]], the theorem states that if <math>f</math> is a [[continuously differentiable]] function with nonzero derivative at the point <math>a</math>; then <math>f</math> is injective (or bijective onto the image) in a neighborhood of <math>a</math>, the inverse is continuously differentiable near <math>b=f(a)</math>, and the derivative of the inverse function at <math>b</math> is the reciprocal of the derivative of <math>f</math> at <math>a</math>: <math display=block>\bigl(f^{-1}\bigr)'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.</math><!-- Not sure the meaning of the following alternative version; if the function is already injective, the theorem gives nothing: An alternate version, which assumes that <math>f</math> is [[Continuous function|continuous]] and [[Locally injective function|injective near {{Mvar|a}}]], and differentiable at {{Mvar|a}} with a non-zero derivative, will also result in <math>f</math> being invertible near {{Mvar|a}}, with an inverse that's similarly continuous and [[Injective function|injective]], and where the above formula would apply as well. --> It can happen that a function <math>f</math> may be injective near a point <math>a</math> while <math>f'(a) = 0</math>. An example is <math>f(x) = (x - a)^3</math>. In fact, for such a function, the inverse cannot be differentiable at <math>b = f(a)</math>, since if <math>f^{-1}</math> were differentiable at <math>b</math>, then, by the chain rule, <math>1 = (f^{-1} \circ f)'(a) = (f^{-1})'(b)f'(a)</math>, which implies <math>f'(a) \ne 0</math>. (The situation is different for holomorphic functions; see [[#Holomorphic inverse function theorem]] below.) 
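The single-variable formula can be checked numerically. The following Python sketch is only an illustration: the function <math>f(x) = x^3 + x</math>, the point <math>a = 1</math>, and the helper names (<code>f_inverse</code>, <code>numeric</code>, <code>predicted</code>) are assumptions chosen for the example and are not part of the theorem. The inverse is computed by bisection, which works here because this particular <math>f</math> is strictly increasing, and its derivative at <math>b = f(a)</math> is estimated by a central difference.

```python
def f(x):
    # Illustrative function (an assumption): strictly increasing, f'(x) = 3x^2 + 1 > 0.
    return x**3 + x


def f_prime(x):
    return 3 * x**2 + 1


def f_inverse(y, lo=-10.0, hi=10.0, tol=1e-13):
    """Invert the strictly increasing f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = 1.0
b = f(a)  # b = 2.0

# Central-difference estimate of (f^{-1})'(b).
h = 1e-5
numeric = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)

# Value predicted by the theorem: 1 / f'(a) = 1 / 4.
predicted = 1.0 / f_prime(a)

print(numeric, predicted)
```

The two printed values should agree to several decimal places; the agreement is limited by the finite-difference step and the bisection tolerance, not by the theorem.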
For functions of more than one variable, the theorem states that if <math>f</math> is a continuously differentiable function from an open subset <math>A</math> of <math>\mathbb{R}^n</math> into <math>\R^n</math>, and the [[total derivative|derivative]] <math>f'(a)</math> is invertible at a point {{Mvar|a}} (that is, the determinant of the [[Jacobian matrix and determinant|Jacobian matrix]] of {{Mvar|f}} at {{Mvar|a}} is non-zero), then there exist neighborhoods <math>U</math> of <math>a</math> in <math>A</math> and <math>V</math> of <math>b = f(a)</math> such that <math>f(U) \subset V</math> and <math>f : U \to V</math> is bijective.<ref name="Hörmander">Theorem 1.1.7. in {{cite book|title=The Analysis of Linear Partial Differential Operators I: Distribution Theory and Fourier Analysis|series=Classics in Mathematics|first=Lars|last= Hörmander|author-link=Lars Hörmander|publisher=Springer|year= 2015|edition=2nd| isbn= 978-3-642-61497-2}}</ref> Writing <math>f=(f_1,\ldots,f_n)</math>, this means that the system of {{Mvar|n}} equations <math>y_i = f_i(x_1, \dots, x_n)</math> has a unique solution for <math>x_1, \dots, x_n</math> in terms of <math>y_1, \dots, y_n</math> when <math>x \in U, y \in V</math>. Note that the theorem ''does not'' say <math>f</math> is bijective onto the image where <math>f'</math> is invertible but that it is locally bijective where <math>f'</math> is invertible. Moreover, the theorem says that the inverse function <math>f^{-1} : V \to U</math> is continuously differentiable, and its derivative at <math>b=f(a)</math> is the inverse map of <math>f'(a)</math>; i.e., :<math>(f^{-1})'(b) = f'(a)^{-1}.</math> In other words, if <math>Jf^{-1}(b), Jf(a)</math> are the Jacobian matrices representing <math>(f^{-1})'(b), f'(a)</math>, this means: :<math>Jf^{-1}(b) = Jf(a)^{-1}.</math> The hard part of the theorem is the existence and differentiability of <math>f^{-1}</math>. 
Assuming this, the inverse derivative formula follows from the [[chain rule]] applied to <math>f^{-1}\circ f = I</math>. (Indeed, <math>1=I'(a) = (f^{-1} \circ f)'(a) = (f^{-1})'(b) \circ f'(a).</math>) Since inversion of linear maps, <math>A \mapsto A^{-1}</math>, is infinitely differentiable, the formula for the derivative of the inverse shows that if <math>f</math> is continuously <math>k</math> times differentiable, with invertible derivative at the point {{Mvar|a}}, then the inverse is also continuously <math>k</math> times differentiable. Here <math>k</math> is a positive integer or <math>\infty</math>.

There are two variants of the inverse function theorem.<ref name="Hörmander" /> Given a continuously differentiable map <math>f : U \to \mathbb{R}^m</math>, the first is
*The derivative <math>f'(a)</math> is surjective (i.e., the Jacobian matrix representing it has rank <math>m</math>) if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>f \circ g = I</math> near <math>b</math>,
and the second is
*The derivative <math>f'(a)</math> is injective if and only if there exists a continuously differentiable function <math>g</math> on a neighborhood <math>V</math> of <math>b = f(a)</math> such that <math>g \circ f = I</math> near <math>a</math>.
In the first case (when <math>f'(a)</math> is surjective), the point <math>b = f(a)</math> is called a [[regular value]]. Since <math>m = \dim \ker(f'(a)) + \dim \operatorname{im}(f'(a))</math>, the first case is equivalent to saying <math>b = f(a)</math> is not in the image of [[Critical point (mathematics)#Critical point of a differentiable map|critical points]] <math>a</math> (a critical point is a point <math>a</math> such that the kernel of <math>f'(a)</math> is nonzero). The statement in the first case is a special case of the [[submersion theorem]]. These variants are restatements of the inverse function theorem. 
Indeed, in the first case when <math>f'(a)</math> is surjective, we can find an (injective) linear map <math>T</math> such that <math>f'(a) \circ T = I</math>. Define <math>h(x) = a + Tx</math> so that we have: :<math>(f \circ h)'(0) = f'(a) \circ T = I.</math> Thus, by the inverse function theorem, <math>f \circ h</math> has inverse near <math>0</math>; i.e., <math>f \circ h \circ (f \circ h)^{-1} = I</math> near <math>b</math>. The second case (<math>f'(a)</math> is injective) is seen in the similar way. ==Example== Consider the [[vector-valued function]] <math>F:\mathbb{R}^2\to\mathbb{R}^2\!</math> defined by: :<math> F(x,y)= \begin{bmatrix} {e^x \cos y}\\ {e^x \sin y}\\ \end{bmatrix}. </math> The Jacobian matrix of it at <math>(x, y)</math> is: :<math> JF(x,y)= \begin{bmatrix} {e^x \cos y} & {-e^x \sin y}\\ {e^x \sin y} & {e^x \cos y}\\ \end{bmatrix} </math> with the determinant: :<math> \det JF(x,y)= e^{2x} \cos^2 y + e^{2x} \sin^2 y= e^{2x}. \,\!</math> The determinant <math>e^{2x}\!</math> is nonzero everywhere. Thus the theorem guarantees that, for every point {{Mvar|p}} in <math>\mathbb{R}^2\!</math>, there exists a neighborhood about {{Mvar|p}} over which {{Mvar|F}} is invertible. This does not mean {{Mvar|F}} is invertible over its entire domain: in this case {{Mvar|F}} is not even [[injective]] since it is periodic: <math>F(x,y)=F(x,y+2\pi)\!</math>. == Counter-example == [[File:Inv-Fun-Thm-3.png|thumb|The function <math>f(x)=x+2 x^2\sin(\tfrac1x)</math> is bounded inside a quadratic envelope near the line <math>y=x</math>, so <math>f'(0)=1</math>. Nevertheless, it has local max/min points accumulating at <math>x=0</math>, so it is not one-to-one on any surrounding interval.]] If one drops the assumption that the derivative is continuous, the function no longer need be invertible. 
For example <math>f(x) = x + 2x^2\sin(\tfrac1x)</math> and <math>f(0)= 0</math> has discontinuous derivative <math>f'\!(x) = 1 -2\cos(\tfrac1x) + 4x\sin(\tfrac1x)</math> and <math>f'\!(0) = 1</math>, which vanishes arbitrarily close to <math>x=0</math>. These critical points are local max/min points of <math>f</math>, so <math>f</math> is not one-to-one (and not invertible) on any interval containing <math>x=0</math>. Intuitively, the slope <math>f'\!(0)=1</math> does not propagate to nearby points, where the slopes are governed by a weak but rapid oscillation. ==Methods of proof== As an important result, the inverse function theorem has been given numerous proofs. The proof most commonly seen in textbooks relies on the [[contraction mapping]] principle, also known as the [[Banach fixed-point theorem]] (which can also be used as the key step in the proof of [[Picard–Lindelöf theorem|existence and uniqueness]] of solutions to [[ordinary differential equations]]).<ref>{{cite book |first=Robert C. 
|last=McOwen |title=Partial Differential Equations: Methods and Applications |location=Upper Saddle River, NJ |publisher=Prentice Hall |year=1996 |isbn=0-13-121880-8 |pages=218–224 |chapter=Calculus of Maps between Banach Spaces |chapter-url=https://books.google.com/books?id=TuNHsNC1Yf0C&pg=PA218 }}</ref><ref>{{Cite web |url=https://terrytao.wordpress.com/2011/09/12/the-inverse-function-theorem-for-everywhere-differentiable-maps/ |first=Terence |last=Tao |author-link=Terence Tao |title=The inverse function theorem for everywhere differentiable maps |date=September 12, 2011 |access-date=2019-07-26 }}</ref> Since the fixed point theorem applies in infinite-dimensional (Banach space) settings, this proof generalizes immediately to the infinite-dimensional version of the inverse function theorem<ref>{{Cite web|url=https://r-grande.github.io/Expository/Inverse%20Function%20Theorem.pdf |title=Inverse Function Theorem|last=Jaffe|first=Ethan}}</ref> (see [[Inverse function theorem#Generalizations|Generalizations]] below). An alternate proof in finite dimensions hinges on the [[extreme value theorem]] for functions on a [[compact set]].<ref name="spivak_manifolds">{{harvnb|Spivak|1965|loc=pages 31–35 }}</ref> This approach has an advantage that the proof generalizes to a situation where there is no Cauchy completeness (see {{section link||Over_a_real_closed_field}}). Yet another proof uses [[Newton's method]], which has the advantage of providing an [[effective method|effective version]] of the theorem: bounds on the derivative of the function imply an estimate of the size of the neighborhood on which the function is invertible.<ref name="hubbard_hubbard">{{cite book |first1=John H. |last1=Hubbard |author-link=John H. 
Hubbard |first2=Barbara Burke |last2=Hubbard|author2-link=Barbara Burke Hubbard |title=Vector Analysis, Linear Algebra, and Differential Forms: A Unified Approach |edition=Matrix |year=2001 }}</ref>

=== Proof for single-variable functions ===
We want to prove the following: ''Let <math>D \subseteq \R</math> be an open set with <math>x_0 \in D, f: D \to \R</math> a continuously differentiable function defined on <math>D</math>, and suppose that <math>f'(x_0) \ne 0</math>. Then there exists an open interval <math>I</math> with <math>x_0 \in I</math> such that <math>f</math> maps <math>I</math> bijectively onto the open interval <math>J = f(I)</math>, and such that the inverse function <math>f^{-1} : J \to I</math> is continuously differentiable, and for any <math>y \in J</math>, if <math>x \in I</math> is such that <math>f(x) = y</math>, then <math>(f^{-1})'(y) = \dfrac{1}{f'(x)}</math>.''

We may without loss of generality assume that <math>f'(x_0) > 0</math> (otherwise replace <math>f</math> by <math>-f</math>). Given that <math>D</math> is an open set and <math>f'</math> is continuous at <math>x_0</math>, there exists <math>r > 0</math> such that <math>(x_0 - r, x_0 + r) \subseteq D</math> and<math display="block">|f'(x) - f'(x_0)| < \dfrac{f'(x_0)}{2} \qquad \text{for all } |x - x_0| < r.</math> In particular,<math display="block">f'(x) > \dfrac{f'(x_0)}{2} >0 \qquad \text{for all } |x - x_0| < r.</math> This shows that <math>f</math> is strictly increasing on <math>(x_0 - r, x_0 + r)</math>. Let <math>\delta > 0</math> be such that <math>\delta < r</math>. Then <math>[x_0 - \delta, x_0 + \delta] \subseteq (x_0 - r, x_0 + r)</math>. By the intermediate value theorem, we find that <math>f</math> maps the interval <math>[x_0 - \delta, x_0 + \delta]</math> bijectively onto <math>[f(x_0 - \delta), f(x_0 + \delta)]</math>. Set <math>I = (x_0-\delta, x_0+\delta)</math> and <math>J = (f(x_0 - \delta),f(x_0 + \delta))</math>. Then <math>f: I \to J</math> is a bijection and the inverse <math>f^{-1}: J \to I</math> exists. 
The differentiability of <math>f^{-1}: J \to I</math> follows from a standard result in analysis: if <math>f: I \to \R</math> is a strictly monotonic and continuous function that is differentiable at <math>x_0 \in I</math> with <math>f'(x_0) \ne 0</math>, then <math>f^{-1}: f(I) \to \R</math> is differentiable at <math>y_0 = f(x_0)</math> with <math>(f^{-1})'(y_0) = \dfrac{1}{f'(x_0)}</math>. Applying this at each point <math>x \in I</math>, where <math>f'(x) > 0</math>, gives <math>(f^{-1})'(y) = \dfrac{1}{f'(f^{-1}(y))}</math> for all <math>y \in J</math>, and this expression is continuous because <math>f'</math> and <math>f^{-1}</math> are continuous. This completes the proof.

=== A proof using successive approximation ===
To prove existence, it can be assumed after an affine transformation that <math>f(0)=0</math> and <math>f^\prime(0)=I</math>, so that <math> a=b=0</math>. By the [[Mean value theorem#Mean value theorem for vector-valued functions|mean value theorem for vector-valued functions]], for a differentiable function <math>u:[0,1]\to\mathbb R^m</math>, <math display="inline">\|u(1)-u(0)\|\le \sup_{0\le t\le 1} \|u^\prime(t)\|</math>. Setting <math>u(t)=f(x+t(x^\prime -x)) - x-t(x^\prime-x)</math>, it follows that
:<math>\|f(x) - f(x^\prime) - x + x^\prime\| \le \|x -x^\prime\|\,\sup_{0\le t \le 1} \|f^\prime(x+t(x^\prime -x))-I\|.</math>
Now choose <math>\delta>0</math> so that <math display="inline">\|f'(x) - I\| < {1\over 2}</math> for <math>\|x\|< \delta</math>. Suppose that <math>\|y\|<\delta/2</math> and define <math>x_n</math> inductively by <math>x_0=0</math> and <math> x_{n+1}=x_n + y - f(x_n)</math>. The assumptions show that if <math> \|x\|, \,\, \|x^\prime\| < \delta</math> then
:<math>\|f(x)-f(x^\prime) - x + x^\prime\| \le \|x-x^\prime\|/2</math>.
In particular <math>f(x)=f(x^\prime)</math> implies <math>x=x^\prime</math>. In the inductive scheme <math>\|x_n\| <\delta</math> and <math>\|x_{n+1} - x_n\| < \delta/2^n</math>. Thus <math>(x_n)</math> is a [[Cauchy sequence]] tending to <math>x</math>. By construction <math>f(x)=y</math> as required. 
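The iteration <math>x_{n+1}=x_n + y - f(x_n)</math> used in the existence argument can be run directly. The Python sketch below is a minimal illustration under stated assumptions: the map <math>f(x_1, x_2) = (x_1 + x_2^2/4,\; x_2 + x_1^2/4)</math>, the target <math>y = (0.2, -0.1)</math>, the step count, and the function name <code>solve_by_successive_approximation</code> are all chosen for the example. This <math>f</math> satisfies <math>f(0)=0</math> and <math>f'(0)=I</math>, and the entries of <math>f'(x) - I</math> are at most <math>1/2</math> in absolute value when <math>|x_1|, |x_2| < 1</math>.

```python
def f(x):
    # Illustrative map (an assumption): f(0) = 0 and f'(0) = identity.
    x1, x2 = x
    return (x1 + 0.25 * x2**2, x2 + 0.25 * x1**2)


def solve_by_successive_approximation(y, steps=60):
    """Iterate x_{n+1} = x_n + y - f(x_n), starting from x_0 = 0."""
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
    return x


y = (0.2, -0.1)
x = solve_by_successive_approximation(y)

# Largest coordinate of f(x) - y in absolute value; small if x solves f(x) = y.
fx = f(x)
residual = max(abs(fx[0] - y[0]), abs(fx[1] - y[1]))

print(x, residual)
```

A fixed number of steps is used instead of a stopping criterion to keep the sketch short; since each step at least halves the distance between successive iterates under the hypotheses of the proof, a few dozen steps are ample for small <math>y</math>.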
To check that <math>g=f^{-1}</math> is C<sup>1</sup>, write <math>g(y+k) = x+h</math> so that <math>f(x+h)=f(x)+k</math>. By the inequalities above, <math>\|h-k\| <\|h\|/2</math> so that <math>\|h\|/2<\|k\| < 2\|h\|</math>. On the other hand, if <math>A=f^\prime(x)</math>, then <math>\|A-I\|<1/2</math>. Using the [[geometric series]] for <math>B=I-A</math>, it follows that <math>\|A^{-1}\| < 2</math>. But then :<math> {\|g(y+k) -g(y) - f^\prime(g(y))^{-1}k \| \over \|k\|} = {\|h -f^\prime(x)^{-1}[f(x+h)-f(x)]\| \over \|k\|} \le 4 {\|f(x+h) - f(x) -f^\prime(x)h\|\over \|h\|} </math> tends to 0 as <math>k</math> and <math>h</math> tend to 0, proving that <math>g</math> is C<sup>1</sup> with <math>g^\prime(y)=f^\prime(g(y))^{-1}</math>. The proof above is presented for a finite-dimensional space, but applies equally well for [[Banach space]]s. If an invertible function <math>f</math> is C<sup>k</sup> with <math>k>1</math>, then so too is its inverse. This follows by induction using the fact that the map <math>F(A)=A^{-1}</math> on operators is C<sup>k</sup> for any <math>k</math> (in the finite-dimensional case this is an elementary fact because the inverse of a matrix is given as the [[adjugate matrix]] divided by its [[determinant]]). <ref name="Hörmander" /><ref>{{cite book|title=Calcul Differentiel|language=fr|first=Henri|last= Cartan|author-link= Henri Cartan|publisher=[[Éditions Hermann|Hermann]]|year= 1971|isbn=978-0-395-12033-0 |pages=55–61}}</ref> The method of proof here can be found in the books of [[Henri Cartan]], [[Jean Dieudonné]], [[Serge Lang]], [[Roger Godement]] and [[Lars Hörmander]]. === A proof using the contraction mapping principle === Here is a proof based on the [[contraction mapping theorem]]. Specifically, following T. Tao,<ref>Theorem 17.7.2 in {{cite book|mr=3310023|last1=Tao|first1=Terence|title=Analysis. 
II|edition=Third edition of 2006 original|series=Texts and Readings in Mathematics|volume=38|publisher=Hindustan Book Agency|location=New Delhi|year=2014|isbn=978-93-80250-65-6|zbl=1300.26003}}</ref> it uses the following consequence of the contraction mapping theorem. {{math_theorem|name=Lemma|math_statement=Let <math>B(0, r)</math> denote an open ball of radius ''r'' in <math>\mathbb{R}^n</math> with center 0 and <math>g : B(0, r) \to \mathbb{R}^n</math> a map with a constant <math>0 < c < 1</math> such that :<math>|g(y) - g(x)| \le c|y-x|</math> for all <math>x, y</math> in <math>B(0, r)</math>. Then for <math>f = I + g</math> on <math>B(0, r)</math>, we have :<math>(1-c)|x - y| \le |f(x) - f(y)|,</math> in particular, ''f'' is injective. If, moreover, <math>g(0) = 0</math>, then :<math>B(0, (1-c)r) \subset f(B(0, r)) \subset B(0, (1+c)r)</math>. More generally, the statement remains true if <math>\mathbb{R}^n</math> is replaced by a Banach space. Also, the first part of the lemma is true for any normed space.}} Basically, the lemma says that a small perturbation of the identity map by a contraction map is injective and preserves a ball in some sense. Assuming the lemma for a moment, we prove the theorem first. As in the above proof, it is enough to prove the special case when <math>a = 0, b = f(a) = 0</math> and <math>f'(0) = I</math>. Let <math>g = f - I</math>. The [[mean value inequality]] applied to <math>t \mapsto g(x + t(y - x))</math> says: :<math>|g(y) - g(x)| \le |y-x|\sup_{0 < t < 1} |g'(x + t(y - x))|.</math> Since <math>g'(0) = I - I = 0</math> and <math>g'</math> is continuous, we can find an <math>r > 0</math> such that :<math>|g(y) - g(x)| \le 2^{-1}|y-x|</math> for all <math>x, y</math> in <math>B(0, r)</math>. Then the early lemma says that <math>f = g + I</math> is injective on <math>B(0, r)</math> and <math>B(0, r/2) \subset f(B(0, r))</math>. 
Then :<math>f : U = B(0, r) \cap f^{-1}(B(0, r/2)) \to V = B(0, r/2)</math> is bijective and thus has an inverse. Next, we show the inverse <math>f^{-1}</math> is continuously differentiable (this part of the argument is the same as that in the previous proof). This time, let <math>g = f^{-1}</math> denote the inverse of <math>f</math> and <math>A = f'(x)</math>. For <math>x = g(y)</math>, we write <math>g(y + k) = x + h</math> or <math>y + k = f(x+h)</math>. Now, by the early estimate, we have :<math>|h - k| = |f(x+h) - f(x) - h| \le |h|/2</math> and so <math>|h|/2 \le |k|</math>. Writing <math>\| \cdot \|</math> for the operator norm, :<math>|g(y+k) - g(y) - A^{-1} k| = |h - A^{-1}(f(x + h) - f(x))| \le \|A^{-1}\||Ah - f(x+h) + f(x)|.</math> As <math>k \to 0</math>, we have <math>h \to 0</math> and <math>|h|/|k|</math> is bounded. Hence, <math>g</math> is differentiable at <math>y</math> with the derivative <math>g'(y) = f'(g(y))^{-1}</math>. Also, <math>g'</math> is the same as the composition <math>\iota \circ f' \circ g</math> where <math>\iota : T \mapsto T^{-1}</math>; so <math>g'</math> is continuous. It remains to show the lemma. First, we have: :<math>|x - y| - |f(x) - f(y)| \le |g(x) - g(y)| \le c|x - y|,</math> which is to say :<math>(1 - c)|x - y| \le |f(x) - f(y)|.</math> This proves the first part. Next, we show <math>f(B(0, r)) \supset B(0, (1-c)r)</math>. The idea is to note that this is equivalent to, given a point <math>y</math> in <math>B(0, (1-c) r)</math>, find a fixed point of the map :<math>F : \overline{B}(0, r') \to \overline{B}(0, r'), \, x \mapsto y - g(x)</math> where <math>0 < r' < r</math> such that <math>|y| \le (1-c)r'</math> and the bar means a closed ball. To find a fixed point, we use the contraction mapping theorem and checking that <math>F</math> is a well-defined strict-contraction mapping is straightforward. Finally, we have: <math>f(B(0, r)) \subset B(0, (1+c)r)</math> since :<math>|f(x)| = |x + g(x) - g(0)| \le (1+c)|x|. 
\square</math>

As might be clear, this proof is not substantially different from the previous one, as the proof of the contraction mapping theorem is by successive approximation.

== Applications ==
===Implicit function theorem===
The inverse function theorem can be used to solve a system of equations
:<math>\begin{align} &f_1(x) = y_1 \\ &\quad \vdots\\ &f_n(x) = y_n,\end{align}</math>
i.e., to express <math>x_1, \dots, x_n</math> as functions of <math>y = (y_1, \dots, y_n)</math>, provided the Jacobian matrix is invertible. The [[implicit function theorem]] allows one to solve a more general system of equations:
:<math>\begin{align} &f_1(x, y) = 0 \\ &\quad \vdots\\ &f_n(x, y) = 0\end{align}</math>
for <math>y</math> in terms of <math>x</math>. Though more general, the theorem is actually a consequence of the inverse function theorem. First, the precise statement of the implicit function theorem is as follows:<ref>{{harvnb|Spivak|1965|loc=Theorem 2-12.}}</ref>
*given a map <math>f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m</math>, if <math>f(a, b) = 0</math>, <math>f</math> is continuously differentiable in a neighborhood of <math>(a, b)</math> and the derivative of <math>y \mapsto f(a, y)</math> at <math>b</math> is invertible, then there exists a differentiable map <math>g : U \to V</math> for some neighborhoods <math>U, V</math> of <math>a, b</math> such that <math>f(x, g(x)) = 0</math>. Moreover, if <math>f(x, y) = 0, x \in U, y \in V</math>, then <math>y = g(x)</math>; i.e., <math>g(x)</math> is the unique solution.
To see this, consider the map <math>F(x, y) = (x, f(x, y))</math>. By the inverse function theorem, <math>F : U \times V \to W</math> has an inverse <math>G</math> for some neighborhoods <math>U, V, W</math>. We then have:
:<math>(x, y) = F(G_1(x, y), G_2(x, y)) = (G_1(x, y), f(G_1(x, y), G_2(x, y))),</math>
implying <math>x = G_1(x, y)</math> and <math>y = f(x, G_2(x, y)).</math> Thus <math>g(x) = G_2(x, 0)</math> has the required property. 
<math>\square</math> ===Giving a manifold structure=== In differential geometry, the inverse function theorem is used to show that the pre-image of a [[regular value]] under a smooth map is a manifold.<ref>{{harvnb|Spivak|1965|loc=Theorem 5-1. and Theorem 2-13.}}</ref> Indeed, let <math>f : U \to \mathbb{R}^r</math> be such a smooth map from an open subset of <math>\mathbb{R}^n</math> (since the result is local, there is no loss of generality with considering such a map). Fix a point <math>a</math> in <math>f^{-1}(b)</math> and then, by permuting the coordinates on <math>\mathbb{R}^n</math>, assume the matrix <math>\left [ \frac{\partial f_i}{\partial x_j}(a) \right]_{1 \le i, j \le r}</math> has rank <math>r</math>. Then the map <math>F : U \to \mathbb{R}^r \times \mathbb{R}^{n-r} = \mathbb{R}^n, \, x \mapsto (f(x), x_{r+1}, \dots, x_n)</math> is such that <math>F'(a)</math> has rank <math>n</math>. Hence, by the inverse function theorem, we find the smooth inverse <math>G</math> of <math>F</math> defined in a neighborhood <math>V \times W</math> of <math>(b, a_{r+1}, \dots, a_n)</math>. We then have :<math>x = (F \circ G)(x) = (f(G(x)), G_{r+1}(x), \dots, G_n(x)),</math> which implies :<math>(f \circ G)(x_1, \dots, x_n) = (x_1, \dots, x_r).</math> That is, after the change of coordinates by <math>G</math>, <math>f</math> is a coordinate projection (this fact is known as the [[submersion theorem]]). Moreover, since <math>G : V \times W \to U' = G(V \times W)</math> is bijective, the map :<math>g = G(b, \cdot) : W \to f^{-1}(b) \cap U', \, (x_{r+1}, \dots, x_n) \mapsto G(b, x_{r+1}, \dots, x_n)</math> is bijective with the smooth inverse. That is to say, <math>g</math> gives a local parametrization of <math>f^{-1}(b)</math> around <math>a</math>. Hence, <math>f^{-1}(b)</math> is a manifold. <math>\square</math> (Note the proof is quite similar to the proof of the implicit function theorem and, in fact, the implicit function theorem can be also used instead.) 
More generally, the theorem shows that if a smooth map <math>f : P \to E</math> is transversal to a submanifold <math>M \subset E</math>, then the pre-image <math>f^{-1}(M) \hookrightarrow P</math> is a submanifold.<ref>{{cite web|website=northwestern.edu|title=Transversality |url=https://sites.math.northwestern.edu/~jnkf/classes/mflds/4transversality.pdf}}</ref> == Global version == The inverse function theorem is a local result; it applies to each point. ''A priori'', the theorem thus only shows the function <math>f</math> is locally bijective (or locally diffeomorphic of some class). The next topological lemma can be used to upgrade local injectivity to injectivity that is global to some extent. {{math_theorem|name=Lemma|math_statement=<ref>One of Spivak's books (Editorial note: give the exact location).</ref>{{full citation needed|date=August 2023}}<ref>{{harvnb|Hirsch|1976|loc=Ch. 2, § 1., Exercise 7.}} NB: This one is for a <math>C^1</math>-immersion.</ref> If <math>A</math> is a closed subset of a (second-countable) topological manifold <math>X</math> (or, more generally, a topological space admitting an [[exhaustion by compact subsets]]) and <math>f : X \to Z</math>, <math>Z</math> some topological space, is a local homeomorphism that is injective on <math>A</math>, then <math>f</math> is injective on some neighborhood of <math>A</math>.}} Proof:<ref>Lemma 13.3.3. of [https://www.utsc.utoronto.ca/people/kupers/wp-content/uploads/sites/50/2020/12/difffop-2020.pdf Lectures on differential topology] utoronto.ca</ref> First assume <math>X</math> is [[compact space|compact]]. If the conclusion of the theorem is false, we can find two sequences <math>x_i \ne y_i</math> such that <math>f(x_i) = f(y_i)</math> and <math>x_i, y_i</math> each converge to some points <math>x, y</math> in <math>A</math>. Since <math>f</math> is injective on <math>A</math>, <math>x = y</math>. 
Now, if <math>i</math> is large enough, <math>x_i, y_i</math> are in a neighborhood of <math>x = y</math> where <math>f</math> is injective; thus, <math>x_i = y_i</math>, a contradiction. In general, consider the set <math>E = \{ (x, y) \in X^2 \mid x \ne y, f(x) = f(y) \}</math>. It is disjoint from <math>S \times S</math> for any subset <math>S \subset X</math> where <math>f</math> is injective. Let <math>X_1 \subset X_2 \subset \cdots </math> be an increasing sequence of compact subsets with union <math>X</math> and with <math>X_i</math> contained in the interior of <math>X_{i+1}</math>. Then, by the first part of the proof, for each <math>i</math>, we can find a neighborhood <math>U_i</math> of <math>A \cap X_i</math> such that <math>U_i^2 \subset X^2 - E</math>. Then <math>U = \bigcup_i U_i</math> has the required property. <math>\square</math> (See also <ref>Dan Ramras (https://mathoverflow.net/users/4042/dan-ramras), On a proof of the existence of tubular neighborhoods., URL (version: 2017-04-13): https://mathoverflow.net/q/58124</ref> for an alternative approach.) The lemma implies the following (a sort of) global version of the inverse function theorem: {{math_theorem|name=Inverse function theorem|math_statement=<ref>Ch. I., § 3, Exercise 10. and § 8, Exercise 14. in V. Guillemin, A. Pollack. "Differential Topology". Prentice-Hall Inc., 1974. ISBN 0-13-212605-2.</ref> Let <math>f : U \to V</math> be a map between open subsets of <math>\mathbb{R}^n</math> or more generally of manifolds. Assume <math>f</math> is continuously differentiable (or is <math>C^k</math>). 
If <math>f</math> is injective on a closed subset <math>A \subset U</math> and if the Jacobian matrix of <math>f</math> is invertible at each point of <math>A</math>, then <math>f</math> is injective on a neighborhood <math>A'</math> of <math>A</math> and <math>f^{-1} : f(A') \to A'</math> is continuously differentiable (or is <math>C^k</math>).}} Note that if <math>A</math> is a point, then the above is the usual inverse function theorem. == Holomorphic inverse function theorem == There is a version of the inverse function theorem for [[holomorphic map]]s. {{math_theorem|name=Theorem|math_statement=<ref>{{harvnb|Griffiths|Harris|1978|loc=p. 18.}}</ref><ref>{{cite book |first1=K. |last1=Fritzsche |first2=H. |last2=Grauert |title=From Holomorphic Functions to Complex Manifolds |publisher=Springer |year=2002 |pages=33–36 |isbn=978-0-387-95395-3 |url=https://books.google.com/books?id=jSeRz36zXIMC&pg=PA33 }}</ref> Let <math>U, V \subset \mathbb{C}^n</math> be open subsets such that <math>0 \in U</math> and <math>f : U \to V</math> a holomorphic map whose Jacobian matrix in variables <math>z_i, \overline{z}_i</math> is invertible (the determinant is nonzero) at <math>0</math>. Then <math>f</math> is injective in some neighborhood <math>W</math> of <math>0</math> and the inverse <math>f^{-1} : f(W) \to W</math> is holomorphic.}} The theorem follows from the usual inverse function theorem. Indeed, let <math>J_{\mathbb{R}}(f)</math> denote the Jacobian matrix of <math>f</math> in variables <math>x_i, y_i</math> and <math>J(f)</math> for that in <math>z_j, \overline{z}_j</math>. Then we have <math>\det J_{\mathbb{R}}(f) = |\det J(f)|^2</math>, which is nonzero by assumption. Hence, by the usual inverse function theorem, <math>f</math> is injective near <math>0</math> with continuously differentiable inverse. 
By chain rule, with <math>w = f(z)</math>, :<math>\frac{\partial}{\partial \overline{z}_j} (f_j^{-1} \circ f)(z) = \sum_k \frac{\partial f_j^{-1}}{\partial w_k}(w) \frac{\partial f_k}{\partial \overline{z}_j}(z) + \sum_k \frac{\partial f_j^{-1}}{\partial \overline{w}_k}(w) \frac{\partial \overline{f}_k}{\partial \overline{z}_j}(z)</math> where the left-hand side and the first term on the right vanish since <math>f_j^{-1} \circ f</math> and <math>f_k</math> are holomorphic. Thus, <math>\frac{\partial f_j^{-1}}{\partial \overline{w}_k}(w) = 0</math> for each <math>k</math>. <math>\square</math> Similarly, there is the implicit function theorem for holomorphic functions.<ref name="holomorphic implicit">{{harvnb|Griffiths|Harris|1978|loc=p. 19.}}</ref> As already noted earlier, it can happen that an injective smooth function has the inverse that is not smooth (e.g., <math>f(x) = x^3</math> in a real variable). This is not the case for holomorphic functions because of: {{math_theorem|name=Proposition|math_statement=<ref name="holomorphic implicit" /> If <math>f : U \to V</math> is an injective holomorphic map between open subsets of <math>\mathbb{C}^n</math>, then <math>f^{-1} : f(U) \to U</math> is holomorphic.}} == Formulations for manifolds == The inverse function theorem can be rephrased in terms of differentiable maps between [[differentiable manifold]]s. In this context the theorem states that for a differentiable map <math>F: M \to N</math> (of class <math>C^1</math>), if the [[pushforward (differential)|differential]] of <math>F</math>, :<math>dF_p: T_p M \to T_{F(p)} N</math> is a [[linear isomorphism]] at a point <math>p</math> in <math>M</math> then there exists an open neighborhood <math>U</math> of <math>p</math> such that :<math>F|_U: U \to F(U)</math> is a [[diffeomorphism]]. 
Note that this implies that the connected components of {{Mvar|M}} and {{Mvar|N}} containing ''p'' and ''F''(''p'') have the same dimension, as is already directly implied by the assumption that ''dF''<sub>''p''</sub> is an isomorphism. If the derivative of {{Mvar|F}} is an isomorphism at all points {{Mvar|p}} in {{Mvar|M}} then the map {{Mvar|F}} is a [[local diffeomorphism]].

==Generalizations==
===Banach spaces===
The inverse function theorem can also be generalized to differentiable maps between [[Banach space]]s ''{{Mvar|X}}'' and ''{{Mvar|Y}}''.<ref>{{cite book |first=David G. |last=Luenberger |author-link=David Luenberger |title=Optimization by Vector Space Methods |location=New York |publisher=John Wiley & Sons |year=1969 |isbn=0-471-55359-X |pages=240–242 |url=https://books.google.com/books?id=lZU0CAH4RccC&pg=PA240 }}</ref> Let ''{{Mvar|U}}'' be an open neighbourhood of the origin in ''{{Mvar|X}}'' and <math>F: U \to Y\!</math> a continuously differentiable function, and assume that the Fréchet derivative <math>dF_0: X \to Y\!</math> of ''{{Mvar|F}}'' at 0 is a [[bounded linear map|bounded]] linear isomorphism of ''{{Mvar|X}}'' onto ''{{Mvar|Y}}''. Then there exists an open neighbourhood ''{{Mvar|V}}'' of <math>F(0)\!</math> in ''{{Mvar|Y}}'' and a continuously differentiable map <math>G: V \to X\!</math> such that <math>F(G(y)) = y</math> for all ''{{Mvar|y}}'' in ''{{Mvar|V}}''. Moreover, <math>G(y)\!</math> is the only sufficiently small solution ''{{Mvar|x}}'' of the equation <math>F(x) = y\!</math>.
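
As an illustration of the Banach space statement (a sketch that is not taken from the cited source), one common way to obtain <math>G</math> is a simplified Newton iteration of the kind used in contraction mapping arguments. The symbols <math>T_y</math> and <math>x_k</math> are introduced here only for this example. For <math>y</math> close to <math>F(0)</math>, set
:<math>T_y(x) = x + (dF_0)^{-1}(y - F(x)), \qquad x_0 = 0, \qquad x_{k+1} = T_y(x_k).</math>
A point <math>x</math> is a fixed point of <math>T_y</math> exactly when <math>F(x) = y</math>. For instance, take <math>X = Y = \mathbb{R}</math> and <math>F(x) = x + x^2</math>, so that <math>F(0) = 0</math>, <math>dF_0 = 1</math> and <math>T_y(x) = y - x^2</math>. If <math>|y| \le 3/16</math>, then <math>T_y</math> maps <math>[-1/4, 1/4]</math> into itself and satisfies <math>|T_y'(x)| = 2|x| \le 1/2</math> there, so it is a contraction and the iterates converge to
:<math>G(y) = \frac{-1 + \sqrt{1 + 4y}}{2}.</math>
For <math>y = 0.1</math> the first iterates are <math>x_1 = 0.1</math>, <math>x_2 = 0.09</math>, <math>x_3 = 0.0919</math>, approaching <math>G(0.1) \approx 0.0916</math>. The other root of <math>x^2 + x - y = 0</math>, approximately <math>-1.0916</math>, also solves <math>F(x) = y</math> but is not small, which is consistent with the uniqueness clause above.
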
There is also the inverse function theorem for [[Banach manifold]]s.<ref>{{cite book |first=Serge |last=Lang |author-link=Serge Lang |title=Differential Manifolds |location=New York |publisher=Springer |year=1985 |isbn=0-387-96113-5 |pages=13–19 }}</ref>

===Constant rank theorem===
The inverse function theorem (and the [[implicit function theorem]]) can be seen as a special case of the constant rank theorem, which states that a smooth map with constant [[rank (differential topology)|rank]] near a point can be put in a particular normal form near that point.<ref name="boothby">{{cite book |first=William M. |last=Boothby |title=An Introduction to Differentiable Manifolds and Riemannian Geometry |url=https://archive.org/details/introductiontodi0000boot |url-access=registration |edition=Second |year=1986 |publisher=Academic Press |location=Orlando |isbn=0-12-116052-1 |pages=[https://archive.org/details/introductiontodi0000boot/page/46 46–50] }}</ref> Specifically, if <math>F:M\to N</math> has constant rank near a point <math>p\in M\!</math>, then there are open neighborhoods {{Mvar|U}} of {{Mvar|p}} and {{Mvar|V}} of <math>F(p)\!</math> and there are diffeomorphisms <math>u:T_pM\to U\!</math> and <math>v:T_{F(p)}N\to V\!</math> such that <math>F(U)\subseteq V\!</math> and such that the derivative <math>dF_p:T_pM\to T_{F(p)}N\!</math> is equal to <math>v^{-1}\circ F\circ u\!</math>. That is, {{Mvar|F}} "looks like" its derivative near {{Mvar|p}}. The set of points <math>p\in M</math> such that the rank is constant in a neighborhood of <math>p</math> is an open dense subset of {{Mvar|M}}; this is a consequence of [[semicontinuity]] of the rank function. Thus the constant rank theorem applies to a generic point of the domain.

When the derivative of {{Mvar|F}} is injective (resp. surjective) at a point {{Mvar|p}}, it is also injective (resp. 
surjective) in a neighborhood of {{Mvar|p}}, and hence the rank of {{Mvar|F}} is constant on that neighborhood, and the constant rank theorem applies.

===Polynomial functions===
If it is true, the [[Jacobian conjecture]] would be a variant of the inverse function theorem for polynomials. It states that if a vector-valued polynomial function has a [[Jacobian determinant]] that is an invertible polynomial (that is, a nonzero constant), then it has an inverse that is also a polynomial function. It is unknown whether this is true or false, even in the case of two variables. This is a major open problem in the theory of polynomials.

===Selections===
When <math>f: \mathbb{R}^n \to \mathbb{R}^m</math> with <math>m\leq n</math>, <math>f</math> is <math>k</math> times [[continuously differentiable]], and the Jacobian <math>A=\nabla f(\overline{x})</math> at a point <math>\overline{x}</math> is of [[rank (linear algebra)|rank]] <math>m</math>, the inverse of <math>f</math> may not be unique. However, there exists a local [[Choice function#Choice function of a multivalued map|selection function]] <math>s</math> such that <math>f(s(y)) = y</math> for all <math>y</math> in a [[neighborhood (mathematics)|neighborhood]] of <math>\overline{y} = f(\overline{x})</math>, <math>s(\overline{y}) = \overline{x}</math>, <math>s</math> is <math>k</math> times continuously differentiable in this neighborhood, and <math>\nabla s(\overline{y}) = A^T(A A^T)^{-1}</math> (<math>\nabla s(\overline{y})</math> is the [[Moore–Penrose pseudoinverse]] of <math>A</math>).<ref>{{cite book |last1=Dontchev |first1=Asen L. |last2=Rockafellar |first2=R. 
Tyrrell |title=Implicit Functions and Solution Mappings: A View from Variational Analysis |date=2014 |publisher=Springer-Verlag |location=New York |isbn=978-1-4939-1036-6 |page=54 |edition=Second}}</ref>

=== Over a real closed field ===
The inverse function theorem also holds over a [[real closed field]] ''k'' (or an [[O-minimal structure]]).<ref>Theorem 2.11. in {{cite book |doi=10.1017/CBO9780511525919|title=Tame Topology and O-minimal Structures. London Mathematical Society lecture note series, no. 248|year=1998 |last1=Dries |first1=L. P. D. van den |authorlink = Lou van den Dries|isbn=9780521598385|publisher=Cambridge University Press|location=Cambridge, New York, and Oakleigh, Victoria }}</ref> Precisely, the theorem holds for a semialgebraic (or definable) map between open subsets of <math>k^n</math> that is continuously differentiable.

The usual proof of the inverse function theorem uses Banach's fixed point theorem, which relies on Cauchy completeness. That part of the argument is replaced by the use of the [[extreme value theorem]], which does not need completeness. Explicitly, in {{section link||A_proof_using_the_contraction_mapping_principle}}, Cauchy completeness is used only to establish the inclusion <math>B(0, r/2) \subset f(B(0, r))</math>. Here, we shall directly show <math>B(0, r/4) \subset f(B(0, r))</math> instead (which is enough). Given a point <math>y</math> in <math>B(0, r/4)</math>, consider the function <math>P(x) = |f(x) - y|^2</math> defined on a neighborhood of <math>\overline{B}(0, r)</math>. If <math>P'(x) = 0</math>, then <math>0 = P'(x) = 2\begin{bmatrix} f_1(x) - y_1 & \cdots & f_n(x) - y_n \end{bmatrix} f'(x)</math> and so <math>f(x) = y</math>, since <math>f'(x)</math> is invertible. Now, by the extreme value theorem, <math>P</math> attains a minimum at some point <math>x_0</math> of the closed ball <math>\overline{B}(0, r)</math>, which can be shown to lie in <math>B(0, r)</math> using <math>2^{-1}|x| \le |f(x)|</math>.
Since <math>P'(x_0) = 0</math>, <math>f(x_0) = y</math>, which proves the claimed inclusion. <math>\square</math>

Alternatively, one can deduce the theorem from the one over real numbers by [[Tarski's principle]].{{citation needed|date=December 2024}}

==See also==
* [[Nash–Moser theorem]]

==Notes==
{{reflist}}

==References==
* {{cite book |first=Carl B. |last=Allendoerfer |author-link=Carl B. Allendoerfer |title=Calculus of Several Variables and Differentiable Manifolds |location=New York |publisher=Macmillan |year=1974 |chapter=Theorems about Differentiable Functions |pages=54–88 |isbn=0-02-301840-2 }}
* {{cite book |first1=Peter |last1=Baxandall |author-link=Peter Baxandall |first2=Hans |last2=Liebeck |title=Vector Calculus |location=New York |publisher=Oxford University Press |year=1986 |chapter=The Inverse Function Theorem |isbn=0-19-859652-9 |pages=214–225 }}
* {{cite journal |last = Nijenhuis |first = Albert |author-link= Albert Nijenhuis |title = Strong derivatives and inverse mappings |journal = [[The American Mathematical Monthly|Amer. Math. Monthly]] |volume = 81 |year = 1974 |pages = 969–980 |doi = 10.2307/2319298 |issue = 9 |jstor = 2319298 |hdl = 10338.dmlcz/102482 |hdl-access = free }}
* {{citation | last1 = Griffiths | first1 = Phillip | last2 = Harris | first2 = Joseph | isbn = 978-0-471-05059-9 | publisher = John Wiley & Sons | title = Principles of Algebraic Geometry | year = 1978}}.
* {{cite book |last1=Hirsch |first1=Morris W. |title=Differential Topology |date=1976 |publisher=Springer-Verlag |isbn=978-0-387-90148-0 |url=https://www.researchgate.net/publication/268035774 |language=en}}
* {{cite book |first1=Murray H. |last1=Protter |author-link=Murray H. Protter |first2=Charles B. Jr. |last2=Morrey |author-link2=Charles B. Morrey Jr. |title=Intermediate Calculus |location=New York |publisher=Springer |edition=Second |year=1985 |isbn=0-387-96058-9 |chapter=Transformations and Jacobians |pages=412–420 }}
* {{cite book |last1=Renardy |first1=Michael |last2=Rogers |first2=Robert C. |title = An Introduction to Partial Differential Equations | series = Texts in Applied Mathematics 13 | edition = Second |publisher = Springer-Verlag | location = New York |year = 2004 |pages = 337–338 |isbn = 0-387-00444-0 }}
* {{cite book |last = Rudin|first = Walter|author-link= Walter Rudin|title = Principles of mathematical analysis|url = https://archive.org/details/principlesofmath00rudi|url-access = registration|edition = Third |series = International Series in Pure and Applied Mathematics |publisher = McGraw-Hill Book | location = New York |year = 1976 |pages = [https://archive.org/details/principlesofmath00rudi/page/221 221]–223 | isbn=978-0-07-085613-4 }}
* {{cite book |title=Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus |last1=Spivak|first1=Michael|title-link=Calculus on Manifolds (book)|publisher= Benjamin Cummings |year=1965 |isbn=0-8053-9021-9 |location=San Francisco |author1-link=Michael Spivak }}

{{Functional analysis}}
{{Analysis in topological vector spaces}}

[[Category:Multivariable calculus]]
[[Category:Differential topology]]
[[Category:Inverse functions]]
[[Category:Theorems in real analysis]]
[[Category:Theorems in calculus]]

[[de:Satz von der impliziten Funktion#Satz von der Umkehrabbildung]]