== Proofs ==

=== First proof ===
One proof of the chain rule begins by defining the derivative of the composite function {{math|''f'' ∘ ''g''}}, where we take the [[Limit of a function|limit]] of the [[difference quotient]] for {{math|''f'' ∘ ''g''}} as {{mvar|x}} approaches {{mvar|a}}:
<math display="block">(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.</math>

Assume for the moment that <math>g(x)</math> does not equal <math>g(a)</math> for any <math>x \neq a</math> near <math>a</math>. Then the previous expression is equal to the product of two factors:
<math display="block">\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}.</math>

If <math>g</math> oscillates near {{mvar|a}}, then it might happen that no matter how close one gets to {{mvar|a}}, there is always an even closer {{mvar|x}} such that {{math|1=''g''(''x'') = ''g''(''a'')}}. For example, this happens near {{math|1=''a'' = 0}} for the [[continuous function]] {{mvar|g}} defined by {{math|1=''g''(''x'') = 0}} for {{math|1=''x'' = 0}} and {{math|1=''g''(''x'') = ''x''<sup>2</sup> sin(1/''x'')}} otherwise. Whenever this happens, the above expression is undefined because it involves [[division by zero]]. To work around this, introduce a function <math>Q</math> as follows:
<math display="block">Q(y) = \begin{cases}
\displaystyle\frac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\
f'(g(a)), & y = g(a).
\end{cases}</math>
We will show that the difference quotient for {{math|''f'' ∘ ''g''}} is always equal to:
<math display="block">Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.</math>

Whenever {{math|''g''(''x'')}} is not equal to {{math|''g''(''a'')}}, this is clear because the factors of {{math|''g''(''x'') − ''g''(''a'')}} cancel. When {{math|''g''(''x'')}} equals {{math|''g''(''a'')}}, then the difference quotient for {{math|''f'' ∘ ''g''}} is zero because {{math|''f''(''g''(''x''))}} equals {{math|''f''(''g''(''a''))}}, and the above product is zero because it equals {{math|''f''′(''g''(''a''))}} times zero. So the above product is always equal to the difference quotient, and to show that the derivative of {{math|''f'' ∘ ''g''}} at {{math|''a''}} exists and to determine its value, we need only show that the limit as {{math|''x''}} goes to {{math|''a''}} of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are {{math|''Q''(''g''(''x''))}} and {{math|(''g''(''x'') − ''g''(''a'')) / (''x'' − ''a'')}}. The latter is the difference quotient for {{mvar|g}} at {{mvar|a}}, and because {{mvar|g}} is differentiable at {{mvar|a}} by assumption, its limit as {{mvar|x}} tends to {{mvar|a}} exists and equals {{math|''g''′(''a'')}}.

As for {{math|''Q''(''g''(''x''))}}, notice that {{math|''Q''}} is defined wherever {{mvar|f}} is. Furthermore, {{mvar|f}} is differentiable at {{math|''g''(''a'')}} by assumption, so {{math|''Q''}} is continuous at {{math|''g''(''a'')}}, by definition of the derivative. The function {{mvar|g}} is continuous at {{mvar|a}} because it is differentiable at {{mvar|a}}, and therefore {{math|''Q'' ∘ ''g''}} is continuous at {{mvar|a}}. So its limit as {{mvar|x}} goes to {{mvar|a}} exists and equals {{math|''Q''(''g''(''a''))}}, which is {{math|''f''′(''g''(''a''))}}.

This shows that the limits of both factors exist and that they equal {{math|''f''′(''g''(''a''))}} and {{math|''g''′(''a'')}}, respectively. Therefore, the derivative of {{math|''f'' ∘ ''g''}} at ''a'' exists and equals {{math|''f''′(''g''(''a''))}}{{math|''g''′(''a'')}}.
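
As an illustration, consider the oscillating function <math>g</math> described above with <math>a = 0</math>. The outer function <math>f</math> is left arbitrary here and is only assumed to be differentiable at <math>g(0) = 0</math>. At the points <math>x_n = 1/(n\pi)</math>, for nonzero integers <math>n</math>, we have <math>g(x_n) = 0 = g(0)</math>, so the quotient with denominator <math>g(x) - g(a)</math> is undefined there, whereas the product involving <math>Q</math> is well defined and agrees with the difference quotient:
<math display="block">Q(g(x_n)) \cdot \frac{g(x_n) - g(0)}{x_n - 0} = f'(0) \cdot \frac{0}{x_n} = 0 = \frac{f(g(x_n)) - f(g(0))}{x_n - 0}.</math>
Moreover, <math>|g(x)/x| = |x \sin(1/x)| \le |x|</math> for <math>x \neq 0</math>, so <math>g'(0) = 0</math>, and the chain rule gives
<math display="block">(f \circ g)'(0) = f'(g(0))\, g'(0) = f'(0) \cdot 0 = 0.</math>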

=== Second proof ===
Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function ''g'' is differentiable at ''a'' if there exist a real number ''g''′(''a'') and a function ''ε''(''h'') that tends to zero as ''h'' tends to zero such that
<math display="block">g(a + h) - g(a) = g'(a) h + \varepsilon(h) h.</math>
Here the left-hand side represents the true difference between the value of ''g'' at ''a'' and at {{math|''a'' + ''h''}}, whereas the right-hand side represents the approximation determined by the derivative plus an error term.
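
For a concrete instance (the cubic below is chosen only for illustration), take <math>g(x) = x^3</math>. Expanding gives
<math display="block">g(a + h) - g(a) = 3a^2 h + 3a h^2 + h^3 = 3a^2 h + (3a h + h^2)\, h,</math>
so one may take <math>g'(a) = 3a^2</math> and <math>\varepsilon(h) = 3a h + h^2</math>, which tends to zero as <math>h</math> tends to zero.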

In the situation of the chain rule, such a function ''ε'' exists because ''g'' is assumed to be differentiable at ''a''. Again by assumption, a similar function also exists for ''f'' at ''g''(''a''). Calling this function ''η'', we have
<math display="block">f(g(a) + k) - f(g(a)) = f'(g(a)) k + \eta(k) k.</math>
The above definition imposes no constraints on ''η''(0), even though it is assumed that ''η''(''k'') tends to zero as ''k'' tends to zero. If we set {{math|1=''η''(0) = 0}}, then ''η'' is continuous at 0.

Proving the theorem requires studying the difference {{math|''f''(''g''(''a'' + ''h'')) − ''f''(''g''(''a''))}} as ''h'' tends to zero. The first step is to substitute for {{math|''g''(''a'' + ''h'')}} using the definition of differentiability of ''g'' at ''a'':
<math display="block">f(g(a + h)) - f(g(a)) = f(g(a) + g'(a) h + \varepsilon(h) h) - f(g(a)).</math>
The next step is to use the definition of differentiability of ''f'' at ''g''(''a''). This requires a term of the form {{math|''f''(''g''(''a'') + ''k'')}} for some ''k''. In the above equation, the correct ''k'' varies with ''h''. Set {{math|1=''k''<sub>''h''</sub> = ''g''′(''a'') ''h'' + ''ε''(''h'') ''h''}} and the right-hand side becomes {{math|''f''(''g''(''a'') + ''k''<sub>''h''</sub>) − ''f''(''g''(''a''))}}. Applying the definition of the derivative gives:
<math display="block">f(g(a) + k_h) - f(g(a)) = f'(g(a)) k_h + \eta(k_h) k_h.</math>
To study the behavior of this expression as ''h'' tends to zero, expand ''k''<sub>''h''</sub>. After regrouping the terms, the right-hand side becomes:
<math display="block">f'(g(a)) g'(a)h + [f'(g(a)) \varepsilon(h) + \eta(k_h) g'(a) + \eta(k_h) \varepsilon(h)] h.</math>
Because ''ε''(''h'') tends to zero as ''h'' tends to zero, so does {{math|1=''k''<sub>''h''</sub> = (''g''′(''a'') + ''ε''(''h'')) ''h''}}, and since ''η'' is continuous at 0 with {{math|1=''η''(0) = 0}}, ''η''(''k''<sub>''h''</sub>) also tends to zero. Hence the first two bracketed terms tend to zero as ''h'' tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference {{math|''f''(''g''(''a'' + ''h'')) − ''f''(''g''(''a''))}}, by the definition of the derivative {{math|''f'' ∘ ''g''}} is differentiable at ''a'' and its derivative is {{math|''f''′(''g''(''a'')) ''g''′(''a'')}}.
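
The regrouping can be checked on a simple example, assuming purely for illustration that <math>f(y) = y^2</math> and <math>g(x) = x^2</math>. Then <math>g'(a) = 2a</math>, <math>\varepsilon(h) = h</math>, <math>f'(g(a)) = 2a^2</math>, <math>\eta(k) = k</math>, and <math>k_h = 2a h + h^2</math>. The bracketed error term is
<math display="block">f'(g(a)) \varepsilon(h) + \eta(k_h) g'(a) + \eta(k_h) \varepsilon(h) = 2a^2 h + (2a h + h^2)(2a) + (2a h + h^2) h = 6a^2 h + 4a h^2 + h^3,</math>
which tends to zero with <math>h</math>, and the full expression is
<math display="block">f'(g(a)) g'(a) h + [6a^2 h + 4a h^2 + h^3] h = 4a^3 h + 6a^2 h^2 + 4a h^3 + h^4 = (a + h)^4 - a^4,</math>
in agreement with <math>f(g(a + h)) - f(g(a))</math> and with the derivative <math>4a^3</math> of <math>x^4</math> at <math>a</math>.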

The role of ''Q'' in the first proof is played by ''η'' in this proof. They are related by the equation:
<math display="block">Q(y) = f'(g(a)) + \eta(y - g(a)). </math>
The need to define ''Q'' at ''g''(''a'') is analogous to the need to define ''η'' at zero.
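
This relation can be read off from the equation defining ''η''. Substituting <math>k = y - g(a)</math> there gives
<math display="block">f(y) - f(g(a)) = f'(g(a)) (y - g(a)) + \eta(y - g(a)) (y - g(a)),</math>
and dividing by <math>y - g(a)</math> when <math>y \neq g(a)</math> yields
<math display="block">\frac{f(y) - f(g(a))}{y - g(a)} = f'(g(a)) + \eta(y - g(a)).</math>
At <math>y = g(a)</math> the relation reduces to <math>Q(g(a)) = f'(g(a)) + \eta(0)</math>, which matches the definition of ''Q'' exactly when <math>\eta(0) = 0</math>.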

=== Third proof ===
[[Constantin Carathéodory]]'s alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule.<ref>{{cite journal|first=Stephen|last=Kuhn|title=The Derivative á la Carathéodory|journal=[[The American Mathematical Monthly]]|year=1991|volume=98|issue=1|pages=40–44|doi=10.2307/2324035|jstor=2324035}}</ref>

Under this definition, a function {{mvar|f}} is differentiable at a point {{mvar|a}} if and only if there is a function {{mvar|q}}, continuous at {{mvar|a}} and such that {{math|1=''f''(''x'') − ''f''(''a'') = ''q''(''x'')(''x'' − ''a'')}}. There is at most one such function, and if {{mvar|f}} is differentiable at {{mvar|a}} then {{math|1=''f'' ′(''a'') = ''q''(''a'')}}.

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, we have that there exist functions {{mvar|q}}, continuous at {{math|''g''(''a'')}}, and {{mvar|r}}, continuous at {{mvar|a}}, and such that,
<math display="block">f(g(x))-f(g(a))=q(g(x))(g(x)-g(a))</math>
and
<math display="block">g(x)-g(a)=r(x)(x-a).</math>
Therefore,
<math display="block">f(g(x))-f(g(a))=q(g(x))r(x)(x-a),</math>
but the function given by {{math|1=''h''(''x'') = ''q''(''g''(''x''))''r''(''x'')}} is continuous at {{mvar|a}}, and we get, for this {{mvar|a}}
<math display="block">(f \circ g)'(a)=q(g(a))r(a)=f'(g(a))g'(a).</math>
A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to be [[Lipschitz continuity|Lipschitz continuous]], [[Hölder condition|Hölder continuous]], etc. Differentiation itself can be viewed as the [[polynomial remainder theorem]] (the little [[Étienne Bézout|Bézout]] theorem, or factor theorem), generalized to an appropriate class of functions.{{citation needed|date=February 2016}}
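
As a sketch of the factorization in the one-variable case, assume for illustration that <math>f(y) = y^2</math> and <math>g(x) = x^3</math>. Then one may take <math>q(y) = y + g(a) = y + a^3</math> and <math>r(x) = x^2 + a x + a^2</math>, both of which are continuous, and
<math display="block">f(g(x)) - f(g(a)) = x^6 - a^6 = (x^3 + a^3)(x^2 + a x + a^2)(x - a) = q(g(x))\, r(x)\, (x - a).</math>
Evaluating the continuous factor <math>h(x) = q(g(x)) r(x)</math> at <math>a</math> gives
<math display="block">h(a) = 2a^3 \cdot 3a^2 = 6a^5 = f'(g(a))\, g'(a),</math>
which is the derivative of <math>x^6</math> at <math>a</math>, as expected.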

=== Proof via infinitesimals ===
{{See also|Non-standard calculus}}
If <math>y=f(x)</math> and <math>x=g(t)</math>, then, choosing an infinitesimal <math>\Delta t \neq 0</math>, we compute the corresponding <math>\Delta x=g(t+\Delta t)-g(t)</math> and then the corresponding <math>\Delta y=f(x+\Delta x)-f(x)</math>, so that
<math display="block">\frac{\Delta y}{\Delta t} = \frac{\Delta y}{\Delta x} \frac{\Delta x}{\Delta t}</math>
and applying the [[standard part]] we obtain
<math display="block">\frac{d y}{d t}=\frac{d y}{d x} \frac{dx}{dt}</math>
which is the chain rule.
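
For example, assuming for illustration that <math>y = x^2</math> and <math>x = t^3</math>, an infinitesimal <math>\Delta t \neq 0</math> gives <math>\Delta x = 3t^2 \Delta t + 3t \Delta t^2 + \Delta t^3</math>, which is nonzero because <math>t^3</math> is strictly increasing, and <math>\Delta y = 2x \Delta x + \Delta x^2</math>. Hence
<math display="block">\frac{\Delta y}{\Delta t} = \frac{\Delta y}{\Delta x} \frac{\Delta x}{\Delta t} = (2x + \Delta x)(3t^2 + 3t \Delta t + \Delta t^2),</math>
and taking the standard part of each factor gives
<math display="block">\frac{dy}{dt} = 2x \cdot 3t^2 = 6t^5,</math>
which agrees with differentiating <math>y = t^6</math> directly.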