Chain rule


In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions <math>f</math> and <math>g</math> in terms of the derivatives of <math>f</math> and <math>g</math>. More precisely, if <math>h=f\circ g</math> is the function such that <math>h(x)=f(g(x))</math> for every <math>x</math>, then the chain rule is, in Lagrange's notation, <math display="block">h'(x) = f'(g(x)) g'(x),</math> or, equivalently, <math display="block">h'=(f\circ g)'=(f'\circ g)\cdot g'.</math>

The chain rule may also be expressed in Leibniz's notation. If a variable <math>z</math> depends on the variable <math>y</math>, which itself depends on the variable <math>x</math> (that is, <math>y</math> and <math>z</math> are dependent variables), then <math>z</math> depends on <math>x</math> as well, via the intermediate variable <math>y</math>. In this case, the chain rule is expressed as <math display="block">\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx},</math> or, making explicit the points at which the derivatives are evaluated, <math display="block"> \left.\frac{dz}{dx}\right|_{x} = \left.\frac{dz}{dy}\right|_{y(x)} \cdot \left. \frac{dy}{dx}\right|_{x} .</math>

In integration, the counterpart to the chain rule is the substitution rule.

Intuitive explanation

Intuitively, the chain rule states that knowing the instantaneous rate of change of <math>z</math> relative to <math>y</math> and that of <math>y</math> relative to <math>x</math> allows one to calculate the instantaneous rate of change of <math>z</math> relative to <math>x</math> as the product of the two rates of change.

As put by George F. Simmons: "If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."<ref>George F. Simmons, Calculus with Analytic Geometry (1985), p. 93.</ref>

The relationship between this example and the chain rule is as follows. Let <math>z</math>, <math>y</math> and <math>x</math> be the (variable) positions of the car, the bicycle, and the walking man, respectively. The rate of change of the relative positions of the car and the bicycle is <math display="inline">\frac{dz}{dy}=2</math>. Similarly, <math display="inline">\frac{dy}{dx}=4</math>. So the rate of change of the relative positions of the car and the walking man is <math display="block">\frac{dz}{dx}=\frac{dz}{dy}\cdot\frac{dy}{dx}=2\cdot 4=8.</math>

The rate of change of positions is the ratio of the speeds, and the speed is the derivative of the position with respect to the time <math>t</math>; that is, <math display="block">\frac{dz}{dx}=\frac{dz/dt}{dx/dt},</math> or, equivalently, <math display="block">\frac{dz}{dt}=\frac{dz}{dx}\cdot \frac{dx}{dt},</math> which is also an application of the chain rule.

History

The chain rule seems to have first been used by Gottfried Wilhelm Leibniz. He used it to calculate the derivative of <math>\sqrt{a + bz + cz^2}</math> as the composite of the square root function and the function <math>a + bz + cz^2\!</math>. He first mentioned it in a 1676 memoir (with a sign error in the calculation).<ref>Template:Cite journal</ref> The common notation of the chain rule is due to Leibniz.<ref name="OHR">Template:Cite journal</ref> Guillaume de l'Hôpital used the chain rule implicitly in his Analyse des infiniment petits. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.Template:Citation needed The first "modern" version of the chain rule is believed to appear in Lagrange's 1797 Théorie des fonctions analytiques; it also appears in Cauchy's 1823 Résumé des Leçons données à L’École Royale Polytechnique sur Le Calcul Infinitésimal.<ref name="OHR"/>

Statement

The simplest form of the chain rule is for real-valued functions of one real variable. It states that if <math>g</math> is a function that is differentiable at a point <math>c</math> (i.e. the derivative <math>g'(c)</math> exists) and <math>f</math> is a function that is differentiable at <math>g(c)</math>, then the composite function <math>f \circ g</math> is differentiable at <math>c</math>, and the derivative is<ref>Template:Cite book</ref> <math display="block"> (f\circ g)'(c) = f'(g(c))\cdot g'(c). </math> The rule is sometimes abbreviated as <math display="block">(f\circ g)' = (f'\circ g) \cdot g'.</math>

If <math>y = f(u)</math> and <math>u = g(x)</math>, then this abbreviated form is written in Leibniz notation as: <math display="block">\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.</math>

The points where the derivatives are evaluated may also be stated explicitly: <math display="block">\left.\frac{dy}{dx}\right|_{x=c} = \left.\frac{dy}{du}\right|_{u = g(c)} \cdot \left.\frac{du}{dx}\right|_{x=c}.</math>

Carrying the same reasoning further, given <math>n</math> functions <math>f_1, \ldots, f_n\!</math> with the composite function <math>f_1 \circ ( f_2 \circ \cdots (f_{n-1} \circ f_n) )\!</math>, if each function <math>f_i\!</math> is differentiable at its immediate input, then the composite function is also differentiable by repeated application of the chain rule, and its derivative is (in Leibniz's notation): <math display="block">\frac{df_1}{dx} = \frac{df_1}{df_2}\frac{df_2}{df_3}\cdots\frac{df_n}{dx}.</math>
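As a quick numerical sanity check of the two-function statement, the sketch below compares <math>f'(g(c)) \cdot g'(c)</math> with a symmetric difference quotient of <math>f \circ g</math>. The choices <math>f = \sin</math>, <math>g(x) = x^2 + 1</math>, the point <math>c = 0.7</math> and the step size are illustrative assumptions; agreement at one point supports the formula but does not prove it.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

def chain_rule_derivative(f_prime, g, g_prime, c):
    """Evaluate (f o g)'(c) = f'(g(c)) * g'(c)."""
    return f_prime(g(c)) * g_prime(c)

# Illustrative choices: f(u) = sin(u), g(x) = x**2 + 1.
f = math.sin
f_prime = math.cos
g = lambda x: x**2 + 1
g_prime = lambda x: 2 * x

c = 0.7
exact = chain_rule_derivative(f_prime, g, g_prime, c)
approx = central_difference(lambda x: f(g(x)), c)
print(exact, approx)  # the two values agree to many decimal places
```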

Applications

[Figure: The chain rule in the case of composites of more than two functions]

Composites of more than two functions

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of <math>f</math>, <math>g</math>, and <math>h</math> (in that order) is the composite of <math>f</math> with <math>g \circ h</math>. The chain rule states that to compute the derivative of <math>f \circ g \circ h</math>, it is sufficient to compute the derivative of <math>f</math> and the derivative of <math>g \circ h</math>. The derivative of <math>f</math> can be calculated directly, and the derivative of <math>g \circ h</math> can be calculated by applying the chain rule again.Template:Citation needed

For concreteness, consider the function <math display="block">y = e^{\sin (x^2)}.</math> This can be decomposed as the composite of three functions: <math display="block">\begin{align} y &= f(u) = e^u, \\ u &= g(v) = \sin v, \\ v &= h(x) = x^2, \end{align}</math> so that <math> y = f(g(h(x))) </math>.

Their derivatives are: <math display="block">\begin{align} \frac{dy}{du} &= f'(u) = e^u, \\ \frac{du}{dv} &= g'(v) = \cos v, \\ \frac{dv}{dx} &= h'(x) = 2x. \end{align}</math>

The chain rule states that the derivative of their composite at the point <math>x = a</math> is: <math display="block">\begin{align} (f \circ g \circ h)'(a) & = f'((g \circ h)(a)) \cdot (g \circ h)'(a) \\ & = f'((g \circ h)(a)) \cdot g'(h(a)) \cdot h'(a) \\ & = (f' \circ g \circ h)(a) \cdot (g' \circ h)(a) \cdot h'(a). \end{align}</math>

In Leibniz's notation, this is: <math display="block">\frac{dy}{dx} = \left.\frac{dy}{du}\right|_{u=g(h(a))}\cdot\left.\frac{du}{dv}\right|_{v=h(a)}\cdot\left.\frac{dv}{dx}\right|_{x=a},</math> or for short, <math display="block">\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dv}\cdot\frac{dv}{dx}.</math> The derivative function is therefore: <math display="block">\frac{dy}{dx} = e^{\sin(x^2)}\cdot\cos(x^2)\cdot 2x.</math>

Another way of computing this derivative is to view the composite function <math>f \circ g \circ h</math> as the composite of <math>f \circ g</math> and <math>h</math>. Applying the chain rule in this manner yields: <math display="block">\begin{align} (f \circ g \circ h)'(a) &= (f \circ g)'(h(a)) \cdot h'(a) \\ &= f'(g(h(a))) \cdot g'(h(a)) \cdot h'(a). \end{align}</math>

This is the same as what was computed above, as should be expected because <math>(f \circ g) \circ h = f \circ (g \circ h)</math>.
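The sketch below mirrors the Leibniz-notation computation for <math>y = e^{\sin(x^2)}</math>: it walks the chain from the innermost function outward, carrying each intermediate variable together with its derivative with respect to <math>x</math>, and compares the result with the closed-form derivative. The function names and sample points are illustrative choices, not part of the text.

```python
import math

def dy_dx(x):
    """Derivative of y = exp(sin(x**2)), built up one link of the chain at a time."""
    # Innermost link: v = h(x) = x**2, so dv/dx = 2x.
    v = x * x
    dv_dx = 2 * x
    # Middle link: u = g(v) = sin(v), so du/dx = cos(v) * dv/dx.
    u = math.sin(v)
    du_dx = math.cos(v) * dv_dx
    # Outermost link: y = f(u) = exp(u), so dy/dx = exp(u) * du/dx.
    return math.exp(u) * du_dx

def closed_form(x):
    """The formula e^{sin(x^2)} * cos(x^2) * 2x from the text."""
    return math.exp(math.sin(x**2)) * math.cos(x**2) * 2 * x

for x in (-1.3, 0.0, 0.5, 2.0):
    print(x, dy_dx(x), closed_form(x))
```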

Sometimes, it is necessary to differentiate an arbitrarily long composition of the form <math>f_1 \circ f_2 \circ \cdots \circ f_{n-1} \circ f_n\!</math>. In this case, define <math display="block">f_{a\,.\,.\,b} = f_{a} \circ f_{a+1} \circ \cdots \circ f_{b-1} \circ f_{b}</math> where <math>f_{a\,.\,.\,a} = f_a</math> and <math>f_{a\,.\,.\,b}(x) = x</math> when <math>b < a</math>. Then the chain rule takes the form <math display="block">\begin{align} Df_{1\,.\,.\,n} &= (Df_1 \circ f_{2\,.\,.\,n}) (Df_2 \circ f_{3\,.\,.\,n}) \cdots (Df_{n-1} \circ f_{n\,.\,.\,n}) Df_n \\ &= \prod_{k=1}^n \left[Df_k \circ f_{(k+1)\,.\,.\,n}\right] \end{align}</math> or, in Lagrange's notation, <math display="block">\begin{align} f_{1\,.\,.\,n}'(x) &= f_1' \left( f_{2\,.\,.\,n}(x) \right) \; f_2' \left( f_{3\,.\,.\,n}(x) \right) \cdots f_{n-1}' \left(f_{n\,.\,.\,n}(x)\right) \; f_n'(x) \\[1ex] &= \prod_{k=1}^{n} f_k' \left(f_{(k+1)\,.\,.\,n}(x) \right). \end{align}</math>
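Because each factor <math>f_k'</math> is evaluated at <math>f_{(k+1)\,.\,.\,n}(x)</math>, the value produced by the functions inside it, the product can be accumulated in a single pass from the innermost function outward. The sketch below assumes each <math>f_k</math> is supplied together with a callable for its derivative; the helper names are illustrative, and the empty composition returns <math>(x, 1)</math>, matching the convention <math>f_{a\,.\,.\,b}(x) = x</math> for <math>b < a</math>.

```python
import math
from typing import Callable, List, Tuple

# A link of the chain: the function f_k together with its derivative f_k'.
Link = Tuple[Callable[[float], float], Callable[[float], float]]

def composite_derivative(links: List[Link], x: float) -> Tuple[float, float]:
    """Return (f_{1..n}(x), f_{1..n}'(x)) for links = [(f_1, f_1'), ..., (f_n, f_n')].

    f_1 is the outermost function and f_n the innermost, as in f_1 o f_2 o ... o f_n.
    """
    value = x          # f_{(n+1)..n}(x) = x, the empty composition
    derivative = 1.0   # the empty product
    for f, f_prime in reversed(links):   # k = n, n-1, ..., 1
        derivative *= f_prime(value)     # factor f_k'(f_{(k+1)..n}(x))
        value = f(value)                 # value is now f_{k..n}(x)
    return value, derivative

# y = exp(sin(x**2)) written as f_1 o f_2 o f_3.
links = [(math.exp, math.exp), (math.sin, math.cos), (lambda v: v * v, lambda v: 2 * v)]
print(composite_derivative(links, 0.5))
```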

Quotient rule

The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function <math>f(x)/g(x)</math> as the product <math>f(x) \cdot 1/g(x)</math>. First apply the product rule: <math display="block">\begin{align} \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) &= \frac{d}{dx}\left(f(x)\cdot\frac{1}{g(x)}\right) \\ &= f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\frac{d}{dx}\left(\frac{1}{g(x)}\right). \end{align}</math>

To compute the derivative of <math>1/g(x)</math>, notice that it is the composite of <math>g</math> with the reciprocal function, that is, the function that sends <math>x</math> to <math>1/x</math>. The derivative of the reciprocal function is <math>-1/x^2\!</math>. By applying the chain rule, the last expression becomes: <math display="block">f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\left(-\frac{1}{g(x)^2}\cdot g'(x)\right) = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2},</math> which is the usual formula for the quotient rule.
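The derivation can be spot-checked numerically: the product-rule-plus-chain-rule expression and the quotient-rule formula should agree with each other and with a difference quotient of <math>f/g</math>. The functions <math>f(x) = x^3 + 1</math> and <math>g(x) = 2 + \cos x</math> (chosen so that <math>g</math> never vanishes) and the sample points are illustrative assumptions.

```python
import math

# Illustrative choices: f(x) = x**3 + 1 and g(x) = 2 + cos(x), which is never zero.
def f(x):
    return x**3 + 1

def f_prime(x):
    return 3 * x**2

def g(x):
    return 2 + math.cos(x)

def g_prime(x):
    return -math.sin(x)

def via_product_and_chain(x):
    """f' * (1/g) + f * (reciprocal'(g(x)) * g'(x)), with reciprocal'(t) = -1/t**2."""
    return f_prime(x) / g(x) + f(x) * (-1 / g(x)**2 * g_prime(x))

def quotient_rule(x):
    """(f' g - f g') / g**2."""
    return (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x)**2

for x in (-2.0, 0.3, 1.7):
    print(x, via_product_and_chain(x), quotient_rule(x))
```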

Derivatives of inverse functions

Suppose that <math>y = g(x)</math> has an inverse function. Call its inverse function <math>f</math> so that we have <math>x = f(y)</math>. There is a formula for the derivative of <math>f</math> in terms of the derivative of <math>g</math>. To see this, note that <math>f</math> and <math>g</math> satisfy the formula <math display="block">f(g(x)) = x.</math>

Because the functions <math>f(g(x))</math> and <math>x</math> are equal, their derivatives must be equal. The derivative of <math>x</math> is the constant function with value 1, and the derivative of <math>f(g(x))</math> is determined by the chain rule. Therefore, we have that: <math display="block">f'(g(x)) g'(x) = 1.</math>

To express <math>f'</math> as a function of an independent variable <math>y</math>, we substitute <math>f(y)</math> for <math>x</math> wherever it appears. Then, provided <math>g'(f(y)) \neq 0</math>, we can solve for <math>f'</math>: <math display="block">\begin{align} f'(g(f(y))) g'(f(y)) &= 1 \\ f'(y) g'(f(y)) &= 1 \\ f'(y) &= \frac{1}{g'(f(y))}. \end{align}</math>

For example, consider the function <math>g(x) = e^x</math>. It has an inverse <math>f(y) = \ln y</math>. Because <math>g'(x) = e^x</math>, the above formula says that <math display="block">\frac{d}{dy}\ln y = \frac{1}{e^{\ln y}} = \frac{1}{y}.</math>

This formula is true whenever <math>g</math> is differentiable and its inverse <math>f</math> is also differentiable. This formula can fail when one of these conditions is not true. For example, consider <math>g(x) = x^3</math>. Its inverse is <math>f(y) = y^{1/3}</math>, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of <math>f</math> at zero, then we must evaluate <math>1/g'(f(0))</math>. Since <math>f(0) = 0</math> and <math>g'(0) = 0</math>, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising because <math>f</math> is not differentiable at zero.
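The formula translates into a small helper that refuses to divide by zero, which is exactly the failure described for <math>g(x) = x^3</math> at zero. This is a sketch: the helper name is hypothetical, the inverse is passed in explicitly rather than computed, and the exact-zero test only catches the case where <math>g'(f(y))</math> evaluates to exactly 0 in floating point.

```python
import math

def inverse_derivative(g_prime, f, y):
    """f'(y) = 1 / g'(f(y)) for f the inverse of g; needs g'(f(y)) != 0."""
    slope = g_prime(f(y))
    if slope == 0:
        raise ZeroDivisionError("g'(f(y)) is zero, so the formula does not apply here")
    return 1 / slope

# g(x) = e**x with inverse f(y) = ln y: the formula gives 1/y.
for y in (0.5, 1.0, 3.0):
    print(y, inverse_derivative(math.exp, math.log, y), 1 / y)

# g(x) = x**3 with inverse f(y) = y**(1/3): at y = 0 we would need 1/g'(0) = 1/0.
def cube_root(y):
    """Real cube root, also defined for negative y."""
    return math.copysign(abs(y) ** (1 / 3), y)

try:
    inverse_derivative(lambda x: 3 * x**2, cube_root, 0.0)
except ZeroDivisionError as err:
    print("y = 0:", err)
```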

Back propagation

The chain rule forms the basis of the backpropagation algorithm, which computes the gradients used in gradient descent training of neural networks in deep learning (artificial intelligence).<ref>Template:Citation, pp=197–217.</ref>
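As a toy illustration of the idea (not how any particular deep-learning library implements it), the sketch below trains a single hypothetical neuron <math>a = \tanh(wx + b)</math> with squared-error loss <math>L = (a - \text{target})^2</math>. The backward pass is just the chain rule applied from the loss back to each parameter, reusing intermediate values saved during the forward pass; the training example, initial weights and learning rate are arbitrary assumptions.

```python
import math

def forward_backward(w, b, x, target):
    """One neuron a = tanh(w*x + b) with loss L = (a - target)**2.

    Returns the loss and (dL/dw, dL/db), computed by applying the chain rule
    backwards from L to the parameters.
    """
    # Forward pass: keep the intermediate values the backward pass needs.
    z = w * x + b
    a = math.tanh(z)
    loss = (a - target) ** 2

    # Backward pass: multiply local derivatives from the output towards the inputs.
    dL_da = 2 * (a - target)
    da_dz = 1 - a * a            # derivative of tanh, expressed through a = tanh(z)
    dL_dz = dL_da * da_dz        # chain rule: dL/dz = dL/da * da/dz
    dL_dw = dL_dz * x            # dz/dw = x
    dL_db = dL_dz * 1.0          # dz/db = 1
    return loss, (dL_dw, dL_db)

# A few plain gradient-descent steps on one (hypothetical) training example.
w, b = 0.5, 0.0
for step in range(3):
    loss, (gw, gb) = forward_backward(w, b, x=1.5, target=0.8)
    w -= 0.1 * gw
    b -= 0.1 * gb
    print(step, round(loss, 6))
```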

Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming that <math>y = f(u)</math> and <math>u = g(x)</math>, the first few derivatives are: <math display="block">\begin{align}
\frac{dy}{dx} &= \frac{dy}{du} \frac{du}{dx} \\
\frac{d^2 y}{dx^2} &= \frac{d^2 y}{du^2} \left(\frac{du}{dx}\right)^2 + \frac{dy}{du} \frac{d^2 u}{dx^2} \\
\frac{d^3 y}{dx^3} &= \frac{d^3 y}{du^3} \left(\frac{du}{dx}\right)^3 + 3 \, \frac{d^2 y}{du^2} \frac{du}{dx} \frac{d^2 u}{dx^2} + \frac{dy}{du} \frac{d^3 u}{dx^3} \\
\frac{d^4 y}{dx^4} &= \frac{d^4 y}{du^4} \left(\frac{du}{dx}\right)^4 + 6 \, \frac{d^3 y}{du^3} \left(\frac{du}{dx}\right)^2 \frac{d^2 u}{dx^2} + \frac{d^2 y}{du^2} \left( 4 \, \frac{du}{dx} \frac{d^3 u}{dx^3} + 3 \left(\frac{d^2 u}{dx^2}\right)^2 \right) + \frac{dy}{du} \frac{d^4 u}{dx^4}.
\end{align}</math>
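The four formulas can be checked term by term against a case whose derivatives are known in closed form. The sketch below takes the derivatives of <math>y</math> with respect to <math>u</math> and of <math>u</math> with respect to <math>x</math> as plain numbers (it does not differentiate anything itself, which is an assumption of this sketch) and tests the result on <math>y = e^u</math>, <math>u = x^2</math>. There every <math>u</math>-derivative of <math>y</math> equals <math>e^{x^2}</math>, the derivatives of <math>u</math> are <math>2x</math>, <math>2</math>, <math>0</math>, <math>0</math>, and the <math>x</math>-derivatives of <math>y</math> are <math>e^{x^2}</math> times <math>2x</math>, <math>4x^2+2</math>, <math>8x^3+12x</math> and <math>16x^4+48x^2+12</math>.

```python
import math

def faa_di_bruno_4(y_u, u_x):
    """First four derivatives of y(u(x)) with respect to x.

    y_u = [dy/du, d2y/du2, d3y/du3, d4y/du4], evaluated at u = u(x)
    u_x = [du/dx, d2u/dx2, d3u/dx3, d4u/dx4], evaluated at x
    Implements the four formulas above term by term.
    """
    y1, y2, y3, y4 = y_u
    u1, u2, u3, u4 = u_x
    d1 = y1 * u1
    d2 = y2 * u1**2 + y1 * u2
    d3 = y3 * u1**3 + 3 * y2 * u1 * u2 + y1 * u3
    d4 = (y4 * u1**4 + 6 * y3 * u1**2 * u2
          + y2 * (4 * u1 * u3 + 3 * u2**2) + y1 * u4)
    return [d1, d2, d3, d4]

# Check with y = e**u and u = x**2, i.e. y = e**(x**2).
x = 0.9
e_u = math.exp(x**2)
derivs = faa_di_bruno_4([e_u] * 4, [2 * x, 2.0, 0.0, 0.0])
expected = [e_u * 2 * x,
            e_u * (4 * x**2 + 2),
            e_u * (8 * x**3 + 12 * x),
            e_u * (16 * x**4 + 48 * x**2 + 12)]
print(derivs)
print(expected)
```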

Proofs

First proof

One proof of the chain rule begins by defining the derivative of the composite function <math>f \circ g</math>, where we take the limit of the difference quotient for <math>f \circ g</math> as <math>x</math> approaches <math>a</math>: <math display="block">(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.</math>

Assume for the moment that <math>g(x)\!</math> does not equal <math>g(a)</math> for any <math>x \neq a</math> near <math>a</math>. Then the previous expression is equal to the limit of the product of two factors: <math display="block">\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}.</math>

If <math>g</math> oscillates near <math>a</math>, then it might happen that no matter how close one gets to <math>a</math>, there is always an even closer <math>x</math> such that <math>g(x) = g(a)</math>. For example, this happens near <math>a = 0</math> for the continuous function <math>g</math> defined by <math>g(x) = 0</math> for <math>x = 0</math> and <math>g(x) = x^2 \sin(1/x)</math> otherwise. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function <math>Q</math> as follows: <math display="block">Q(y) = \begin{cases} \displaystyle\frac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\ f'(g(a)), & y = g(a). \end{cases}</math> We will show that the difference quotient for <math>f \circ g</math> is always equal to: <math display="block">Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.</math>

Whenever <math>g(x)</math> is not equal to <math>g(a)</math>, this is clear because the factors of <math>g(x) - g(a)</math> cancel. When <math>g(x)</math> equals <math>g(a)</math>, the difference quotient for <math>f \circ g</math> is zero because <math>f(g(x))</math> equals <math>f(g(a))</math>, and the above product is zero because it equals <math>f'(g(a))</math> times zero. So the above product is always equal to the difference quotient, and to show that the derivative of <math>f \circ g</math> at <math>a</math> exists and to determine its value, we need only show that the limit as <math>x</math> goes to <math>a</math> of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are <math>Q(g(x))</math> and <math>(g(x) - g(a))/(x - a)</math>. The latter is the difference quotient for <math>g</math> at <math>a</math>, and because <math>g</math> is differentiable at <math>a</math> by assumption, its limit as <math>x</math> tends to <math>a</math> exists and equals <math>g'(a)</math>.

As for <math>Q(g(x))</math>, notice that <math>Q</math> is defined wherever <math>f</math> is. Furthermore, <math>f</math> is differentiable at <math>g(a)</math> by assumption, so <math>Q</math> is continuous at <math>g(a)</math>, by definition of the derivative. The function <math>g</math> is continuous at <math>a</math> because it is differentiable at <math>a</math>, and therefore <math>Q \circ g</math> is continuous at <math>a</math>. So its limit as <math>x</math> goes to <math>a</math> exists and equals <math>Q(g(a))</math>, which is <math>f'(g(a))</math>.

This shows that the limits of both factors exist and that they equal <math>f'(g(a))</math> and <math>g'(a)</math>, respectively. Therefore, the derivative of <math>f \circ g</math> at <math>a</math> exists and equals <math>f'(g(a)) g'(a)</math>.

Second proof

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g′(a) and a function ε(h) that tends to zero as h tends to zero, and furthermore <math display="block">g(a + h) - g(a) = g'(a) h + \varepsilon(h) h.</math> Here the left-hand side represents the true difference between the value of g at a and at <math>a + h</math>, whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a function ε exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function η, we have <math display="block">f(g(a) + k) - f(g(a)) = f'(g(a)) k + \eta(k) k.</math> The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set <math>\eta(0) = 0</math>, then η is continuous at 0.

Proving the theorem requires studying the difference <math>f(g(a + h)) - f(g(a))</math> as h tends to zero. The first step is to substitute for <math>g(a + h)</math> using the definition of differentiability of g at a: <math display="block">f(g(a + h)) - f(g(a)) = f(g(a) + g'(a) h + \varepsilon(h) h) - f(g(a)).</math> The next step is to use the definition of differentiability of f at g(a). This requires a term of the form <math>f(g(a) + k)</math> for some k. In the above equation, the correct k varies with h. Set <math>k_h = g'(a) h + \varepsilon(h) h</math> and the right-hand side becomes <math>f(g(a) + k_h) - f(g(a))</math>. Applying the definition of the derivative gives: <math display="block">f(g(a) + k_h) - f(g(a)) = f'(g(a)) k_h + \eta(k_h) k_h.</math> To study the behavior of this expression as h tends to zero, expand <math>k_h</math>. After regrouping the terms, the right-hand side becomes: <math display="block">f'(g(a)) g'(a)h + [f'(g(a)) \varepsilon(h) + \eta(k_h) g'(a) + \eta(k_h) \varepsilon(h)] h.</math> Because ε(h) tends to zero as h tends to zero, and because <math>k_h</math> also tends to zero while η is continuous at 0, so that <math>\eta(k_h)</math> tends to zero, the first two bracketed terms tend to zero as h tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference <math>f(g(a + h)) - f(g(a))</math>, by the definition of the derivative <math>f \circ g</math> is differentiable at a and its derivative is <math>f'(g(a)) g'(a)</math>.

The role of Q in the first proof is played by η in this proof. They are related by the equation: <math display="block">Q(y) = f'(g(a)) + \eta(y - g(a)). </math> The need to define Q at g(a) is analogous to the need to define η at zero.

Third proof

Constantin Carathéodory's alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule.<ref>Template:Cite journal</ref>

Under this definition, a function <math>f</math> is differentiable at a point <math>a</math> if and only if there is a function <math>q</math>, continuous at <math>a</math> and such that <math>f(x) - f(a) = q(x)(x - a)</math>. There is at most one such function, and if <math>f</math> is differentiable at <math>a</math> then <math>f'(a) = q(a)</math>.

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, there exist functions <math>q</math>, continuous at <math>g(a)</math>, and <math>r</math>, continuous at <math>a</math>, such that <math display="block">f(g(x))-f(g(a))=q(g(x))(g(x)-g(a))</math> and <math display="block">g(x)-g(a)=r(x)(x-a).</math> Therefore, <math display="block">f(g(x))-f(g(a))=q(g(x))r(x)(x-a),</math> but the function given by <math>h(x) = q(g(x))r(x)</math> is continuous at <math>a</math>, and we get, for this <math>a</math>, <math display="block">(f \circ g)'(a)=q(g(a))r(a)=f'(g(a))g'(a).</math> A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to be Lipschitz continuous, Hölder continuous, etc. Differentiation itself can be viewed as the polynomial remainder theorem (the little Bézout theorem, or factor theorem), generalized to an appropriate class of functions.Template:Citation needed

Proof via infinitesimals

If <math>y=f(x)</math> and <math>x=g(t)</math>, then choosing an infinitesimal <math>\Delta t\not=0</math> we compute the corresponding <math>\Delta x=g(t+\Delta t)-g(t)</math> and then the corresponding <math>\Delta y=f(x+\Delta x)-f(x)</math>, so that <math display="block">\frac{\Delta y}{\Delta t} = \frac{\Delta y}{\Delta x} \frac{\Delta x}{\Delta t}</math> and applying the standard part we obtain <math display="block">\frac{d y}{d t}=\frac{d y}{d x} \frac{dx}{dt},</math> which is the chain rule.
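Ordinary floating-point numbers have no infinitesimals, but the algebra of the argument can still be watched with small finite increments: the identity <math>\Delta y/\Delta t = (\Delta y/\Delta x)(\Delta x/\Delta t)</math> holds for every nonzero <math>\Delta t</math> with <math>\Delta x \neq 0</math>, and shrinking <math>\Delta t</math> plays the role that taking the standard part plays in the text. This is a finite-difference stand-in, not a hyperreal computation; the functions <math>f = \exp</math>, <math>g = \sin</math> and the point <math>t = 0.4</math> are illustrative.

```python
import math

def increment_ratios(f, g, t, dt):
    """Finite version of the argument for y = f(x), x = g(t).

    Returns (dy/dt, dy/dx, dx/dt) as ratios of finite increments;
    dx must be nonzero for the middle ratio to exist.
    """
    x = g(t)
    dx = g(t + dt) - x
    dy = f(x + dx) - f(x)
    return dy / dt, dy / dx, dx / dt

f, g, t = math.exp, math.sin, 0.4               # y = e**x and x = sin(t)
exact = math.exp(math.sin(t)) * math.cos(t)     # dy/dx * dx/dt at t
for dt in (1e-1, 1e-3, 1e-5):
    q, q_yx, q_xt = increment_ratios(f, g, t, dt)
    # q and q_yx * q_xt agree up to rounding; both approach exact as dt shrinks.
    print(dt, q, q_yx * q_xt, exact)
```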

Multivariable case

The full generalization of the chain rule to multi-variable functions (such as <math>f : \mathbb{R}^m \to \mathbb{R}^n</math>) is rather technical. However, it is simpler to write in the case of functions of the form <math display="block">f(g_1(x), \dots, g_k(x)),</math> where <math>f : \reals^k \to \reals</math>, and <math>g_i : \mathbb{R} \to \mathbb{R}</math> for each <math>i = 1, 2, \dots, k.</math>

As this case occurs often in the study of functions of a single variable, it is worth describing it separately.

Case of scalar-valued functions with multiple inputs

Let <math>f : \reals^k \to \reals</math>, and <math>g_i : \mathbb{R} \to \mathbb{R}</math> for each <math>i = 1, 2, \dots, k.</math> To write the chain rule for the composition of functions <math display="block">x \mapsto f(g_1(x), \dots , g_k(x)),</math> one needs the partial derivatives of <math>f</math> with respect to its <math>k</math> arguments. The usual notations for partial derivatives involve names for the arguments of the function. As these arguments are not named in the above formula, it is simpler and clearer to use D-notation, and to denote by <math display="block">D_i f</math> the partial derivative of <math>f</math> with respect to its <math>i</math>th argument, and by <math display="block">D_i f(z)</math> the value of this derivative at <math>z</math>.

With this notation, the chain rule is <math display="block">\frac{d}{dx}f(g_1(x), \dots, g_k (x))=\sum_{i=1}^k \left(\frac{d}{dx}{g_i}(x)\right) D_i f(g_1(x), \dots, g_k (x)).</math>
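The formula is a weighted sum: evaluate the inner functions once, then add up each <math>g_i'(x)</math> times the corresponding partial derivative of <math>f</math> at <math>(g_1(x), \dots, g_k(x))</math>. In the sketch below the partial derivatives <math>D_i f</math> are supplied by the caller as callables (an assumption; nothing is differentiated symbolically), and the example uses <math>f(u,v) = uv</math> with the illustrative inner functions <math>\sin</math> and <math>\exp</math>.

```python
import math
from typing import Callable, Sequence

def chain_rule_multi(partials: Sequence[Callable[..., float]],
                     gs: Sequence[Callable[[float], float]],
                     g_primes: Sequence[Callable[[float], float]],
                     x: float) -> float:
    """d/dx f(g_1(x), ..., g_k(x)) = sum_i g_i'(x) * D_i f(g_1(x), ..., g_k(x)).

    partials[i] is D_{i+1} f and takes the k arguments of f.
    """
    z = [g(x) for g in gs]   # the point (g_1(x), ..., g_k(x))
    return sum(gp(x) * D(*z) for gp, D in zip(g_primes, partials))

# Example: f(u, v) = u * v, so D_1 f = v and D_2 f = u (the product-rule case).
partials = [lambda u, v: v, lambda u, v: u]
gs = [math.sin, math.exp]
g_primes = [math.cos, math.exp]
x = 0.3
print(chain_rule_multi(partials, gs, g_primes, x))
```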

Example: arithmetic operations

If the function <math>f</math> is addition, that is, if <math display="block">f(u,v)=u+v,</math> then <math display="inline">D_1 f = \frac{\partial f}{\partial u} = 1</math> and <math display="inline">D_2 f = \frac{\partial f}{\partial v} = 1</math>. Thus, the chain rule gives <math display="block">\frac{d}{dx}(g(x)+h(x)) = \left( \frac{d}{dx}g(x) \right) D_1 f+\left( \frac{d}{dx}h(x)\right) D_2 f=\frac{d}{dx}g(x) +\frac{d}{dx}h(x).</math>

For multiplication <math display="block">f(u,v)=uv,</math> the partials are <math>D_1 f = v</math> and <math>D_2 f = u</math>. Thus, <math display="block">\frac{d}{dx}(g(x)h(x)) = h(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} h(x).</math>

The case of exponentiation <math display="block">f(u,v)=u^v</math> is slightly more complicated, as <math display="block">D_1 f = vu^{v-1},</math> and, as <math>u^v=e^{v\ln u},</math> <math display="block">D_2 f = u^v\ln u.</math> It follows that <math display="block">\frac{d}{dx}\left(g(x)^{h(x)}\right) = h(x)g(x)^{h(x)-1} \frac{d}{dx}g(x) + g(x)^{h(x)} \ln g(x) \,\frac{d}{dx}h(x).</math>
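A numerical spot check of the exponentiation formula, with the illustrative choices <math>g(x) = 2 + \sin x</math> (kept positive so that <math>\ln g(x)</math> is defined) and <math>h(x) = x^2</math>; the helper names, the point and the step size are assumptions of this sketch.

```python
import math

def g(x):
    return 2 + math.sin(x)   # positive, so ln g(x) is defined

def g_prime(x):
    return math.cos(x)

def h(x):
    return x**2

def h_prime(x):
    return 2 * x

def power_derivative(x):
    """d/dx g(x)**h(x) = h g**(h-1) g' + g**h ln(g) h'."""
    gx, hx = g(x), h(x)
    return hx * gx**(hx - 1) * g_prime(x) + gx**hx * math.log(gx) * h_prime(x)

def central_difference(func, x, step=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x = 1.1
print(power_derivative(x), central_difference(lambda s: g(s) ** h(s), x))
```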

General rule: Vector-valued functions with multiple inputs

The simplest way to write the chain rule in the general case is to use the total derivative, which is a linear transformation that captures all directional derivatives in a single formula. Consider differentiable functions <math>f : \mathbb{R}^m \to \mathbb{R}^k</math> and <math>g : \mathbb{R}^n \to \mathbb{R}^m</math>, and a point <math>\mathbf{a}</math> in <math>\mathbb{R}^n</math>. Let <math>D_{\mathbf{a}} g</math> denote the total derivative of <math>g</math> at <math>\mathbf{a}</math> and <math>D_{g(\mathbf{a})} f</math> denote the total derivative of <math>f</math> at <math>g(\mathbf{a})</math>. These two derivatives are linear transformations <math>\mathbb{R}^n \to \mathbb{R}^m</math> and <math>\mathbb{R}^m \to \mathbb{R}^k</math>, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of <math>f \circ g</math> at <math>\mathbf{a}</math>: <math display="block">D_{\mathbf{a}}(f \circ g) = D_{g(\mathbf{a})}f \circ D_{\mathbf{a}}g,</math> or for short, <math display="block">D(f \circ g) = Df \circ Dg.</math> The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.<ref name="spivak_manifolds">Template:Cite book</ref>

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says: <math display="block">J_{f \circ g}(\mathbf{a}) = J_{f}(g(\mathbf{a})) J_{g}(\mathbf{a}),</math> or for short, <math display="block">J_{f \circ g} = (J_f \circ g)J_g.</math>

That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).
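The matrix form is easy to test numerically. The sketch below picks illustrative maps <math>g : \mathbb{R}^2 \to \mathbb{R}^2</math> and <math>f : \mathbb{R}^2 \to \mathbb{R}^3</math> with hand-written Jacobians, multiplies <math>J_f(g(\mathbf{a}))</math> by <math>J_g(\mathbf{a})</math>, and compares the product with a central-difference Jacobian of <math>f \circ g</math>. Plain lists keep it dependency-free; the maps, the point and the step size are assumptions of this sketch.

```python
import math

def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def g(x):                      # g : R^2 -> R^2
    x1, x2 = x
    return [x1 * x2, x1 + x2**2]

def J_g(x):
    x1, x2 = x
    return [[x2, x1],
            [1.0, 2 * x2]]

def f(u):                      # f : R^2 -> R^3
    u1, u2 = u
    return [math.sin(u1), u1 * u2, math.exp(u2)]

def J_f(u):
    u1, u2 = u
    return [[math.cos(u1), 0.0],
            [u2, u1],
            [0.0, math.exp(u2)]]

def numerical_jacobian(F, x, step=1e-6):
    """Central-difference Jacobian; column j approximates dF/dx_j."""
    cols = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += step
        xm[j] -= step
        cols.append([(p - m) / (2 * step) for p, m in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]   # transpose columns into rows

a = [0.5, -1.2]
chain = matmul(J_f(g(a)), J_g(a))                    # J_f(g(a)) J_g(a)
direct = numerical_jacobian(lambda x: f(g(x)), a)    # Jacobian of f o g at a
print(chain)
print(direct)
```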

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If <math>k</math>, <math>m</math>, and <math>n</math> are 1, so that <math>f : \mathbb{R} \to \mathbb{R}</math> and <math>g : \mathbb{R} \to \mathbb{R}</math>, then the Jacobian matrices of <math>f</math> and <math>g</math> are <math>1 \times 1</math>. Specifically, they are: <math display="block">\begin{align} J_g(a) &= \begin{pmatrix} g'(a) \end{pmatrix}, \\ J_{f}(g(a)) &= \begin{pmatrix} f'(g(a)) \end{pmatrix}. \end{align}</math> The Jacobian of <math>f \circ g</math> is the product of these <math>1 \times 1</math> matrices, so it is <math>f'(g(a)) \cdot g'(a)</math>, as expected from the one-dimensional chain rule. In the language of linear transformations, <math>D_a(g)</math> is the function which scales a vector by a factor of <math>g'(a)</math> and <math>D_{g(a)}(f)</math> is the function which scales a vector by a factor of <math>f'(g(a))</math>. The chain rule says that the composite of these two linear transformations is the linear transformation <math>D_a(f \circ g)</math>, and therefore it is the function that scales a vector by <math>f'(g(a)) \cdot g'(a)</math>.

Another way of writing the chain rule is used when f and g are expressed in terms of their components as <math>\mathbf{y} = f(\mathbf{u}) = (f_1(\mathbf{u}), \ldots, f_k(\mathbf{u}))</math> and <math>\mathbf{u} = g(\mathbf{x}) = (g_1(\mathbf{x}), \ldots, g_m(\mathbf{x}))</math>. In this case, the above rule for Jacobian matrices is usually written as: <math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial(x_1, \ldots, x_n)}.</math>

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the <math>i</math>-th coordinate direction is found by multiplying the Jacobian matrix by the <math>i</math>-th basis vector. By doing this to the formula above, we find: <math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial x_i}.</math> Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get: <math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial(y_1, \ldots, y_k)}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.</math> More conceptually, this rule expresses the fact that a change in the <math>x_i</math> direction may change all of <math>g_1</math> through <math>g_m</math>, and any of these changes may affect <math>f</math>.

In the special case where <math>k = 1</math>, so that <math>f</math> is a real-valued function, this formula simplifies even further: <math display="block">\frac{\partial y}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial y}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.</math> This can be rewritten as a dot product. Recalling that <math>\mathbf{u} = (g_1, \ldots, g_m)</math>, the partial derivative <math>\partial \mathbf{u} / \partial x_i</math> is also a vector, and the chain rule says that: <math display="block">\frac{\partial y}{\partial x_i} = \nabla y \cdot \frac{\partial \mathbf{u}}{\partial x_i}.</math>

Example

Given <math>u(x, y) = x^2 + 2y</math> where <math>x(r, t) = r \sin(t)</math> and <math>y(r, t) = \sin^2(t)</math>, determine the values of <math>\partial u / \partial r</math> and <math>\partial u / \partial t</math> using the chain rule.Template:Citation needed <math display="block">\frac{\partial u}{\partial r}=\frac{\partial u}{\partial x} \frac{\partial x}{\partial r}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial r} = (2x)(\sin(t)) + (2)(0) = 2r \sin^2(t),</math> and <math display="block">\begin{align} \frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial t} \\ &= (2x)(r\cos(t)) + (2)(2\sin(t)\cos(t)) \\ &= (2r\sin(t))(r\cos(t)) + 4\sin(t)\cos(t) \\ &= 2(r^2 + 2) \sin(t)\cos(t) \\ &= (r^2 + 2) \sin(2t). \end{align}</math>
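The worked example can be double-checked in code: the sketch below evaluates the two chain-rule sums term by term, compares them with the simplified results <math>2r\sin^2 t</math> and <math>(r^2 + 2)\sin 2t</math>, and also against difference quotients of <math>u</math> written directly in terms of <math>r</math> and <math>t</math>. The function names and sample points are illustrative.

```python
import math

def u_of(r, t):
    """u = x**2 + 2y with x = r sin t and y = sin**2 t, written directly in r and t."""
    x = r * math.sin(t)
    y = math.sin(t) ** 2
    return x**2 + 2 * y

def partials_by_chain_rule(r, t):
    """The two chain-rule sums from the text, term by term."""
    x = r * math.sin(t)
    du_dx, du_dy = 2 * x, 2.0
    dx_dr, dy_dr = math.sin(t), 0.0
    dx_dt, dy_dt = r * math.cos(t), 2 * math.sin(t) * math.cos(t)
    return du_dx * dx_dr + du_dy * dy_dr, du_dx * dx_dt + du_dy * dy_dt

def simplified(r, t):
    """The simplified results 2r sin^2 t and (r^2 + 2) sin 2t."""
    return 2 * r * math.sin(t) ** 2, (r**2 + 2) * math.sin(2 * t)

for r, t in ((1.0, 0.3), (-2.5, 1.2)):
    print(partials_by_chain_rule(r, t), simplified(r, t))
```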

Higher derivatives of multivariable functions

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If <math>y = f(\mathbf{u})</math> is a function of <math>\mathbf{u} = g(\mathbf{x})</math> as above, then the second derivative of <math>f \circ g</math> is: <math display="block">\frac{\partial^2 y}{\partial x_i \partial x_j} = \sum_k \left(\frac{\partial y}{\partial u_k}\frac{\partial^2 u_k}{\partial x_i \partial x_j}\right) + \sum_{k, \ell} \left(\frac{\partial^2 y}{\partial u_k \partial u_\ell}\frac{\partial u_k}{\partial x_i}\frac{\partial u_\ell}{\partial x_j}\right).</math>

Further generalizations

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.

One generalization is to manifolds. In this situation, the chain rule represents the fact that the derivative of <math>f \circ g</math> is the composite of the derivative of <math>f</math> and the derivative of <math>g</math>. This theorem is an immediate consequence of the higher-dimensional chain rule given above, and it has exactly the same formula.

The chain rule is also valid for Fréchet derivatives in Banach spaces. The same formula holds as before.<ref>Template:Cite book</ref> This case and the previous one admit a simultaneous generalization to Banach manifolds.

In differential algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings <math>f : R \to S</math> determines a morphism of Kähler differentials <math>Df : \Omega_R \to \Omega_S</math> which sends an element <math>dr</math> to <math>d(f(r))</math>, the exterior differential of <math>f(r)</math>. The formula <math>D(f \circ g) = Df \circ Dg</math> holds in this context as well.

The common feature of these examples is that they are expressions of the idea that the derivative is part of a functor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to its tangent bundle and it sends each function to its derivative. For example, in the manifold case, the derivative sends a <math>C^r</math>-manifold to a <math>C^{r-1}</math>-manifold (its tangent bundle) and a <math>C^r</math>-function to its total derivative. There is one requirement for this to be a functor, namely that the derivative of a composite must be the composite of the derivatives. This is exactly the formula <math>D(f \circ g) = Df \circ Dg</math>.

There are also chain rules in stochastic calculus. One of these, Itō's lemma, expresses the composite of an Itō process (or more generally a semimartingale) <math>X_t</math> with a twice-differentiable function <math>f</math>. In Itō's lemma, the derivative of the composite function depends not only on <math>dX_t</math> and the derivative of <math>f</math> but also on the second derivative of <math>f</math>. The dependence on the second derivative is a consequence of the non-zero quadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.

References

Template:Reflist

External links

Weisstein, Eric W. "Chain Rule". MathWorld. https://mathworld.wolfram.com/ChainRule.html
