===General rule: Vector-valued functions with multiple inputs===
The simplest way to write the chain rule in the general case is to use the [[Total derivative#The total derivative as a linear map|total derivative]], which is a linear transformation that captures all [[directional derivative]]s in a single formula. Consider differentiable functions {{math|''f'' : '''R'''<sup>''m''</sup> → '''R'''<sup>''k''</sup>}} and {{math|''g'' : '''R'''<sup>''n''</sup> → '''R'''<sup>''m''</sup>}}, and a point {{math|'''a'''}} in {{math|'''R'''<sup>''n''</sup>}}. Let {{math|''D''<sub>'''a'''</sub> ''g''}} denote the total derivative of {{math|''g''}} at {{math|'''a'''}} and {{math|''D''<sub>''g''('''a''')</sub> ''f''}} denote the total derivative of {{math|''f''}} at {{math|''g''('''a''')}}. These two derivatives are linear transformations {{math|'''R'''<sup>''n''</sup> → '''R'''<sup>''m''</sup>}} and {{math|'''R'''<sup>''m''</sup> → '''R'''<sup>''k''</sup>}}, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of {{math|''f'' ∘ ''g''}} at {{math|'''a'''}}:
<math display="block">D_{\mathbf{a}}(f \circ g) = D_{g(\mathbf{a})}f \circ D_{\mathbf{a}}g,</math>
or for short,
<math display="block">D(f \circ g) = Df \circ Dg.</math>
The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.<ref name="spivak_manifolds">{{cite book |first=Michael |last=Spivak |author-link=Michael Spivak |title=[[Calculus on Manifolds (book)|Calculus on Manifolds]] |location=Boston |publisher=Addison-Wesley |year=1965 |isbn=0-8053-9021-9 |pages=19–20 }}</ref>

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a [[Jacobian matrix]], and the composite of two derivatives corresponds to the product of their Jacobian matrices.
From this perspective the chain rule therefore says:
<math display="block">J_{f \circ g}(\mathbf{a}) = J_{f}(g(\mathbf{a})) J_{g}(\mathbf{a}),</math>
or for short,
<math display="block">J_{f \circ g} = (J_f \circ g)J_g.</math>
That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If {{mvar|k}}, {{mvar|m}}, and {{mvar|n}} are 1, so that {{math|''f'' : '''R''' → '''R'''}} and {{math|''g'' : '''R''' → '''R'''}}, then the Jacobian matrices of {{math|''f''}} and {{math|''g''}} are {{math|1 × 1}}. Specifically, they are:
<math display="block">\begin{align} J_g(a) &= \begin{pmatrix} g'(a) \end{pmatrix}, \\ J_{f}(g(a)) &= \begin{pmatrix} f'(g(a)) \end{pmatrix}. \end{align}</math>
The Jacobian of {{math|''f'' ∘ ''g''}} is the product of these {{math|1 × 1}} matrices, so it is {{math|''f''′(''g''(''a''))⋅''g''′(''a'')}}, as expected from the one-dimensional chain rule. In the language of linear transformations, {{math|''D''<sub>''a''</sub>(''g'')}} is the function which scales a vector by a factor of {{math|''g''′(''a'')}} and {{math|''D''<sub>''g''(''a'')</sub>(''f'')}} is the function which scales a vector by a factor of {{math|''f''′(''g''(''a''))}}. The chain rule says that the composite of these two linear transformations is the linear transformation {{math|''D''<sub>''a''</sub>(''f'' ∘ ''g'')}}, and therefore it is the function that scales a vector by {{math|''f''′(''g''(''a''))⋅''g''′(''a'')}}.

Another way of writing the chain rule is used when ''f'' and ''g'' are expressed in terms of their components as {{math|1='''y''' = ''f''('''u''') = (''f''<sub>1</sub>('''u'''), …, ''f''<sub>''k''</sub>('''u'''))}} and {{math|1='''u''' = ''g''('''x''') = (''g''<sub>1</sub>('''x'''), …, ''g''<sub>''m''</sub>('''x'''))}}.
In this case, the above rule for Jacobian matrices is usually written as:
<math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial(x_1, \ldots, x_n)}.</math>

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the {{mvar|i}}-th coordinate direction is found by multiplying the Jacobian matrix by the {{mvar|i}}-th basis vector. By doing this to the formula above, we find:
<math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \frac{\partial(y_1, \ldots, y_k)}{\partial(u_1, \ldots, u_m)} \frac{\partial(u_1, \ldots, u_m)}{\partial x_i}.</math>
Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:
<math display="block">\frac{\partial(y_1, \ldots, y_k)}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial(y_1, \ldots, y_k)}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.</math>
More conceptually, this rule expresses the fact that a change in the {{math|''x''<sub>''i''</sub>}} direction may change all of {{math|''g''<sub>1</sub>}} through {{math|''g<sub>m</sub>''}}, and any of these changes may affect {{math|''f''}}.

In the special case where {{math|1=''k'' = 1}}, so that {{math|''f''}} is a real-valued function, this formula simplifies even further:
<math display="block">\frac{\partial y}{\partial x_i} = \sum_{\ell = 1}^m \frac{\partial y}{\partial u_\ell} \frac{\partial u_\ell}{\partial x_i}.</math>
This can be rewritten as a [[dot product]].
Recalling that {{math|'''u''' {{=}} (''g''<sub>1</sub>, …, ''g''<sub>''m''</sub>)}}, the partial derivative {{math|∂'''u''' / ∂''x''<sub>''i''</sub>}} is also a vector, and the chain rule says that:
<math display="block">\frac{\partial y}{\partial x_i} = \nabla y \cdot \frac{\partial \mathbf{u}}{\partial x_i}.</math>

==== Example ====
Given {{math|1=''u''(''x'', ''y'') = ''x''<sup>2</sup> + 2''y''}} where {{math|1=''x''(''r'', ''t'') = ''r'' sin(''t'')}} and {{math|1=''y''(''r'',''t'') = sin<sup>2</sup>(''t'')}}, determine the value of {{math|∂''u'' / ∂''r''}} and {{math|∂''u'' / ∂''t''}} using the chain rule.{{citation needed|date=November 2023}}
<math display="block">\frac{\partial u}{\partial r}=\frac{\partial u}{\partial x} \frac{\partial x}{\partial r}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial r} = (2x)(\sin(t)) + (2)(0) = 2r \sin^2(t),</math>
and
<math display="block">\begin{align} \frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial t} \\ &= (2x)(r\cos(t)) + (2)(2\sin(t)\cos(t)) \\ &= (2r\sin(t))(r\cos(t)) + 4\sin(t)\cos(t) \\ &= 2(r^2 + 2) \sin(t)\cos(t) \\ &= (r^2 + 2) \sin(2t). \end{align}</math>

==== Higher derivatives of multivariable functions ====
{{Main|Faà di Bruno's formula#Multivariate version}}
Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If {{math|1=''y'' = ''f''('''u''')}} is a function of {{math|1='''u''' = ''g''('''x''')}} as above, then the second derivative of {{math|''f'' ∘ ''g''}} is:
<math display="block">\frac{\partial^2 y}{\partial x_i \partial x_j} = \sum_k \left(\frac{\partial y}{\partial u_k}\frac{\partial^2 u_k}{\partial x_i \partial x_j}\right) + \sum_{k, \ell} \left(\frac{\partial^2 y}{\partial u_k \partial u_\ell}\frac{\partial u_k}{\partial x_i}\frac{\partial u_\ell}{\partial x_j}\right).</math>
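
The formulas in this section can be sanity-checked numerically. The following is a minimal sketch, not a reference implementation: it reuses the functions from the example above ({{math|1=''u'' = ''x''<sup>2</sup> + 2''y''}}, {{math|1=''x'' = ''r'' sin(''t'')}}, {{math|1=''y'' = sin<sup>2</sup>(''t'')}}), assembles the first partial derivatives and the mixed second partial derivative term by term from the chain-rule sums, and compares them with central finite differences of the composite. The helper names (<code>chain_rule_partials</code>, <code>mixed_second_partial</code>, <code>central_difference</code>), the step size <code>h</code>, and the sample point <code>(r0, t0)</code> are illustrative assumptions and do not come from any source.

```python
import math

# Functions from the worked example: u(x, y) = x^2 + 2y,
# x(r, t) = r sin(t), y(r, t) = sin^2(t).
def u(x, y):
    return x**2 + 2 * y

def x_of(r, t):
    return r * math.sin(t)

def y_of(r, t):
    return math.sin(t) ** 2

def composite(r, t):
    return u(x_of(r, t), y_of(r, t))

def chain_rule_partials(r, t):
    """First partials of u in r and t, built from the chain-rule sum."""
    x = x_of(r, t)
    du_dx, du_dy = 2 * x, 2.0
    dx_dr, dx_dt = math.sin(t), r * math.cos(t)
    dy_dr, dy_dt = 0.0, 2 * math.sin(t) * math.cos(t)
    du_dr = du_dx * dx_dr + du_dy * dy_dr
    du_dt = du_dx * dx_dt + du_dy * dy_dt
    return du_dr, du_dt

def mixed_second_partial(r, t):
    """Mixed partial of u in r and t from the second-derivative formula."""
    x = x_of(r, t)
    du_dx, du_dy = 2 * x, 2.0
    d2u_dx2 = 2.0                      # all other second partials of u vanish
    dx_dr, dx_dt = math.sin(t), r * math.cos(t)
    d2x_drdt, d2y_drdt = math.cos(t), 0.0
    first_sum = du_dx * d2x_drdt + du_dy * d2y_drdt
    second_sum = d2u_dx2 * dx_dr * dx_dt
    return first_sum + second_sum

def central_difference(func, r, t, h=1e-6):
    """Approximate the partials of func in r and t at (r, t)."""
    d_r = (func(r + h, t) - func(r - h, t)) / (2 * h)
    d_t = (func(r, t + h) - func(r, t - h)) / (2 * h)
    return d_r, d_t

r0, t0 = 1.5, 0.7                      # arbitrary sample point
du_dr, du_dt = chain_rule_partials(r0, t0)
numeric_dr, numeric_dt = central_difference(composite, r0, t0)
closed_dr = 2 * r0 * math.sin(t0) ** 2
closed_dt = (r0**2 + 2) * math.sin(2 * t0)

# Differentiate the chain-rule value of du/dr numerically in t.
_, numeric_mixed = central_difference(
    lambda r, t: chain_rule_partials(r, t)[0], r0, t0
)
mixed = mixed_second_partial(r0, t0)

print(du_dr, numeric_dr, closed_dr)
print(du_dt, numeric_dt, closed_dt)
print(mixed, numeric_mixed)
```

Within each printed row the values should agree up to finite-difference error. Differentiating the closed form {{math|2''r'' sin<sup>2</sup>(''t'')}} with respect to {{math|''t''}} gives {{math|2''r'' sin(2''t'')}}, which is what the second-derivative formula yields for this example.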