==Relationship with derivative{{anchor|Derivative}}==
{{Calculus|Vector}}

===Relationship with total derivative{{anchor|Total derivative}}===
The gradient is closely related to the [[total derivative]] ([[total differential]]) <math>df</math>: they are [[transpose]] ([[Transpose of a linear map|dual]]) to each other. Using the convention that vectors in <math>\R^n</math> are represented by [[column vector]]s, and that covectors (linear maps <math>\R^n \to \R</math>) are represented by [[row vector]]s,{{efn|name=row-column}} the gradient <math>\nabla f</math> and the derivative <math>df</math> are expressed as a column and row vector, respectively, with the same components, but transpose of each other:
<math display="block">\nabla f(p) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} ;</math>
<math display="block">df_p = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix} .</math>

While these both have the same components, they differ in what kind of mathematical object they represent: at each point, the derivative is a [[cotangent vector]], a [[linear form]] (or covector) which expresses how much the (scalar) output changes for a given infinitesimal change in (vector) input, while at each point, the gradient is a [[tangent vector]], which represents an infinitesimal change in (vector) input. In symbols, the gradient is an element of the tangent space at a point, <math>\nabla f(p) \in T_p \R^n</math>, while the derivative is a map from the tangent space to the real numbers, <math>df_p \colon T_p \R^n \to \R</math>.

The tangent spaces at each point of <math>\R^n</math> can be "naturally" identified{{efn|Informally, "naturally" identified means that this can be done without making any arbitrary choices. 
This can be formalized with a [[natural transformation]].}} with the vector space <math>\R^n</math> itself, and similarly the cotangent space at each point can be naturally identified with the [[dual vector space]] <math>(\R^n)^*</math> of covectors; thus the value of the gradient at a point can be thought of as a vector in the original <math>\R^n</math>, not just as a tangent vector.

Computationally, given a tangent vector, the vector can be ''multiplied'' by the derivative (as matrices), which is equal to taking the [[dot product]] with the gradient:
<math display="block"> (df_p)(v) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix} \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(p) v_i = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} \cdot \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix} = \nabla f(p) \cdot v</math>

====Differential or (exterior) derivative====
The best linear approximation to a differentiable function
<math display="block">f : \R^n \to \R</math>
at a point <math>x</math> in <math>\R^n</math> is a linear map from <math>\R^n</math> to <math>\R</math> which is often denoted by <math>df_x</math> or <math>Df(x)</math> and called the [[differential (calculus)|differential]] or [[total derivative]] of <math>f</math> at <math>x</math>. The function <math>df</math>, which maps <math>x</math> to <math>df_x</math>, is called the [[total differential]] or [[exterior derivative]] of <math>f</math> and is an example of a [[differential 1-form]].

Much as the derivative of a function of a single variable represents the [[slope]] of the [[tangent]] to the [[graph of a function|graph]] of the function,<ref>{{harvtxt|Protter|Morrey|1970|pp=21,88}}</ref> the directional derivative of a function in several variables represents the slope of the tangent [[hyperplane]] in the direction of the vector.
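
As a small numerical illustration of the identity <math>(df_p)(v) = \nabla f(p) \cdot v</math>, the following sketch evaluates both sides for a sample function and compares them with a finite-difference estimate of the directional derivative. The function <code>f</code>, the point <code>p</code>, the vector <code>v</code>, and all helper names are assumptions chosen for illustration only; they do not come from the cited sources.

```python
# Illustrative scalar field (an assumption for this sketch): f(x, y) = x^2 * y
def f(x, y):
    return x * x * y


def grad_f(x, y):
    # Partial derivatives computed by hand: df/dx = 2xy, df/dy = x^2
    return [2 * x * y, x * x]


def apply_differential(row, v):
    # Row vector times column vector, i.e. (df_p)(v) as a matrix product
    return sum(r * vi for r, vi in zip(row, v))


def dot(u, v):
    # Euclidean dot product of two vectors
    return sum(ui * vi for ui, vi in zip(u, v))


p = (1.0, 2.0)
v = (3.0, -1.0)

row = grad_f(*p)  # components of df_p, read as a row vector
col = grad_f(*p)  # the same components, read as the column vector grad f(p)

exact = apply_differential(row, v)  # 4*3 + 1*(-1) = 11

# Central finite-difference estimate of the directional derivative along v
h = 1e-6
numeric = (
    f(p[0] + h * v[0], p[1] + h * v[1]) - f(p[0] - h * v[0], p[1] - h * v[1])
) / (2 * h)

print(exact, dot(col, v), numeric)
```

The row-vector product and the dot product are the same arithmetic, which is the point of the identity: only the interpretation of the components (covector versus vector) differs.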
The gradient is related to the differential by the formula
<math display="block">(\nabla f)_x\cdot v = df_x(v)</math>
for any <math>v\in\R^n</math>, where <math>\cdot</math> is the [[dot product]]: taking the dot product of a vector with the gradient is the same as taking the directional derivative along the vector.

If <math>\R^n</math> is viewed as the space of (dimension <math>n</math>) column vectors (of real numbers), then one can regard <math>df</math> as the row vector with components
<math display="block">\left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right),</math>
so that <math>df_x(v)</math> is given by [[matrix multiplication]]. Assuming the standard Euclidean metric on <math>\R^n</math>, the gradient is then the corresponding column vector, that is,
<math display="block">(\nabla f)_i = df^\mathsf{T}_i.</math>

====Linear approximation to a function====
The best [[linear approximation]] to a function can be expressed in terms of the gradient, rather than the derivative. The gradient of a [[function (mathematics)|function]] <math>f</math> from the Euclidean space <math>\R^n</math> to <math>\R</math> at any particular point <math>x_0</math> in <math>\R^n</math> characterizes the best [[linear approximation]] to <math>f</math> at <math>x_0</math>. The approximation is as follows:
<math display="block">f(x) \approx f(x_0) + (\nabla f)_{x_0}\cdot(x-x_0)</math>
for <math>x</math> close to <math>x_0</math>, where <math>(\nabla f)_{x_0}</math> is the gradient of <math>f</math> computed at <math>x_0</math>, and the dot denotes the dot product on <math>\R^n</math>. This equation is equivalent to the first two terms in the [[Taylor series#Taylor series in several variables|multivariable Taylor series]] expansion of <math>f</math> at <math>x_0</math>.

===Relationship with {{vanchor|Fréchet derivative}}===
Let {{math|''U''}} be an [[open set]] in {{math|'''R'''<sup>''n''</sup>}}. 
If the function {{math|''f'' : ''U'' → '''R'''}} is differentiable, then the differential of {{math|''f''}} is the [[Fréchet derivative]] of {{math|''f''}}. Thus {{math|∇''f''}} is a function from {{math|''U''}} to the space {{math|'''R'''<sup>''n''</sup>}} such that
<math display="block">\lim_{h\to 0} \frac{|f(x+h)-f(x) -\nabla f(x)\cdot h|}{\|h\|} = 0,</math>
where · is the dot product.

As a consequence, the usual properties of the derivative hold for the gradient, though the gradient is not a derivative itself, but rather dual to the derivative:

;[[Linearity]]
:The gradient is linear in the sense that if {{math|''f''}} and {{math|''g''}} are two real-valued functions differentiable at the point {{math|''a'' ∈ '''R'''<sup>''n''</sup>}}, and {{mvar|α}} and {{mvar|β}} are two constants, then {{math|''αf'' + ''βg''}} is differentiable at {{math|''a''}}, and moreover <math display="block">\nabla\left(\alpha f+\beta g\right)(a) = \alpha \nabla f(a) + \beta\nabla g (a).</math>
;[[Product rule]]
:If {{math|''f''}} and {{math|''g''}} are real-valued functions differentiable at a point {{math|''a'' ∈ '''R'''<sup>''n''</sup>}}, then the product rule asserts that the product {{math|''fg''}} is differentiable at {{math|''a''}}, and <math display="block">\nabla (fg)(a) = f(a)\nabla g(a) + g(a)\nabla f(a).</math>
;[[Chain rule]]
:Suppose that {{math|''f'' : ''A'' → '''R'''}} is a real-valued function defined on a subset {{math|''A''}} of {{math|'''R'''<sup>''n''</sup>}}, and that {{math|''f''}} is differentiable at a point {{math|''a''}}. There are two forms of the chain rule applying to the gradient. First, suppose that the function {{math|''g''}} is a [[parametric curve]]; that is, a function {{math|''g'' : ''I'' → '''R'''<sup>''n''</sup>}} that maps a subset {{math|''I'' ⊂ '''R'''}} into {{math|'''R'''<sup>''n''</sup>}}. If {{math|''g''}} is differentiable at a point {{math|''c'' ∈ ''I''}} such that {{math|''g''(''c'') {{=}} ''a''}}, then <math display="block">(f\circ g)'(c) = \nabla f(a)\cdot g'(c),</math> where ∘ is the [[composition operator]]: {{math|1=(''f'' ∘ ''g'')(''x'') = ''f''(''g''(''x''))}}.
:More generally, if instead {{math|''I'' ⊂ '''R'''<sup>''k''</sup>}}, then the following holds: <math display="block">\nabla (f\circ g)(c) = \big(Dg(c)\big)^\mathsf{T} \big(\nabla f(a)\big),</math> where {{math|(''Dg'')}}<sup>T</sup> denotes the transpose of the [[Jacobian matrix]].
:For the second form of the chain rule, suppose that {{math|''h'' : ''I'' → '''R'''}} is a real-valued function on a subset {{math|''I''}} of {{math|'''R'''}}, and that {{math|''h''}} is differentiable at the point {{math|''f''(''a'') ∈ ''I''}}. Then <math display="block">\nabla (h\circ f)(a) = h'\big(f(a)\big)\nabla f(a).</math>
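
The two forms of the chain rule can be checked numerically. The sketch below is only an illustration under stated assumptions: the scalar field <code>f</code>, the curve <code>g</code> (the unit circle), the outer function <code>h</code>, the parameter value <code>c</code>, and the helper <code>central_diff</code> are all hypothetical choices, not taken from the cited sources. It compares each formula against a central finite-difference estimate.

```python
import math


def f(x, y):
    # Illustrative scalar field (an assumption): f(x, y) = x^2 * y
    return x * x * y


def grad_f(x, y):
    # Hand-computed gradient of f
    return (2 * x * y, x * x)


def g(t):
    # Illustrative parametric curve: the unit circle
    return (math.cos(t), math.sin(t))


def g_prime(t):
    # Derivative of the curve g
    return (-math.sin(t), math.cos(t))


def h(u):
    # Illustrative outer function for the second form
    return math.sin(u)


def h_prime(u):
    return math.cos(u)


def central_diff(func, t, eps=1e-6):
    # Central finite-difference estimate of func'(t)
    return (func(t + eps) - func(t - eps)) / (2 * eps)


# First form: (f o g)'(c) = grad f(a) . g'(c), with a = g(c)
c = 0.7
a = g(c)
lhs_curve = central_diff(lambda t: f(*g(t)), c)
rhs_curve = sum(df * dg for df, dg in zip(grad_f(*a), g_prime(c)))

# Second form: grad (h o f)(a) = h'(f(a)) * grad f(a)
rhs_outer = tuple(h_prime(f(*a)) * d for d in grad_f(*a))
lhs_outer = (
    central_diff(lambda x: h(f(x, a[1])), a[0]),  # partial in x
    central_diff(lambda y: h(f(a[0], y)), a[1]),  # partial in y
)

print(lhs_curve, rhs_curve)
print(lhs_outer, rhs_outer)
```

In both cases the finite-difference value and the chain-rule value should agree up to the truncation and rounding error of the difference quotient, which depends on the step size <code>eps</code>.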