
==Relationship with derivative{{anchor|Derivative}}==
{{Calculus|Vector}}

===Relationship with total derivative{{anchor|Total derivative}}===
The gradient is closely related to the [[total derivative]] ([[total differential]]) <math>df</math>: they are [[transpose]] ([[Transpose of a linear map|dual]]) to each other. Using the convention that vectors in <math>\R^n</math> are represented by [[column vector]]s, and that covectors (linear maps <math>\R^n \to \R</math>) are represented by [[row vector]]s,{{efn|name=row-column}} the gradient <math>\nabla f</math> and the derivative <math>df</math> are expressed as a column vector and a row vector, respectively, with the same components, but as transposes of each other:

<math display="block">\nabla f(p) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} ;</math>
<math display="block">df_p = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix} .</math>

While both have the same components, they differ in the kind of mathematical object they represent: at each point, the derivative is a [[cotangent vector]], a [[linear form]] (or covector) that expresses how much the (scalar) output changes for a given infinitesimal change in (vector) input, whereas the gradient is a [[tangent vector]], which represents an infinitesimal change in (vector) input. In symbols, the gradient is an element of the tangent space at a point, <math>\nabla f(p) \in T_p \R^n</math>, while the derivative is a map from the tangent space to the real numbers, <math>df_p \colon T_p \R^n \to \R</math>. The tangent spaces at each point of <math>\R^n</math> can be "naturally" identified{{efn|Informally, "naturally" identified means that this can be done without making any arbitrary choices. This can be formalized with a [[natural transformation]].}} with the vector space <math>\R^n</math> itself, and similarly the cotangent space at each point can be naturally identified with the [[dual vector space]] <math>(\R^n)^*</math> of covectors; thus the value of the gradient at a point can be thought of as a vector in the original <math>\R^n</math>, not just as a tangent vector.

Computationally, given a tangent vector, the vector can be ''multiplied'' by the derivative (as matrices), which is equal to taking the [[dot product]] with the gradient:
<math display="block">
(df_p)(v) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix}
\begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix}
= \sum_{i=1}^n \frac{\partial f}{\partial x_i}(p) v_i
= \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} \cdot \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix}
= \nabla f(p) \cdot v .</math>
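
The identity above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function <math>f(x, y) = x^2 y + \sin y</math>, the point <code>p</code>, the vector <code>v</code>, and the helper names <code>grad_f</code> and <code>directional_derivative</code> are all illustrative assumptions. It compares the dot product of the hand-computed gradient with <code>v</code> against a central-difference estimate of the rate of change of <math>f</math> along <code>v</code>.

```python
import math


def f(x, y):
    # Hypothetical example function f(x, y) = x^2 * y + sin(y)
    return x * x * y + math.sin(y)


def grad_f(x, y):
    # Partial derivatives of f, computed by hand
    return (2.0 * x * y, x * x + math.cos(y))


def directional_derivative(func, p, v, eps=1e-6):
    # Central-difference estimate of d/dt func(p + t*v) at t = 0
    forward = func(p[0] + eps * v[0], p[1] + eps * v[1])
    backward = func(p[0] - eps * v[0], p[1] - eps * v[1])
    return (forward - backward) / (2.0 * eps)


p = (1.0, 2.0)
v = (0.5, -1.5)

g = grad_f(*p)
# Row vector df_p times column vector v, which is the dot product with the gradient
dot = g[0] * v[0] + g[1] * v[1]
numeric = directional_derivative(f, p, v)

print(dot, numeric)
```

The two printed values should agree up to the finite-difference error, which depends on the assumed step size <code>eps</code>.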

====Differential or (exterior) derivative====
The best linear approximation to a differentiable function
<math display="block">f : \R^n \to \R</math>
at a point <math>x</math> in <math>\R^n</math> is a linear map from <math>\R^n</math> to <math>\R</math> which is often denoted by <math>df_x</math> or <math>Df(x)</math> and called the [[differential (calculus)|differential]] or [[total derivative]] of <math>f</math> at <math>x</math>. The function <math>df</math>, which maps <math>x</math> to <math>df_x</math>, is called the [[total differential]] or [[exterior derivative]] of <math>f</math> and is an example of a [[differential 1-form]].

Much as the derivative of a function of a single variable represents the [[slope]] of the [[tangent]] to the [[graph of a function|graph]] of the function,<ref>{{harvtxt|Protter|Morrey|1970|pp=21,88}}</ref> the directional derivative of a function of several variables represents the slope of the tangent [[hyperplane]] in the direction of a given vector.

The gradient is related to the differential by the formula
<math display="block">(\nabla f)_x\cdot v = df_x(v)</math>
for any <math>v\in\R^n</math>, where <math>\cdot</math> is the [[dot product]]: taking the dot product of a vector with the gradient is the same as taking the directional derivative along the vector.

If <math>\R^n</math> is viewed as the space of <math>n</math>-dimensional column vectors of real numbers, then one can regard <math>df</math> as the row vector with components
<math display="block">\left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right),</math>
so that <math>df_x(v)</math> is given by [[matrix multiplication]]. Assuming the standard Euclidean metric on <math>\R^n</math>, the gradient is then the corresponding column vector, that is,
<math display="block">(\nabla f)_i = df^\mathsf{T}_i.</math>

====Linear approximation to a function====
The best [[linear approximation]] to a function can be expressed in terms of the gradient, rather than the derivative. For a [[function (mathematics)|function]] <math>f</math> from the Euclidean space <math>\R^n</math> to <math>\R</math> that is differentiable at a point <math>x_0</math> in <math>\R^n</math>, the gradient at <math>x_0</math> characterizes this approximation as follows:

<math display="block">f(x) \approx f(x_0) + (\nabla f)_{x_0}\cdot(x-x_0)</math>

for <math>x</math> close to <math>x_0</math>, where <math>(\nabla f)_{x_0}</math> is the gradient of <math>f</math> computed at <math>x_0</math>, and the dot denotes the dot product on <math>\R^n</math>. The right-hand side consists of the first two terms (the constant and linear terms) of the [[Taylor series#Taylor series in several variables|multivariable Taylor series]] expansion of <math>f</math> at <math>x_0</math>.
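
As a worked illustration of this approximation, the following sketch evaluates both sides at a point near <math>x_0</math>. The function <math>f(x, y) = x^2 y + \sin y</math>, the points <code>x0</code> and <code>x</code>, and the helper name <code>linear_approx</code> are assumptions chosen for the example only.

```python
import math


def f(x, y):
    # Hypothetical example function f(x, y) = x^2 * y + sin(y)
    return x * x * y + math.sin(y)


def grad_f(x, y):
    # Partial derivatives of f, computed by hand
    return (2.0 * x * y, x * x + math.cos(y))


def linear_approx(x0, x):
    # f(x0) + grad f(x0) . (x - x0)
    g = grad_f(*x0)
    return f(*x0) + g[0] * (x[0] - x0[0]) + g[1] * (x[1] - x0[1])


x0 = (1.0, 2.0)
x = (1.05, 1.98)  # a point close to x0

exact = f(*x)
approx = linear_approx(x0, x)

print(exact, approx, abs(exact - approx))
```

For this choice of points the discrepancy is small compared with the change in <math>f</math> itself, as expected for a first-order approximation.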

===Relationship with {{vanchor|Fréchet derivative}}===
Let {{math|''U''}} be an [[open set]] in {{math|'''R'''<sup>''n''</sup>}}. If the function {{math|''f'' : ''U'' → '''R'''}} is differentiable, then the differential of {{math|''f''}} is the [[Fréchet derivative]] of {{math|''f''}}. Thus {{math|∇''f''}} is a function from {{math|''U''}} to the space {{math|'''R'''<sup>''n''</sup>}} such that
<math display="block">\lim_{h\to 0} \frac{|f(x+h)-f(x) -\nabla f(x)\cdot h|}{\|h\|} = 0,</math>
where · is the dot product.
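
The limit above can be probed numerically. The sketch below assumes the example function <math>f(x, y) = x^2 y + \sin y</math> and a hypothetical helper <code>frechet_ratio</code>, and it evaluates the quotient for increments <math>h</math> of shrinking length along a single fixed direction. This only samples the limit along one direction; it does not prove that the limit holds for all <math>h \to 0</math>.

```python
import math


def f(x, y):
    # Hypothetical example function f(x, y) = x^2 * y + sin(y)
    return x * x * y + math.sin(y)


def grad_f(x, y):
    # Partial derivatives of f, computed by hand
    return (2.0 * x * y, x * x + math.cos(y))


def frechet_ratio(x, h):
    # |f(x + h) - f(x) - grad f(x) . h| / ||h||
    g = grad_f(*x)
    linear_part = g[0] * h[0] + g[1] * h[1]
    remainder = f(x[0] + h[0], x[1] + h[1]) - f(*x) - linear_part
    return abs(remainder) / math.hypot(h[0], h[1])


x = (1.0, 2.0)
direction = (0.6, 0.8)  # a unit vector, chosen arbitrarily

ratios = [
    frechet_ratio(x, (t * direction[0], t * direction[1]))
    for t in (1e-1, 1e-2, 1e-3, 1e-4)
]
print(ratios)
```

The printed ratios should shrink as the length of <math>h</math> decreases, consistent with the limit being zero.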

As a consequence, the usual properties of the derivative hold for the gradient, though the gradient is not a derivative itself, but rather dual to the derivative:

;[[Linearity]]
:The gradient is linear in the sense that if {{math|''f''}} and {{math|''g''}} are two real-valued functions differentiable at the point {{math|''a'' ∈ '''R'''<sup>''n''</sup>}}, and {{mvar|α}} and {{mvar|β}} are two constants, then {{math|''αf'' + ''βg''}} is differentiable at {{math|''a''}}, and moreover <math display="block">\nabla\left(\alpha f+\beta g\right)(a) = \alpha \nabla f(a) + \beta\nabla g (a).</math>
;[[Product rule]]
:If {{math|''f''}} and {{math|''g''}} are real-valued functions differentiable at a point {{math|''a'' ∈ '''R'''<sup>''n''</sup>}}, then the product rule asserts that the product {{math|''fg''}} is differentiable at {{math|''a''}}, and <math display="block">\nabla (fg)(a) = f(a)\nabla g(a) + g(a)\nabla f(a).</math>
;[[Chain rule]]
:Suppose that {{math|''f'' : ''A'' → '''R'''}} is a real-valued function defined on a subset {{math|''A''}} of {{math|'''R'''<sup>''n''</sup>}}, and that {{math|''f''}} is differentiable at a point {{math|''a''}}. There are two forms of the chain rule applying to the gradient. First, suppose that the function {{math|''g''}} is a [[parametric curve]]; that is, a function {{math|''g'' : ''I'' → '''R'''<sup>''n''</sup>}} that maps a subset {{math|''I'' ⊂ '''R'''}} into {{math|'''R'''<sup>''n''</sup>}}. If {{math|''g''}} is differentiable at a point {{math|''c'' ∈ ''I''}} such that {{math|''g''(''c'') {{=}} ''a''}}, then <math display="block">(f\circ g)'(c) = \nabla f(a)\cdot g'(c),</math> where ∘ is the [[composition operator]]: {{math|1=(''f'' ∘ ''g'')(''x'') = ''f''(''g''(''x''))}}.

More generally, if instead {{math|''I'' ⊂ '''R'''<sup>''k''</sup>}}, then the following holds:
<math display="block">\nabla (f\circ g)(c) = \big(Dg(c)\big)^\mathsf{T} \big(\nabla f(a)\big),</math>
where {{math|(''Dg'')<sup>T</sup>}} denotes the transpose of the [[Jacobian matrix]] of {{math|''g''}}.

For the second form of the chain rule, suppose that {{math|''h'' : ''I'' → '''R'''}} is a real-valued function on a subset {{math|''I''}} of {{math|'''R'''}}, and that {{math|''h''}} is differentiable at the point {{math|''f''(''a'') ∈ ''I''}}. Then
<math display="block">\nabla (h\circ f)(a) = h'\big(f(a)\big)\nabla f(a).</math>
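
The product rule and the chain-rule formulas above can be spot-checked numerically. The following is a minimal sketch under stated assumptions: the functions <code>f</code>, <code>g</code>, <code>curve</code>, and <code>phi</code>, the outer function <math>h(u) = e^u</math>, and the helper <code>numeric_grad</code> are all hypothetical choices made for illustration. Each formula is compared against a central-difference estimate of the left-hand side.

```python
import math


def numeric_grad(func, a, eps=1e-6):
    # Central-difference estimate of the gradient of func at the point a
    grad = []
    for i in range(len(a)):
        up, down = list(a), list(a)
        up[i] += eps
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2.0 * eps))
    return grad


def f(x):
    return x[0] ** 2 * x[1]  # f(x, y) = x^2 * y


def grad_f(x):
    return [2.0 * x[0] * x[1], x[0] ** 2]  # computed by hand


def g(x):
    return math.sin(x[0]) + x[1]  # g(x, y) = sin(x) + y


def grad_g(x):
    return [math.cos(x[0]), 1.0]  # computed by hand


a = [1.0, 2.0]

# Product rule: grad(fg)(a) = f(a) grad g(a) + g(a) grad f(a)
product_numeric = numeric_grad(lambda x: f(x) * g(x), a)
product_formula = [f(a) * dg + g(a) * df for df, dg in zip(grad_f(a), grad_g(a))]


# Chain rule, first form: a parametric curve with curve(0) = a
def curve(t):
    return [math.exp(t), 2.0 + 3.0 * t]


curve_velocity = [1.0, 3.0]  # curve'(0), computed by hand
eps = 1e-6
curve_numeric = (f(curve(eps)) - f(curve(-eps))) / (2.0 * eps)
curve_formula = sum(df * dc for df, dc in zip(grad_f(a), curve_velocity))


# Chain rule, general form: phi maps R^2 to R^2 with phi(1, 1) = a
def phi(c):
    return [c[0], c[0] + c[1] ** 2]


c = [1.0, 1.0]
jacobian = [[1.0, 0.0], [1.0, 2.0 * c[1]]]  # D phi(c); row i holds the partials of component i
jac_numeric = numeric_grad(lambda u: f(phi(u)), c)
# Entry j of (D phi(c))^T grad f(a) sums over the rows of the Jacobian
jac_formula = [sum(jacobian[i][j] * grad_f(a)[i] for i in range(2)) for j in range(2)]

# Chain rule, second form with outer function h(u) = exp(u), so h'(u) = exp(u)
outer_numeric = numeric_grad(lambda x: math.exp(f(x)), a)
outer_formula = [math.exp(f(a)) * df for df in grad_f(a)]

print(product_numeric, product_formula)
print(curve_numeric, curve_formula)
print(jac_numeric, jac_formula)
print(outer_numeric, outer_formula)
```

In the general form, the example map <code>phi</code> has a non-symmetric Jacobian, so multiplying by the Jacobian itself rather than its transpose would give a different vector; this makes the role of the transpose visible.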