Gradient
{{Short description|Multivariate derivative (mathematics)}} {{about|a generalized derivative of a multivariate function|another use in mathematics|Slope|a similarly spelled unit of angle|Gradian|other uses}} {{more citations needed|date=January 2018}} [[File:Gradient2.svg|thumb|300px|The gradient, represented by the blue arrows, denotes the direction of greatest change of a scalar function. The values of the function are represented in greyscale and increase in value from white (low) to dark (high).]] In [[vector calculus]], the '''gradient''' of a [[scalar-valued function|scalar-valued]] [[differentiable function]] <math>f</math> of [[Function of several real variables|several variables]] is the [[vector field]] (or [[vector-valued function]]) <math>\nabla f</math> whose value at a point <math>p</math> gives the direction and the rate of fastest increase. The gradient transforms like a vector under a change of basis of the space of variables of <math>f</math>. If the gradient of a function is non-zero at a point <math>p</math>, the direction of the gradient is the direction in which the function increases most quickly from <math>p</math>, and the [[magnitude (mathematics)|magnitude]] of the gradient is the rate of increase in that direction, the greatest [[absolute value|absolute]] directional derivative.<ref> *{{harvtxt|Bachman|2007|p=77}} *{{harvtxt|Downing|2010|pp=316–317}} *{{harvtxt|Kreyszig|1972|p=309}} *{{harvtxt|McGraw-Hill|2007|p=196}} *{{harvtxt|Moise|1967|p=684}} *{{harvtxt|Protter|Morrey|1970|p=715}} *{{harvtxt|Swokowski et al.|1994|pp=1036, 1038–1039}}</ref> Further, a point where the gradient is the zero vector is known as a [[stationary point]]. The gradient thus plays a fundamental role in [[optimization theory]], where it is used to minimize a function by [[gradient descent]]. 
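To illustrate how the gradient is used to minimize a function by gradient descent, here is a minimal Python sketch; the objective function, step size, and iteration count are illustrative choices, not taken from the sources cited above:

```python
# Gradient descent on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y).
# The gradient points in the direction of fastest increase, so stepping
# against it decreases f; the unique minimum is at the origin.

def grad_f(x, y):
    return (2 * x, 2 * y)

def gradient_descent(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

x, y = gradient_descent(3.0, -4.0)  # converges toward (0, 0)
```

Each update multiplies both coordinates by (1 − 2·lr), so for lr = 0.1 the iterates shrink geometrically toward the stationary point at the origin.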
In coordinate-free terms, the gradient of a function <math>f(\mathbf{r})</math> may be defined by: <math display="block">df=\nabla f \cdot d\mathbf{r}</math> where <math>df</math> is the total infinitesimal change in <math>f</math> for an infinitesimal displacement <math>d\mathbf{r}</math>, and is seen to be maximal when <math>d\mathbf{r}</math> is in the direction of the gradient <math>\nabla f</math>. The [[nabla symbol]] <math>\nabla</math>, written as an upside-down triangle and pronounced "del", denotes the [[Del|vector differential operator]]. When a coordinate system is used in which the basis vectors are not functions of position, the gradient is given by the [[Vector (mathematics and physics)|vector]]{{efn|name=row-column|This article uses the convention that [[column vector]]s represent vectors, and [[row vector]]s represent covectors, but the opposite convention is also common.}} whose components are the [[partial derivative]]s of <math>f</math> at <math>p</math>.<ref> *{{harvtxt|Bachman|2007|p=76}} *{{harvtxt|Beauregard|Fraleigh|1973|p=84}} *{{harvtxt|Downing|2010|p=316}} *{{harvtxt|Harper|1976|p=15}} *{{harvtxt|Kreyszig|1972|p=307}} *{{harvtxt|McGraw-Hill|2007|p=196}} *{{harvtxt|Moise|1967|p=683}} *{{harvtxt|Protter|Morrey|1970|p=714}} *{{harvtxt|Swokowski et al.|1994|p=1038}}</ref> That is, for <math>f \colon \R^n \to \R</math>, its gradient <math>\nabla f \colon \R^n \to \R^n</math> is defined at the point <math>p = (x_1,\ldots,x_n)</math> in ''n''-dimensional space as the vector{{efn|Strictly speaking, the gradient is a [[vector field]] <math>f \colon \R^n \to T\R^n</math>, and the value of the gradient at a point is a [[tangent vector]] in the [[tangent space]] at that point, <math>T_p \R^n</math>, not a vector in the original space <math>\R^n</math>. 
However, all the tangent spaces can be naturally identified with the original space <math>\R^n</math>, so these do not need to be distinguished; see {{slink||Definition}} and [[#Derivative|relationship with the derivative]].}} <math display="block">\nabla f(p) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix}.</math> The above definition of the gradient is valid only if <math>f</math> is differentiable at <math>p</math>; there are functions whose partial derivatives exist in every direction yet which fail to be differentiable. Furthermore, this definition as the vector of partial derivatives is only valid when the basis of the coordinate system is [[Orthonormal basis|orthonormal]]. For any other basis, the [[metric tensor]] at that point needs to be taken into account. For example, the function <math>f(x,y)=\frac{x^2 y}{x^2+y^2}</math>, with <math>f(0,0)=0</math> at the origin, is not differentiable at the origin, as it does not have a well-defined tangent plane there despite having well-defined partial derivatives in every direction.<ref>{{Cite web |title=Non-differentiable functions must have discontinuous partial derivatives - Math Insight |url=https://mathinsight.org/nondifferentiable_discontinuous_partial_derivatives |access-date=2023-10-21 |website=mathinsight.org}}</ref> In this example, under a rotation of the ''x''-''y'' coordinate system, the above formula for the gradient fails to transform like a vector (it becomes dependent on the choice of basis) and also fails to point towards the steepest ascent in some orientations. For differentiable functions, where the formula for the gradient does hold, the gradient can be shown always to transform as a vector under a change of basis, so that it always points towards the fastest increase. 
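The component formula above can be checked numerically by approximating each partial derivative with a central difference; the function, point, and step size below are illustrative choices, not from the sources cited above:

```python
# Approximate the gradient of f(x, y) = x^2 * y at p = (1.0, 2.0) by
# central finite differences, one partial derivative per coordinate.
# Analytically, df/dx = 2xy and df/dy = x^2, so grad f(1, 2) = (4, 1).

def f(x, y):
    return x ** 2 * y

def numerical_gradient(f, p, h=1e-6):
    x, y = p
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

g = numerical_gradient(f, (1.0, 2.0))  # close to (4.0, 1.0)
```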
The gradient is dual to the [[total derivative]] <math>df</math>: the value of the gradient at a point is a [[tangent vector]], a vector at each point, while the value of the derivative at a point is a [[cotangent vector|''co''tangent vector]], a linear functional on vectors.{{efn|The value of the gradient at a point can be thought of as a vector in the original space <math>\R^n</math>, while the value of the derivative at a point can be thought of as a covector on the original space: a linear map <math>\R^n \to \R</math>.}} They are related in that the [[dot product]] of the gradient of <math>f</math> at a point <math>p</math> with another tangent vector <math>\mathbf{v}</math> equals the [[directional derivative]] of <math>f</math> at <math>p</math> of the function along <math>\mathbf{v}</math>; that is, <math display="inline">\nabla f(p) \cdot \mathbf v = \frac{\partial f}{\partial\mathbf{v}}(p) = df_{p}(\mathbf{v}) </math>. The gradient admits multiple generalizations to more general functions on [[manifold]]s; see {{slink||Generalizations}}.
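The identity <math>\nabla f(p) \cdot \mathbf v = \frac{\partial f}{\partial\mathbf{v}}(p)</math> can likewise be verified numerically; the function, point, and direction below are illustrative choices, not from the sources cited above:

```python
# Check that grad f(p) . v equals the directional derivative of f at p
# along v, for f(x, y) = x^2 * y with grad f = (2xy, x^2),
# at p = (1.0, 2.0) along the unit vector v = (3/5, 4/5).

def f(x, y):
    return x ** 2 * y

def grad_f(x, y):
    return (2 * x * y, x ** 2)

p = (1.0, 2.0)
v = (3 / 5, 4 / 5)  # unit vector

# Dot product of the gradient at p with v: 4 * 0.6 + 1 * 0.8 = 3.2.
dot = grad_f(*p)[0] * v[0] + grad_f(*p)[1] * v[1]

# Directional derivative via a symmetric difference quotient along v.
h = 1e-6
dir_deriv = (f(p[0] + h * v[0], p[1] + h * v[1])
             - f(p[0] - h * v[0], p[1] - h * v[1])) / (2 * h)
# dot and dir_deriv agree up to finite-difference error.
```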