=== Differentials as linear maps ===
There is a simple way to make precise sense of differentials, first used on the real line, by regarding them as [[linear map]]s. It can be used on <math>\mathbb{R}</math>, <math>\mathbb{R}^n</math>, a [[Hilbert space]], a [[Banach space]], or, more generally, a [[topological vector space]]. The case of the real line is the easiest to explain. This type of differential is also known as a [[covariant vector]] or [[cotangent vector]], depending on context.

==== Differentials as linear maps on R ====
Suppose <math>f(x)</math> is a real-valued function on <math>\mathbb{R}</math>. We can reinterpret the variable <math>x</math> in <math>f(x)</math> as being a function rather than a number, namely the [[identity map]] on the real line, which takes a real number <math>p</math> to itself: <math>x(p)=p</math>. Then <math>f(x)</math> is the composite of <math>f</math> with <math>x</math>, whose value at <math>p</math> is <math>f(x(p))=f(p)</math>. The differential <math>\operatorname{d}f</math> (which of course depends on <math>f</math>) is then a function whose value at <math>p</math> (usually denoted <math>df_p</math>) is not a number, but a linear map from <math>\mathbb{R}</math> to <math>\mathbb{R}</math>. Since a linear map from <math>\mathbb{R}</math> to <math>\mathbb{R}</math> is given by a <math>1\times 1</math> [[Matrix (mathematics)|matrix]], it is essentially the same thing as a number, but the change in the point of view allows us to think of <math>df_p</math> as an infinitesimal and ''compare'' it with the ''standard infinitesimal'' <math>dx_p</math>, which is again just the identity map from <math>\mathbb{R}</math> to <math>\mathbb{R}</math> (a <math>1\times 1</math> [[Matrix (mathematics)|matrix]] with entry <math>1</math>). The identity map has the property that if <math>\varepsilon</math> is very small, then <math>dx_p(\varepsilon)</math> is very small, which enables us to regard it as infinitesimal. The differential <math>df_p</math> has the same property, because it is just a multiple of <math>dx_p</math>, and this multiple is the derivative <math>f'(p)</math> by definition. We therefore obtain <math>df_p=f'(p)\,dx_p</math>, and hence <math>df=f'\,dx</math>. Thus we recover the idea that <math>f'</math> is the ratio of the differentials <math>df</math> and <math>dx</math>.

This would just be a trick were it not for the fact that:
# it captures the idea of the derivative of <math>f</math> at <math>p</math> as the ''best linear approximation'' to <math>f</math> at <math>p</math>;
# it has many generalizations.
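For example, if <math>f(x) = x^2</math>, then <math>f'(p) = 2p</math>, so <math>df_p = 2p\,dx_p</math>: at each point <math>p</math>, the differential is the linear map sending an increment <math>\varepsilon</math> to <math>2p\varepsilon</math>, and one writes <math>df = 2x\,dx</math>.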
==== Differentials as linear maps on R<sup>n</sup> ====
If <math>f</math> is a function from <math>\mathbb{R}^n</math> to <math>\mathbb{R}</math>, then we say that <math>f</math> is ''differentiable''<ref>See, for instance, {{Harvnb|Apostol|1967}}.</ref> at <math>p\in\mathbb{R}^n</math> if there is a linear map <math>df_p</math> from <math>\mathbb{R}^n</math> to <math>\mathbb{R}</math> such that for any <math>\varepsilon>0</math>, there is a [[neighbourhood (mathematics)|neighbourhood]] <math>N</math> of <math>p</math> such that for <math>x\in N</math>,
<math display=block>\left|f(x) - f(p) - df_p(x-p)\right| < \varepsilon \left|x-p\right| .</math>

We can now use the same trick as in the one-dimensional case and think of the expression <math>f(x_1, x_2, \ldots, x_n)</math> as the composite of <math>f</math> with the standard coordinates <math>x_1, x_2, \ldots, x_n</math> on <math>\mathbb{R}^n</math> (so that <math>x_j(p)</math> is the <math>j</math>-th component of <math>p\in\mathbb{R}^n</math>). Then the differentials <math>\left(dx_1\right)_p, \left(dx_2\right)_p, \ldots, \left(dx_n\right)_p</math> at a point <math>p</math> form a [[basis (linear algebra)|basis]] for the [[vector space]] of linear maps from <math>\mathbb{R}^n</math> to <math>\mathbb{R}</math>, and therefore, if <math>f</math> is differentiable at <math>p</math>, we can write <math>\operatorname{d}f_p</math> as a [[linear combination]] of these basis elements:
<math display=block>df_p = \sum_{j=1}^n D_j f(p) \,(dx_j)_p.</math>

The coefficients <math>D_j f(p)</math> are (by definition) the [[partial derivative]]s of <math>f</math> at <math>p</math> with respect to <math>x_1, x_2, \ldots, x_n</math>. Hence, if <math>f</math> is differentiable on all of <math>\mathbb{R}^n</math>, we can write, more concisely:
<math display=block>\operatorname{d}f = \frac{\partial f}{\partial x_1} \,dx_1 + \frac{\partial f}{\partial x_2} \,dx_2 + \cdots +\frac{\partial f}{\partial x_n} \,dx_n.</math>
In the one-dimensional case this becomes
<math display=block>df = \frac{df}{dx}dx</math>
as before.

This idea generalizes straightforwardly to functions from <math>\mathbb{R}^n</math> to <math>\mathbb{R}^m</math>. Furthermore, it has the decisive advantage over other definitions of the derivative that it is [[invariant (mathematics)|invariant]] under changes of coordinates. This means that the same idea can be used to define the [[pushforward (differential)|differential]] of [[smooth map]]s between [[smooth manifold]]s.

Aside: Note that the existence of all the [[partial derivative]]s of <math>f(x)</math> at <math>x</math> is a [[necessary condition]] for the existence of a differential at <math>x</math>. However, it is not a [[sufficient condition]]. For counterexamples, see [[Gateaux derivative]].
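As a concrete illustration, for <math>f(x_1, x_2) = x_1^2 x_2</math> on <math>\mathbb{R}^2</math>, the partial derivatives are <math>D_1 f = 2x_1 x_2</math> and <math>D_2 f = x_1^2</math>, so
<math display=block>df = 2x_1 x_2 \,dx_1 + x_1^2 \,dx_2,</math>
and at a point <math>p = (p_1, p_2)</math> the differential <math>df_p</math> is the linear map <math>(v_1, v_2) \mapsto 2p_1 p_2 v_1 + p_1^2 v_2</math>.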
==== Differentials as linear maps on a vector space ====
The same procedure works on a vector space with enough additional structure to reasonably talk about continuity. The most concrete case is a Hilbert space, also known as a [[Complete metric space|complete]] [[inner product space]], where the inner product and its associated [[Norm (mathematics)|norm]] define a suitable concept of distance. The same procedure works for a Banach space, also known as a complete [[normed vector space]]. However, for a more general topological vector space, some of the details are more abstract because there is no concept of distance.

In the important case of finite dimension, every inner product space is a Hilbert space, every normed vector space is a Banach space, and every topological vector space is complete. As a result, one can define a coordinate system from an arbitrary basis and use the same technique as for <math>\mathbb{R}^n</math>.
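For example, on a real Hilbert space <math>H</math>, the squared norm <math>f(x) = \langle x, x\rangle</math> is differentiable at every <math>x \in H</math>: since <math>f(x+h) - f(x) = 2\langle x, h\rangle + \langle h, h\rangle</math>, and <math>\langle h, h\rangle = \|h\|^2</math> is small compared with <math>\|h\|</math>, the differential is the continuous linear map
<math display=block>df_x(h) = 2\langle x, h\rangle.</math>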