==The chain rule for total derivatives==
{{main|Chain rule}}

The chain rule has a particularly elegant statement in terms of total derivatives.  It says that, for two functions <math>f</math> and <math>g</math>, the total derivative of the [[composite function]] <math>f \circ g</math> at <math>a</math> satisfies
:<math>d(f \circ g)_a = df_{g(a)} \cdot dg_a.</math>
If the total derivatives of <math>f</math> and <math>g</math> are identified with their Jacobian matrices, then the composite on the right-hand side is simply matrix multiplication.  This is enormously useful in applications, as it makes it possible to account for essentially arbitrary dependencies among the arguments of a composite function.
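The identification with Jacobian matrices can be checked numerically. The following is a minimal sketch, not a definitive implementation: the maps <code>g</code> and <code>f</code>, the point <code>a</code>, and the helpers <code>jacobian</code> and <code>matmul</code> are hypothetical choices made for illustration, and the Jacobians are approximated by central finite differences rather than computed exactly.

```python
# Sketch: compare the Jacobian of a composite with the product of Jacobians.
# All names below (g, f, a, jacobian, matmul) are illustrative assumptions.
import math


def jacobian(func, point, h=1e-6):
    """Approximate the Jacobian of func at point with central differences."""
    m = len(func(point))
    n = len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(point)
        backward = list(point)
        forward[j] += h
        backward[j] -= h
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def g(p):
    # Hypothetical inner map g : R^2 -> R^3.
    u, v = p
    return [u * v, u + v, math.sin(u)]


def f(q):
    # Hypothetical outer map f : R^3 -> R^2.
    x, y, z = q
    return [x * y + z, x - y * z]


def f_after_g(p):
    return f(g(p))


a = [0.5, -1.2]
lhs = jacobian(f_after_g, a)                      # 2x2 Jacobian of the composite
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))   # (2x3) times (3x2)
max_error = max(abs(lhs[i][j] - rhs[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # small, limited by finite-difference and rounding error
```

Note that the Jacobian of <code>f</code> is evaluated at <code>g(a)</code>, not at <code>a</code>, matching the subscript in <math>df_{g(a)}</math>.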

===Example: Differentiation with direct dependencies===
Suppose that ''f'' is a function of two variables, ''x'' and ''y''.  If these two variables are independent, so that the domain of ''f'' is <math>\R^2</math>, then the behavior of ''f'' may be understood in terms of its partial derivatives in the ''x'' and ''y'' directions.  However, in some situations, ''x'' and ''y'' may be dependent.  For example, it might happen that ''f'' is constrained to a curve <math>y = y(x)</math>.  In this case, we are actually interested in the behavior of the composite function <math>f(x, y(x))</math>.  The partial derivative of ''f'' with respect to ''x'' does not give the true rate of change of ''f'' with respect to changing ''x'' because changing ''x'' necessarily changes ''y''.  However, the chain rule for the total derivative takes such dependencies into account.  Write <math>\gamma(x) = (x, y(x))</math>.  Then, the chain rule says
:<math>d(f \circ \gamma)_{x_0} = df_{(x_0, y(x_0))} \cdot d\gamma_{x_0}.</math>
By expressing the total derivative using Jacobian matrices, this becomes:
:<math>\frac{df(x, y(x))}{dx}(x_0) = \frac{\partial f}{\partial x}(x_0, y(x_0)) \cdot \frac{dx}{dx}(x_0) + \frac{\partial f}{\partial y}(x_0, y(x_0)) \cdot \frac{dy}{dx}(x_0).</math>
Suppressing the evaluation at <math>x_0</math> for legibility, we may also write this as
:<math>\frac{df(x, y(x))}{dx} = \frac{\partial f}{\partial x} \frac{dx}{dx} + \frac{\partial f}{\partial y} \frac{dy}{dx}.</math>
This gives a straightforward formula for the derivative of <math>f(x, y(x))</math> in terms of the partial derivatives of <math>f</math> and the derivative of <math>y(x)</math>.

For example, suppose
:<math>f(x,y)=xy.</math>
When ''x'' and ''y'' are independent, the rate of change of ''f'' with respect to ''x'' is the partial derivative of ''f'' with respect to ''x''; in this case,
:<math>\frac{\partial f}{\partial x} = y.</math>
However, if ''y'' depends on ''x'', the partial derivative does not give the true rate of change of ''f'' as ''x'' changes because the partial derivative assumes that ''y'' is fixed.  Suppose we are constrained to the line
:<math>y=x.</math>
Then
:<math>f(x,y) = f(x,x) = x^2,</math>
and the total derivative of ''f'' with respect to ''x'' is
:<math>\frac{df}{dx} = 2 x,</math>
which we see is not equal to the partial derivative <math>\partial f/\partial x</math>.  Instead of immediately substituting for ''y'' in terms of ''x'', however, we can also use the chain rule as above:
:<math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = y+x \cdot 1 = x+y = 2x.</math>
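The same comparison can be made numerically. The sketch below uses the example above, <math>f(x,y)=xy</math> on the line <math>y=x</math>. The evaluation point <code>x0</code> and the helper <code>derivative</code> (a central finite difference) are assumptions introduced only for illustration.

```python
# Sketch: partial derivative versus total derivative for f(x, y) = x*y on y = x.
# The point x0 and the helper `derivative` are illustrative assumptions.

def f(x, y):
    return x * y


def y_of_x(x):
    # The constraint y = x from the example.
    return x


def derivative(func, x, h=1e-6):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 3.0
y0 = y_of_x(x0)

partial_x = derivative(lambda x: f(x, y0), x0)   # y held fixed at y0
partial_y = derivative(lambda y: f(x0, y), y0)   # x held fixed at x0
dy_dx = derivative(y_of_x, x0)

chain_rule = partial_x + partial_y * dy_dx             # chain-rule formula
direct = derivative(lambda x: f(x, y_of_x(x)), x0)     # substitute first

print(partial_x)    # approximately y0 = 3
print(chain_rule)   # approximately 2*x0 = 6
print(direct)       # approximately 2*x0 = 6
```

The partial derivative alone reports only half of the true rate of change here, because it ignores the change in ''y'' induced by the change in ''x''.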

===Example: Differentiation with indirect dependencies===
While one can often perform substitutions to eliminate indirect dependencies, the [[chain rule]] provides a more efficient and general technique.  Suppose <math>L(t,x_1,\dots,x_n)</math> is a function of time <math>t</math> and <math>n</math> variables <math>x_i</math> which themselves depend on time. Then, the time derivative of <math>L</math> is
:<math>\frac{dL}{dt} = \frac{d}{dt} L \bigl(t, x_1(t), \ldots, x_n(t)\bigr).</math>

The chain rule expresses this derivative in terms of the partial derivatives of <math>L</math> and the time derivatives of the functions <math>x_i</math>:
:<math>\frac{dL}{dt}
= \frac{\partial L}{\partial t} + \sum_{i=1}^n \frac{\partial L}{\partial x_i}\frac{dx_i}{dt}
= \biggl(\frac{\partial}{\partial t} + \sum_{i=1}^n \frac{dx_i}{dt}\frac{\partial}{\partial x_i}\biggr)(L).</math>

This expression is often used in [[physics]] for a [[gauge transformation]] of the [[Lagrangian mechanics|Lagrangian]], as two Lagrangians that differ only by the total time derivative of a function of time and the <math>n</math> [[generalized coordinates]] lead to the same equations of motion. An interesting example arises in the resolution of the causality issue in the [[Wheeler–Feynman absorber theory#Resolution of causality issue|Wheeler–Feynman time-symmetric theory]].  The operator in brackets (in the final expression above) is also called the total derivative operator (with respect to <math>t</math>).
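The formula for <math>dL/dt</math> can be illustrated with a short numerical sketch. Everything concrete in it is an assumption made for illustration: the function <code>L</code>, the trajectory <code>path</code>, the time <code>t0</code>, and the finite-difference helper <code>derivative</code>. The sketch sums the explicit term <math>\partial L/\partial t</math> and the terms <math>(\partial L/\partial x_i)(dx_i/dt)</math>, then compares the result with differentiating <math>L(t, x_1(t), \ldots, x_n(t))</math> after substitution.

```python
# Sketch: total time derivative of L(t, x_1(t), ..., x_n(t)) via the chain rule.
# L, path, t0 and `derivative` are hypothetical choices for illustration only.
import math


def L(t, x):
    # Hypothetical L(t, x_1, x_2) = t * x_1**2 + sin(x_2).
    return t * x[0] ** 2 + math.sin(x[1])


def path(t):
    # Hypothetical trajectory x_1(t) = cos(t), x_2(t) = t**2.
    return [math.cos(t), t ** 2]


def derivative(func, s, h=1e-6):
    """Central finite-difference approximation of func'(s)."""
    return (func(s + h) - func(s - h)) / (2 * h)


def total_time_derivative(L, path, t):
    x = path(t)
    # Explicit dependence: vary t while the x_i are frozen.
    result = derivative(lambda s: L(s, x), t)
    for i in range(len(x)):
        def L_in_xi(v, i=i):
            # Vary only the i-th coordinate; t and the others stay fixed.
            shifted = list(x)
            shifted[i] = v
            return L(t, shifted)

        dL_dxi = derivative(L_in_xi, x[i])
        dxi_dt = derivative(lambda s, i=i: path(s)[i], t)
        result += dL_dxi * dxi_dt
    return result


t0 = 0.7
via_chain_rule = total_time_derivative(L, path, t0)
via_substitution = derivative(lambda s: L(s, path(s)), t0)
print(via_chain_rule, via_substitution)  # the two values should agree closely
```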

For example, the total derivative of <math>f(x(t),y(t))</math> with respect to <math>t</math> is

:<math>\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.</math>

Here there is no <math>\partial f / \partial t</math> term since <math>f</math> itself does not depend on the independent variable <math>t</math> directly.
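
As a concrete illustration (the function and path are chosen here only as an example), take <math>f(x,y) = x^2 y</math> with <math>x(t) = \cos t</math> and <math>y(t) = \sin t</math>.  Then
:<math>\frac{df}{dt} = 2xy \cdot (-\sin t) + x^2 \cdot \cos t = -2\cos t \,\sin^2 t + \cos^3 t,</math>
which agrees with differentiating <math>f(x(t), y(t)) = \cos^2 t \,\sin t</math> directly.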