
==Characteristics of first-order partial differential equation==
For a first-order PDE, the method of characteristics finds so-called '''characteristic curves''' along which the PDE becomes an ODE.{{sfn|Zachmanoglou|Thoe|1986|pp=112–152}}{{sfn|Pinchover|Rubinstein|2005|pp=25–28}} Once the ODE is found, it can be solved along the characteristic curves and transformed into a solution of the original PDE.

For the sake of simplicity, we confine our attention to the case of a function of two independent variables ''x'' and ''y'' for the moment.  Consider a [[partial differential equation#Linear and nonlinear equations|quasilinear PDE]] of the form{{sfn|John|1991|p=9}}

{{NumBlk|:|<math>a(x,y,z) \frac{\partial z}{\partial x}+b(x,y,z) \frac{\partial z}{\partial y}=c(x,y,z).</math>|{{EquationRef|1}}}}

Suppose that a solution ''z'' is known, and consider the surface graph ''z''&nbsp;=&nbsp;''z''(''x'',''y'') in '''R'''<sup>3</sup>.  A [[normal vector]] to this surface is given by{{sfn|Zauderer|2006|p=82}}

:<math>\left(\frac{\partial z}{\partial x}(x,y),\frac{\partial z}{\partial y}(x,y),-1\right).\,</math>

As a result, equation ({{EquationNote|1}}) is equivalent to the geometrical statement that the vector field

:<math>(a(x,y,z),b(x,y,z),c(x,y,z))\,</math>

is tangent to the surface ''z''&nbsp;=&nbsp;''z''(''x'',''y'') at every point, since the [[dot product]] of this vector field with the above normal vector is zero.  In other words, the graph of the solution must be a union of [[integral curve]]s of this vector field.  These integral curves are called the characteristic curves of the original partial differential equation and are the solutions of the characteristic equations:{{sfn|John|1991|p=9}}

:<math>
\begin{align}
\frac{dx}{dt}&=a(x,y,z),\\[8pt]
\frac{dy}{dt}&=b(x,y,z),\\[8pt]
\frac{dz}{dt}&=c(x,y,z).
\end{align}
</math>

A parametrization-invariant form of the ''Lagrange–Charpit equations'' is:{{sfn|Demidov|1982|pp=331–333}}

:<math>\frac{dx}{a(x,y,z)} = \frac{dy}{b(x,y,z)} = \frac{dz}{c(x,y,z)} .</math>
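
As an illustration (a worked example that is not taken from the cited sources), consider the equation <math>x z_x + y z_y = z</math>, so that <math>a = x</math>, <math>b = y</math> and <math>c = z</math>. The characteristic equations and their solutions through a point <math>(x_0,y_0,z_0)</math> are

:<math>\frac{dx}{dt}=x,\quad \frac{dy}{dt}=y,\quad \frac{dz}{dt}=z \quad\Longrightarrow\quad (x,y,z) = (x_0,y_0,z_0)\,e^{t}.</math>

The ratios <math>y/x</math> and <math>z/x</math> are constant along each characteristic, which suggests (for <math>x \neq 0</math>) solutions of the form <math>z = x\,f(y/x)</math> with <math>f</math> an arbitrary differentiable function. Substituting into the equation confirms this:

:<math>x\left(f - \frac{y}{x}f'\right) + y f' = x f = z.</math>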

===Linear and quasilinear cases===
Consider now a PDE of the form

:<math>\sum_{i=1}^n a_i(x_1,\dots,x_n,u) \frac{\partial u}{\partial x_i}=c(x_1,\dots,x_n,u).</math>

For this PDE to be [[linear]], the coefficients ''a''<sub>''i''</sub> must be functions of the spatial variables only, independent of ''u'', and ''c'' may depend on ''u'' at most linearly.  For it to be quasilinear,<ref name="quasilinear">{{cite web| url = https://reference.wolfram.com/language/tutorial/DSolveLinearAndQuasiLinearFirstOrderPDEs.html |title = Partial Differential Equations (PDEs)—Wolfram Language Documentation}}</ref> ''a''<sub>''i''</sub> may also depend on the value of the function, but not on any derivatives.  The distinction between these two cases is inessential for the discussion here.

For a linear or quasilinear PDE, the characteristic curves are given parametrically by

:<math>(x_1,\dots,x_n,u) = (X_1(s),\dots,X_n(s),U(s))</math>
:<math>u(\mathbf{X}(s)) = U(s)</math>
for functions <math>X_1,\dots,X_n,U</math> of one real variable <math>s</math> satisfying the following system of ordinary differential equations

{{NumBlk|:|<math>X_i' = a_i(X_1,\dots,X_n,U)
\text{ for }i=1,\dotsc,n</math>|{{EquationRef|2}}}}
{{NumBlk|:|<math>U' = c(X_1,\dots,X_n,U).</math>|{{EquationRef|3}}}}

Equations ({{EquationNote|2}}) and ({{EquationNote|3}}) give the characteristics of the PDE.
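
A minimal numerical sketch of integrating equations ({{EquationNote|2}}) and ({{EquationNote|3}}) is given below. It assumes, purely for illustration, the inviscid Burgers equation <math>u_t + u\,u_x = 0</math> (so <math>x_1 = t</math>, <math>x_2 = x</math>, <math>\mathbf{a} = (1,u)</math>, <math>c = 0</math>) and a hand-written fixed-step fourth-order Runge–Kutta integrator. The function names and the Gaussian initial profile are hypothetical choices, not part of the method itself.

```python
import math


def characteristic_rhs(state):
    """Right-hand side of (2)-(3) for the assumed example u_t + u*u_x = 0.

    state = (T, X, U); here a = (1, U) and c = 0.
    """
    _, _, u = state
    return (1.0, u, 0.0)


def rk4_step(f, state, h):
    """Advance the ODE system state' = f(state) by one classical RK4 step."""
    k1 = f(state)
    k2 = f(tuple(y + 0.5 * h * k for y, k in zip(state, k1)))
    k3 = f(tuple(y + 0.5 * h * k for y, k in zip(state, k2)))
    k4 = f(tuple(y + h * k for y, k in zip(state, k3)))
    return tuple(
        y + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def initial_profile(x):
    """Illustrative initial data u(0, x); any smooth profile would do."""
    return math.exp(-x * x)


def trace_characteristic(x0, s_end=1.0, steps=100):
    """Integrate one characteristic starting on the initial line t = 0.

    Returns the end point (T, X, U) of the curve at parameter value s_end.
    """
    state = (0.0, x0, initial_profile(x0))
    h = s_end / steps
    for _ in range(steps):
        state = rk4_step(characteristic_rhs, state, h)
    return state
```

For this particular equation, <math>U</math> is constant along each characteristic, so the curve traced from <math>(0, x_0)</math> is the straight line <math>x = x_0 + u(0,x_0)\,t</math>. Calling <code>trace_characteristic(0.5)</code> should therefore return a point close to that line at <math>t = 1</math>.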

{{collapse top|title=Proof for quasilinear case}} 
In the quasilinear case, the use of the method of characteristics is justified by [[Grönwall's inequality]]. The above equation may be written as
<math display="block">\mathbf{a}(\mathbf{x},u) \cdot \nabla u(\mathbf{x}) = c(\mathbf{x},u) </math>

We must distinguish between the solutions to the ODE and the solutions to the PDE, which are not known ''a priori'' to be equal. Letting capital letters denote the solutions to the ODE, we have
<math display="block">\mathbf{X}'(s) = \mathbf{a}(\mathbf{X}(s),U(s)) </math>
<math display="block">U'(s) = c(\mathbf{X}(s), U(s)) </math>

Examining <math>\Delta(s) = |u(\mathbf{X}(s)) - U(s)|^2 </math>, we find upon differentiating that
<math display="block">\Delta'(s) = 2\big(u(\mathbf{X}(s)) - U(s)\big) \Big(\mathbf{X}'(s)\cdot \nabla u(\mathbf{X}(s)) - U'(s)\Big) </math>
which is the same as
<math display="block">\Delta'(s) = 2\big(u(\mathbf{X}(s)) - U(s)\big) \Big(\mathbf{a}(\mathbf{X}(s),U(s))\cdot \nabla u(\mathbf{X}(s)) - c(\mathbf{X}(s),U(s))\Big) </math>

We cannot yet conclude that the above is 0, as we would like, since the PDE only guarantees that the relation
<math>\mathbf{a}(\mathbf{x},u) \cdot \nabla u(\mathbf{x}) = c(\mathbf{x},u)</math> holds with <math>u = u(\mathbf{x})</math> in the arguments of <math>\mathbf{a}</math> and <math>c</math>,
and we do not yet know that <math>U(s) = u(\mathbf{X}(s))</math>.

However, we can see that
<math display="block">\Delta'(s) = 2\big(u(\mathbf{X}(s)) - U(s)\big) \Big(\mathbf{a}(\mathbf{X}(s),U(s))\cdot \nabla u(\mathbf{X}(s)) - c(\mathbf{X}(s),U(s))-\big(\mathbf{a}(\mathbf{X}(s),u(\mathbf{X}(s))) \cdot \nabla u(\mathbf{X}(s)) - c(\mathbf{X}(s),u(\mathbf{X}(s)))\big)\Big) </math>
since by the PDE, the last term is 0. This equals
<math display="block">\Delta'(s) = 2\big(u(\mathbf{X}(s)) - U(s)\big) \Big(\big(\mathbf{a}(\mathbf{X}(s),U(s))-\mathbf{a}(\mathbf{X}(s),u(\mathbf{X}(s)))\big)\cdot \nabla u(\mathbf{X}(s)) - \big(c(\mathbf{X}(s),U(s))-c(\mathbf{X}(s),u(\mathbf{X}(s)))\big)\Big) </math>

By the triangle inequality and the [[Cauchy–Schwarz inequality]], we have
<math display="block">|\Delta'(s)| \leq 2\big|u(\mathbf{X}(s)) - U(s)\big| \Big(\big\|\mathbf{a}(\mathbf{X}(s),U(s))-\mathbf{a}(\mathbf{X}(s),u(\mathbf{X}(s)))\big\| \ \|\nabla u(\mathbf{X}(s))\| + \big|c(\mathbf{X}(s),U(s))-c(\mathbf{X}(s),u(\mathbf{X}(s)))\big|\Big) </math>

Assuming <math>\mathbf{a},c </math> are at least <math>C^1 </math>, we can bound this for small times. Choose a neighborhood <math>\Omega </math> of <math>(\mathbf{X}(0), U(0)) </math> small enough that <math>\mathbf{a},c </math> are Lipschitz on <math>\Omega </math>, which is possible because they are [[locally Lipschitz]]. By continuity, <math>(\mathbf{X}(s),U(s)) </math> will remain in <math>\Omega </math> for small enough <math>s </math>. Since <math>U(0) = u(\mathbf{X}(0)) </math>, we also have that <math>(\mathbf{X}(s), u(\mathbf{X}(s))) </math> will be in <math>\Omega </math> for small enough <math>s </math> by continuity. So, <math>(\mathbf{X}(s),U(s)) \in \Omega </math> and <math>(\mathbf{X}(s), u(\mathbf{X}(s))) \in \Omega </math> for <math>s \in [0,s_0] </math>. Additionally, <math>\|\nabla u(\mathbf{X}(s))\| \leq M </math> for some <math>M \in \mathbb{R} </math> for <math>s \in [0,s_0] </math> by compactness. From this, we find the above is bounded as
<math display="block">|\Delta'(s)| \leq C|u(\mathbf{X}(s)) - U(s)|^2 = C |\Delta(s)| </math>
for some <math>C \in \mathbb{R} </math>. It is a straightforward application of Grönwall's inequality to show that since <math>\Delta(0) = 0 </math> we have <math>\Delta(s) = 0 </math> for as long as this inequality holds. We therefore have some interval <math>[0, \varepsilon) </math> on which <math>u(\mathbf{X}(s)) = U(s) </math>. Choose the largest <math>\varepsilon </math> such that this is true. Then, by continuity, <math>U(\varepsilon) = u(\mathbf{X}(\varepsilon)) </math>. Provided the ODE still has a solution in some interval after <math>\varepsilon </math>, we can repeat the argument above to find that <math>u(\mathbf{X}(s)) = U(s) </math> in a larger interval. Thus, so long as the ODE has a solution, we have <math>u(\mathbf{X}(s)) = U(s) </math>.
{{collapse bottom}}

===Fully nonlinear case===
Consider the partial differential equation

{{NumBlk|:|<math>F(x_1,\dots,x_n,u,p_1,\dots,p_n)=0</math>|{{EquationRef|4}}}}

where the variables ''p''<sub>''i''</sub> are shorthand for the partial derivatives

:<math>p_i = \frac{\partial u}{\partial x_i}.</math>

Let (''x''<sub>''i''</sub>(''s''),''u''(''s''),''p''<sub>''i''</sub>(''s'')) be a curve in '''R'''<sup>2''n''+1</sup>.  Suppose that ''u'' is any solution, and that

:<math>u(s) = u(x_1(s),\dots,x_n(s)).</math>

Along a solution, differentiating ({{EquationNote|4}}) with respect to ''s'' gives{{sfn|John|1991|pp=19-24}}

:<math>\sum_i(F_{x_i} + F_u p_i)\dot{x}_i + \sum_i F_{p_i}\dot{p}_i = 0</math>

:<math>\dot{u} - \sum_i p_i \dot{x}_i = 0</math>

:<math>\sum_i (\dot{x}_i dp_i - \dot{p}_i dx_i)= 0.</math>

The second equation follows from applying the [[chain rule]] to a solution ''u'', and the third follows by taking an [[exterior derivative]] of the relation <math>du - \sum_i p_i \, dx_i = 0</math>.  Manipulating these equations gives

:<math>\dot{x}_i=\lambda F_{p_i},\quad\dot{p}_i=-\lambda(F_{x_i}+F_up_i),\quad \dot{u}=\lambda\sum_i p_iF_{p_i}</math>

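where λ is a nonzero proportionality factor, which can be taken to be constant by a suitable choice of the parameter ''s''.  Writing these equations more symmetrically, one obtains the Lagrange–Charpit equations for the characteristic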

:<math>\frac{\dot{x}_i}{F_{p_i}}=-\frac{\dot{p}_i}{F_{x_i}+F_up_i}=\frac{\dot{u}}{\sum_i p_iF_{p_i}}.</math>

Geometrically, the method of characteristics in the fully nonlinear case can be interpreted as requiring that the [[Monge cone]] of the differential equation should everywhere be tangent to the graph of the solution.
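
As an illustration of the Lagrange–Charpit equations (a worked example under the assumption <math>n=2</math>, not taken from the cited sources), consider the eikonal equation <math>F = p_1^2 + p_2^2 - 1 = 0</math>. Here <math>F_{p_i} = 2p_i</math> and <math>F_{x_i} = F_u = 0</math>, so with the choice <math>\lambda = 1/2</math> the characteristic equations read

:<math>\dot{x}_i = p_i,\quad \dot{p}_i = 0,\quad \dot{u} = p_1^2 + p_2^2 = 1.</math>

The <math>p_i</math> are therefore constant along each characteristic, the characteristics project onto the straight lines <math>x_i(s) = x_i(0) + s\,p_i</math>, and <math>u(s) = u(0) + s</math> increases at unit rate along them.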