
==Operations on distributions==

Many operations that are defined on smooth functions with compact support can also be defined for distributions. In general, if <math>A:\mathcal{D}(U)\to\mathcal{D}(U)</math> is a linear map that is continuous with respect to the [[weak topology]], then it is not always possible to extend <math>A</math> to a map <math>A': \mathcal{D}'(U)\to \mathcal{D}'(U)</math> by the classical extension theorems of topology or linear functional analysis.<ref group="note">The extension theorem for mappings defined from a subspace S of a topological vector space E to the space E itself also works for non-linear mappings, provided they are [[uniformly continuous]]. That is not the situation here: the aim is to "extend" a continuous linear mapping A from a TVS E into another TVS F so as to obtain a continuous linear mapping from the dual E' to the dual F' (note the order of the spaces). In general, this is not even an extension problem, because E is not necessarily a subset of its own dual E'. Moreover, it is not a classical topological transpose problem, because the transpose of A goes from F' to E' and not from E' to F'. The situation instead calls for the specific topological properties of the Laurent Schwartz spaces D(U) and D'(U), together with the concept of the weak (or Schwartz) adjoint of the continuous linear operator A.</ref> The "distributional" extension of the continuous linear operator <math>A</math> is possible if and only if <math>A</math> admits a Schwartz adjoint, that is, another continuous linear operator <math>B</math> of the same type such that
<math>
\langle Af,g\rangle = \langle f,Bg\rangle
</math>,
for every pair of test functions <math>f, g.</math> In that case, <math>B</math> is unique and the extension <math>A'</math> is the transpose of the Schwartz adjoint <math>B.</math>{{citation needed|date=September 2020}}<ref>{{Cite book |last=Strichartz |first=Robert |title=A Guide to Distribution Theory and Fourier Transforms |year=1993 |location=USA |pages=17 |language=English}}</ref>{{clarify|date=September 2020}}

===Preliminaries: Transpose of a linear operator===
{{anchor|Transpose of a linear operator}}
{{Main|Transpose of a linear map}}

Operations on distributions and spaces of distributions are often defined using the [[Transpose of a linear map|transpose]] of a linear operator. This is because the transpose allows for a unified presentation of the many definitions in the theory of distributions and also because its properties are well-known in [[functional analysis]].<ref>{{harvnb|Strichartz|1994|loc=§2.3}}; {{harvnb|Trèves|2006}}.</ref> For instance, the well-known [[Hermitian adjoint]] of a linear operator between [[Hilbert space]]s is just the operator's transpose (but with the [[Riesz representation theorem]] used to identify each Hilbert space with its [[Strong dual space|continuous dual space]]). In general, the transpose of a continuous linear map <math>A : X \to Y</math> is the linear map 
<math display=block>{}^{t}A : Y' \to X' \qquad \text{ defined by } \qquad {}^{t}A(y') := y' \circ A,</math> 
or equivalently, it is the unique map satisfying <math>\langle y', A(x)\rangle = \left\langle {}^{t}A (y'), x \right\rangle</math> for all <math>x \in X</math> and all <math>y' \in Y'</math> (the prime symbol in <math>y'</math> does not denote a derivative of any kind; it merely indicates that <math>y'</math> is an element of the continuous dual space <math>Y'</math>). Since <math>A</math> is continuous, the transpose <math>{}^{t}A : Y' \to X'</math> is also continuous when both duals are endowed with their respective [[Strong dual space|strong dual topologies]]; it is also continuous when both duals are endowed with their respective [[Weak* topology|weak* topologies]] (see the articles [[Polar topology#Polar topologies and topological vector spaces|polar topology]] and [[Dual system#Weak topology|dual system]] for more details).

In the context of distributions, the characterization of the transpose can be refined slightly. Let <math>A : \mathcal{D}(U) \to \mathcal{D}(U)</math> be a continuous linear map. Then by definition, the transpose of <math>A</math> is the unique linear operator <math>{}^tA : \mathcal{D}'(U) \to \mathcal{D}'(U)</math> that satisfies:
<math display=block>\langle {}^{t}A(T), \phi \rangle = \langle T, A(\phi) \rangle \quad \text{ for all } \phi \in \mathcal{D}(U) \text{ and all } T \in \mathcal{D}'(U).</math>

Since <math>\mathcal{D}(U)</math> is dense in <math>\mathcal{D}'(U)</math> (here, <math>\mathcal{D}(U)</math> actually refers to the set of distributions <math>\left\{D_\psi : \psi \in \mathcal{D}(U)\right\}</math>) it is sufficient that the defining equality hold for all distributions of the form <math>T = D_\psi</math> where <math>\psi \in \mathcal{D}(U).</math> Explicitly, this means that a continuous linear map <math>B : \mathcal{D}'(U) \to \mathcal{D}'(U)</math> is equal to <math>{}^{t}A</math> if and only if the condition below holds:
<math display=block>\langle B(D_\psi), \phi \rangle = \langle {}^{t}A(D_\psi), \phi \rangle \quad \text{ for all } \phi, \psi \in \mathcal{D}(U)</math>
where the right-hand side equals <math>\langle {}^{t}A(D_\psi), \phi \rangle = \langle D_\psi, A(\phi) \rangle = \langle \psi, A(\phi) \rangle = \int_U \psi \cdot A(\phi) \,dx.</math>

===Differential operators===

====Differentiation of distributions====

Let <math>A : \mathcal{D}(U) \to \mathcal{D}(U)</math> be the partial derivative operator <math>\tfrac{\partial}{\partial x_k}.</math> To extend <math>A</math> we compute its transpose:
<math display=block>\begin{align}
\langle {}^{t}A(D_\psi), \phi \rangle
&= \int_U \psi (A\phi) \,dx && \text{(See above.)} \\
&= \int_U \psi \frac{\partial\phi}{\partial x_k} \, dx \\[4pt]
&= -\int_U \phi \frac{\partial\psi}{\partial x_k}\, dx && \text{(integration by parts)} \\[4pt]
&= -\left\langle \frac{\partial\psi}{\partial x_k}, \phi \right\rangle \\[4pt]
&= -\langle A \psi, \phi \rangle = \langle - A \psi, \phi \rangle
\end{align}</math>

Therefore <math>{}^{t}A = -A.</math> Thus, the partial derivative of <math>T</math> with respect to the coordinate <math>x_k</math> is defined by the formula
<math display=block>\left\langle \frac{\partial T}{\partial x_k}, \phi \right\rangle = - \left\langle T, \frac{\partial \phi}{\partial x_k} \right\rangle \qquad \text{ for all } \phi \in \mathcal{D}(U).</math>

With this definition, every distribution is infinitely differentiable, and the derivative in the direction <math>x_k</math> is a [[linear operator]] on <math>\mathcal{D}'(U).</math>
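
'''Example.''' As a quick illustration of this definition (a sketch, assuming <math>U = \R</math> and writing <math>H</math> for the [[Heaviside step function]], neither of which is fixed by the text above), the derivative of the distribution <math>D_H</math> can be computed directly from the formula:
<math display=block>\left\langle (D_H)', \phi \right\rangle = -\left\langle D_H, \phi' \right\rangle = -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta, \phi \rangle \qquad \text{ for all } \phi \in \mathcal{D}(\R),</math>
where the third equality uses the compact support of <math>\phi.</math> Hence <math>(D_H)' = \delta,</math> even though <math>H</math> is not differentiable at the origin in the classical sense.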

More generally, if <math>\alpha</math> is an arbitrary [[multi-index]], then the partial derivative <math>\partial^\alpha T</math> of the distribution <math>T \in \mathcal{D}'(U)</math> is defined by
<math display=block>\langle \partial^\alpha T, \phi \rangle = (-1)^{|\alpha|} \langle T, \partial^\alpha \phi \rangle \qquad \text{ for all } \phi \in \mathcal{D}(U).</math>

Differentiation of distributions is a continuous operator on <math>\mathcal{D}'(U);</math> this is an important and desirable property that is not shared by most other notions of differentiation.

If <math>T</math> is a distribution in <math>\R</math> then
<math display=block>\lim_{x \to 0} \frac{T - \tau_x T}{x} = T'\in \mathcal{D}'(\R),</math>
where <math>T'</math> is the derivative of <math>T</math> and <math>\tau_x</math> is a translation by <math>x;</math> thus the derivative of <math>T</math> may be viewed as a limit of quotients.{{sfn|Rudin|1991|p=180}}
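
'''Example.''' The limit can be checked by hand for <math>T = \delta.</math> This is only a sketch, and it assumes the convention <math>\tau_x f(y) = f(y - x),</math> so that <math>\langle \tau_x \delta, \phi \rangle = \phi(x).</math> For every <math>\phi \in \mathcal{D}(\R),</math>
<math display=block>\left\langle \frac{\delta - \tau_x \delta}{x}, \phi \right\rangle = \frac{\phi(0) - \phi(x)}{x} \;\longrightarrow\; -\phi'(0) = \langle \delta', \phi \rangle \qquad \text{ as } x \to 0.</math>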

====Differential operators acting on smooth functions====

A linear differential operator in <math>U</math> with smooth coefficients acts on the space of smooth functions on <math>U.</math> Given such an operator
<math display=inline>P := \sum_\alpha c_\alpha \partial^\alpha,</math>
we would like to define a continuous linear map, <math>D_P</math> that extends the action of <math>P</math> on <math>C^\infty(U)</math> to distributions on <math>U.</math> In other words, we would like to define <math>D_P</math> such that the following diagram [[Commutative diagram|commutes]]:
<math display=block>\begin{matrix}
\mathcal{D}'(U) & \stackrel{D_P}{\longrightarrow} & \mathcal{D}'(U) \\[2pt]
\uparrow & & \uparrow \\[2pt]
C^\infty(U) & \stackrel{P}{\longrightarrow} & C^\infty(U)
\end{matrix}</math>
where the vertical maps are given by assigning <math>f \in C^\infty(U)</math> its canonical distribution <math>D_f \in \mathcal{D}'(U),</math> which is defined by: 
<math display=block>D_f(\phi) = \langle f, \phi \rangle := \int_U f(x) \phi(x) \,dx \quad \text{ for all } \phi \in \mathcal{D}(U).</math> 
With this notation, the diagram commuting is equivalent to:
<math display=block>D_{P(f)} = D_PD_f \qquad \text{ for all } f \in C^\infty(U).</math>

To find <math>D_P,</math> the transpose <math>{}^{t} P : \mathcal{D}'(U) \to \mathcal{D}'(U)</math> of the continuous induced map <math>P : \mathcal{D}(U)\to \mathcal{D}(U)</math> defined by <math>\phi \mapsto P(\phi)</math> is considered in the lemma below. 
This leads to the following definition of the differential operator on <math>U</math> called {{em|the '''formal transpose''' of <math>P,</math>}} which will be denoted by <math>P_*</math> to avoid confusion with the transpose map, that is defined by
<math display=block>P_* := \sum_\alpha b_\alpha \partial^\alpha \quad \text{ where } \quad b_\alpha := \sum_{\beta \geq \alpha} (-1)^{|\beta|} \binom{\beta}{\alpha} \partial^{\beta-\alpha} c_\beta.</math>

{{math theorem|name=Lemma|math_statement= Let <math>P</math> be a linear differential operator with smooth coefficients in <math>U.</math> Then for all <math>f \in C^\infty(U)</math> and all <math>\phi \in \mathcal{D}(U)</math> we have
<math display=block>\left\langle {}^{t}P(D_f), \phi \right\rangle = \left\langle D_{P_*(f)}, \phi \right\rangle,</math>
which is equivalent to:
<math display=block>{}^{t}P(D_f) = D_{P_*(f)}.</math>}}

{{collapse top|title=Proof|left=true}}
As discussed above, for any <math>\phi \in \mathcal{D}(U),</math> the transpose may be calculated by:
<math display=block>\begin{align}
\left\langle {}^{t}P(D_f), \phi \right\rangle &= \int_U f(x) P(\phi)(x) \,dx \\
&= \int_U f(x) \left[\sum\nolimits_\alpha c_\alpha(x) (\partial^\alpha \phi)(x) \right] \,dx \\
&= \sum\nolimits_\alpha \int_U f(x) c_\alpha(x) (\partial^\alpha \phi)(x) \,dx \\
&= \sum\nolimits_\alpha (-1)^{|\alpha|} \int_U \phi(x) (\partial^\alpha(c_\alpha f))(x) \,d x
\end{align}</math>

For the last line we used [[integration by parts]] combined with the fact that <math>\phi</math> and therefore all the functions <math>f (x)c_\alpha (x) \partial^\alpha \phi(x)</math> have compact support.<ref group="note">For example, let <math>U = \R</math> and take <math>P</math> to be the ordinary derivative for functions of one real variable and assume the support of <math>\phi</math> to be contained in the finite interval <math>(a,b),</math> then since <math>\operatorname{supp}(\phi) \subseteq (a, b)</math>
<math display=block>\begin{align}
\int_\R \phi'(x)f(x)\,dx &= \int_a^b \phi'(x)f(x) \,dx \\
&= \phi(x)f(x)\big\vert_a^b - \int_a^b f'(x) \phi(x) \,d x \\
&= \phi(b)f(b) - \phi(a)f(a) - \int_a^b f'(x) \phi(x) \,d x \\
&=-\int_a^b f'(x) \phi(x) \,d x
\end{align}</math>
where the last equality is because <math>\phi(a) = \phi(b) = 0.</math></ref> Continuing the calculation above, for all <math>\phi \in \mathcal{D}(U):</math>
<math display=block>\begin{align}
\left\langle {}^{t}P(D_f), \phi \right\rangle &=\sum\nolimits_\alpha (-1)^{|\alpha|} \int_U \phi(x) (\partial^\alpha(c_\alpha f))(x) \,dx && \text{As shown above} \\[4pt]
&= \int_U \phi(x) \sum\nolimits_\alpha (-1)^{|\alpha|} (\partial^\alpha(c_\alpha f))(x)\,dx \\[4pt]
&= \int_U \phi(x) \sum_\alpha (-1)^{|\alpha|} \left[\sum_{\gamma \le \alpha} \binom{\alpha}{\gamma} (\partial^{\gamma}c_\alpha)(x) (\partial^{\alpha-\gamma}f)(x) \right] \,dx && \text{Leibniz rule}\\
&= \int_U \phi(x) \left[\sum_\alpha \sum_{\gamma \le \alpha} (-1)^{|\alpha|} \binom{\alpha}{\gamma} (\partial^{\gamma}c_\alpha)(x) (\partial^{\alpha-\gamma}f)(x)\right] \,dx \\
&= \int_U \phi(x) \left[ \sum_\alpha \left[ \sum_{\beta \geq \alpha} (-1)^{|\beta|} \binom{\beta}{\alpha} \left(\partial^{\beta-\alpha}c_{\beta}\right)(x) \right] (\partial^\alpha f)(x)\right] \,dx && \text{Grouping terms by derivatives of } f \\
&= \int_U \phi(x) \left[\sum\nolimits_\alpha b_\alpha(x) (\partial^\alpha f)(x) \right] \, dx && b_\alpha:=\sum_{\beta \geq \alpha} (-1)^{|\beta|} \binom{\beta}{\alpha} \partial^{\beta-\alpha}c_{\beta} \\
&= \left\langle \left(\sum\nolimits_\alpha b_\alpha \partial^\alpha \right) (f), \phi \right\rangle
\end{align}</math>
{{collapse bottom}}
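
'''Example.''' The formula for the coefficients <math>b_\alpha</math> can be illustrated in the simplest non-trivial case. The following is a sketch that assumes <math>U = \R</math> and a first-order operator <math>P := c(x) \tfrac{d}{dx}</math> with <math>c \in C^\infty(\R),</math> chosen purely for illustration, so that <math>c_1 = c</math> and <math>c_0 = 0.</math> Then
<math display=block>\begin{align}
b_1 &= (-1)^{1} \binom{1}{1} c_1 = -c, \\
b_0 &= (-1)^{0} \binom{0}{0} c_0 + (-1)^{1} \binom{1}{0} \partial c_1 = -c',
\end{align}</math>
so that <math>P_* f = -c f' - c' f = -(c f)',</math> in agreement with the integration by parts <math display=inline>\int_\R f \, c \, \phi' \,dx = -\int_\R (c f)' \phi \,dx.</math> Applying the same formula to <math>P_*</math> (whose coefficients are <math>-c</math> and <math>-c'</math>) gives <math>b_1 = c</math> and <math>b_0 = -c' + c' = 0,</math> that is, <math>P_{**} = P</math> in this case.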

The Lemma combined with the fact that the formal transpose of the formal transpose is the original differential operator, that is, <math>P_{**}= P,</math>{{sfn|Trèves|2006|pp=247-252}} enables us to arrive at the correct definition: the formal transpose induces the (continuous) canonical linear operator <math>P_* : C_c^\infty(U) \to C_c^\infty(U)</math> defined by <math>\phi \mapsto P_*(\phi).</math> We claim that the transpose of this map, <math>{}^{t}P_* : \mathcal{D}'(U) \to \mathcal{D}'(U),</math> can be taken as <math>D_P.</math> To see this, for every <math>\phi \in \mathcal{D}(U),</math> compute its action on a distribution of the form <math>D_f</math> with <math>f \in C^\infty(U)</math>:

<math display=block>\begin{align}
\left\langle {}^{t}P_*\left(D_f\right),\phi \right\rangle &= \left\langle D_{P_{**}(f)}, \phi \right\rangle && \text{Using Lemma above with } P_* \text{ in place of } P\\
&= \left\langle D_{P(f)}, \phi \right\rangle && P_{**} = P
\end{align}</math>

We call the continuous linear operator <math>D_P := {}^{t}P_* : \mathcal{D}'(U) \to \mathcal{D}'(U)</math> the '''{{em|differential operator on distributions extending <math>P</math>}}'''.{{sfn|Trèves|2006|pp=247-252}} Its action on an arbitrary distribution <math>S</math> is defined via:
<math display=block>D_P(S)(\phi) = S\left(P_*(\phi)\right) \quad \text{ for all } \phi \in \mathcal{D}(U).</math>

If <math>(T_i)_{i=1}^\infty</math> converges to <math>T \in \mathcal{D}'(U)</math> then for every multi-index <math>\alpha,</math> the sequence <math>(\partial^\alpha T_i)_{i=1}^\infty</math> converges to <math>\partial^\alpha T \in \mathcal{D}'(U).</math>

====Multiplication of distributions by smooth functions====

A differential operator of order 0 is just multiplication by a smooth function. And conversely, if <math>f</math> is a smooth function then <math>P := f(x)</math> is a differential operator of order 0, whose formal transpose is itself (that is, <math>P_* = P</math>). The induced differential operator <math>D_P : \mathcal{D}'(U) \to \mathcal{D}'(U)</math> maps a distribution <math>T</math> to a distribution denoted by <math>fT := D_P(T).</math> We have thus defined the multiplication of a distribution by a smooth function.

We now give an alternative presentation of the multiplication of a distribution <math>T</math> on <math>U</math> by a smooth function <math>m : U \to \R.</math> The product <math>mT</math> is defined by
<math display=block>\langle mT, \phi \rangle = \langle T, m\phi \rangle \qquad \text{ for all } \phi \in \mathcal{D}(U).</math>

This definition coincides with the transpose definition since if <math>M : \mathcal{D}(U) \to \mathcal{D}(U)</math> is the operator of multiplication by the function <math>m</math> (that is, <math>(M\phi)(x) = m(x)\phi(x)</math>), then
<math display=block>\int_U (M \phi)(x) \psi(x)\,dx = \int_U m(x) \phi(x) \psi(x)\,d x = \int_U \phi(x) m(x) \psi(x) \,d x = \int_U \phi(x) (M \psi)(x)\,d x,</math>
so that <math>{}^tM = M.</math>

Under multiplication by smooth functions, <math>\mathcal{D}'(U)</math> is a [[Module (mathematics)|module]] over the [[ring (mathematics)|ring]] <math>C^\infty(U).</math> With this definition of multiplication by a smooth function, the ordinary [[product rule]] of calculus remains valid. However, some unusual identities also arise. For example, if <math>\delta</math> is the Dirac delta distribution on <math>\R,</math> then <math>m \delta = m(0) \delta,</math> and if <math>\delta'</math> is the derivative of the delta distribution, then
<math display=block>m\delta' = m(0) \delta' - m' \delta = m(0) \delta' - m'(0) \delta.</math>
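
'''Example.''' The identity for <math>m\delta'</math> can be verified directly from the definitions of the product and of the derivative. For every <math>\phi \in \mathcal{D}(\R),</math>
<math display=block>\langle m\delta', \phi \rangle = \langle \delta', m\phi \rangle = -(m\phi)'(0) = -m'(0)\phi(0) - m(0)\phi'(0) = \left\langle m(0)\delta' - m'(0)\delta, \phi \right\rangle.</math>
Taking <math>m(x) = x</math> as a particular (illustrative) choice gives <math>x\delta' = -\delta.</math>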

The bilinear multiplication map <math>C^\infty(\R^n) \times \mathcal{D}'(\R^n) \to \mathcal{D}'\left(\R^n\right)</math> given by <math>(f,T) \mapsto fT</math> is {{em|not}} continuous; it is, however, [[hypocontinuous]].{{sfn|Trèves|2006|p=423}}

'''Example.''' The product of any distribution <math>T</math> with the function that is identically {{math|1}} on <math>U</math> is equal to <math>T.</math>

'''Example.''' Suppose <math>(f_i)_{i=1}^\infty</math> is a sequence of test functions on <math>U</math> that converges to the constant function <math>1 \in C^\infty(U).</math> For any distribution <math>T</math> on <math>U,</math> the sequence <math>(f_i T)_{i=1}^\infty</math> converges to <math>T \in \mathcal{D}'(U).</math>{{sfn|Trèves|2006|p=261}}

If <math>(T_i)_{i=1}^\infty</math> converges to <math>T \in \mathcal{D}'(U)</math> and <math>(f_i)_{i=1}^\infty</math> converges to <math>f \in C^\infty(U)</math> then <math>(f_i T_i)_{i=1}^\infty</math> converges to <math>fT \in \mathcal{D}'(U).</math>

=====Problem of multiplying distributions=====

It is easy to define the product of a distribution with a smooth function, or more generally the product of two distributions whose [[singular support]]s are disjoint.<ref name="StackOverflow">{{cite web|url=https://math.stackexchange.com/q/2338283|title=Multiplication of two distributions whose singular supports are disjoint|date=Jun 27, 2017|publisher=Stack Exchange Network|author=Per Persson (username: md2perpe)}}</ref> With more effort, it is possible to define a well-behaved product of several distributions provided their [[wave front set]]s at each point are compatible. A limitation of the theory of distributions (and hyperfunctions) is that there is no associative product of two distributions extending the product of a distribution by a smooth function, as [[Laurent Schwartz]] proved in the 1950s. For example, let <math>\operatorname{p.v.} \frac{1}{x}</math> be the distribution obtained by the [[Cauchy principal value]]:
<math display=block>\left(\operatorname{p.v.} \frac{1}{x}\right)(\phi) = \lim_{\varepsilon\to 0^+} \int_{|x| \geq \varepsilon} \frac{\phi(x)}{x}\, dx \quad \text{ for all } \phi \in \mathcal{S}(\R).</math>

If <math>\delta</math> is the Dirac delta distribution then
<math display=block>(\delta \times x) \times \operatorname{p.v.} \frac{1}{x} = 0</math>
but,
<math display=block>\delta \times \left(x \times \operatorname{p.v.} \frac{1}{x}\right) = \delta</math>
so the product of a distribution by a smooth function (which is always well-defined) cannot be extended to an [[Associativity|associative]] product on the space of distributions.
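
The two products with the smooth function <math>x</math> used in this counterexample can be checked from the definition <math>\langle mT, \phi \rangle = \langle T, m\phi \rangle.</math> As a sketch, for every test function <math>\phi</math>:
<math display=block>\begin{align}
\langle x\delta, \phi \rangle &= \langle \delta, x\phi \rangle = 0 \cdot \phi(0) = 0, \\
\left\langle x \operatorname{p.v.} \frac{1}{x}, \phi \right\rangle &= \lim_{\varepsilon\to 0^+} \int_{|x| \geq \varepsilon} \frac{x\phi(x)}{x}\, dx = \int_\R \phi(x)\,dx = \langle 1, \phi \rangle.
\end{align}</math>
Thus <math>x\delta = 0</math> and <math>x \operatorname{p.v.} \tfrac{1}{x} = 1,</math> so the first bracketing yields <math>0</math> while the second yields <math>\delta \times 1 = \delta.</math>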

Thus, nonlinear problems cannot be posed in general, and hence cannot be solved, within distribution theory alone. In the context of [[quantum field theory]], however, solutions can be found. In more than two spacetime dimensions the problem is related to the [[Regularization (physics)|regularization]] of [[Ultraviolet divergence|divergences]]. Here [[Henri Epstein]] and [[Vladimir Glaser]] developed the mathematically rigorous (but extremely technical) {{em|[[causal perturbation theory]]}}. This does not solve the problem in other situations, and many other interesting theories are nonlinear, for example the [[Navier–Stokes equations]] of [[fluid dynamics]].

Several not entirely satisfactory{{Citation needed|reason=Why are they not satisfactory?|date=July 2019}} theories of [[Algebra (ring theory)|algebra]]s of [[generalized function]]s have been developed, among which [[Colombeau algebra|Colombeau's (simplified) algebra]] is perhaps the most widely used today.

Inspired by Lyons' [[rough path]] theory,<ref>{{Cite journal|last1=Lyons|first1=T.|title=Differential equations driven by rough signals|doi=10.4171/RMI/240|journal=Revista Matemática Iberoamericana|pages=215–310|year=1998|volume=14 |issue=2 |doi-access=free}}</ref> [[Martin Hairer]] proposed a consistent way of multiplying distributions with certain structures ([[regularity structures]]<ref>{{cite journal|last1=Hairer|first1=Martin|title=A theory of regularity structures|journal=Inventiones Mathematicae|date=2014|doi=10.1007/s00222-014-0505-4|volume=198|issue=2|pages=269–504|bibcode=2014InMat.198..269H|arxiv=1303.5113|s2cid=119138901 }}</ref>), available in many examples from stochastic analysis, notably stochastic partial differential equations. See also Gubinelli–Imkeller–Perkowski (2015) for a related development based on [[Jean-Michel Bony|Bony]]'s [[paraproduct]] from Fourier analysis.

===Composition with a smooth function===

Let <math>T</math> be a distribution on <math>U.</math> Let <math>V</math> be an open set in <math>\R^n</math> and <math>F : V \to U.</math> If <math>F</math> is a [[Submersion (mathematics)|submersion]] then it is possible to define
<math display=block>T \circ F \in \mathcal{D}'(V).</math>

This is {{em|the '''composition''' of the distribution <math>T</math> with <math>F</math>}}, and is also called {{em|the '''[[Pullback (differential geometry)|pullback]]''' of <math>T</math> along <math>F</math>}}, sometimes written
<math display=block>F^\sharp : T \mapsto F^\sharp T = T \circ F.</math>

The pullback is often denoted <math>F^*,</math> although this notation should not be confused with the use of '*' to denote the adjoint of a linear mapping.

The condition that <math>F</math> be a submersion is equivalent to the requirement that the [[Jacobian matrix and determinant|Jacobian]] derivative <math>d F(x)</math> of <math>F</math> be a [[surjective]] linear map for every <math>x \in V.</math> A necessary (but not sufficient) condition for extending <math>F^{\#}</math> to distributions is that <math>F</math> be an [[open mapping]].<ref>See for example {{harvnb|Hörmander|1983|loc=Theorem 6.1.1}}.</ref> The [[inverse function theorem]] ensures that a submersion satisfies this condition.

If <math>F</math> is a submersion, then <math>F^{\#}</math> is defined on distributions by finding the transpose map. The uniqueness of this extension is guaranteed since <math>F^{\#}</math> is a continuous linear operator on <math>\mathcal{D}(U).</math> Existence, however, requires using the [[Integration by substitution|change of variables]] formula, the inverse function theorem (locally), and a [[partition of unity]] argument.<ref>See {{harvnb|Hörmander|1983|loc=Theorem 6.1.2}}.</ref>

In the special case when <math>F</math> is a [[diffeomorphism]] from an open subset <math>V</math> of <math>\R^n</math> onto an open subset <math>U</math> of <math>\R^n,</math> change of variables under the integral gives:
<math display=block>\int_V \phi\circ F(x) \psi(x)\,dx = \int_U \phi(x) \psi \left(F^{-1}(x) \right) \left|\det dF^{-1}(x) \right|\,dx.</math>

In this particular case, then, <math>F^{\#}</math> is defined by the transpose formula:
<math display=block>\left\langle F^\sharp T, \phi \right\rangle = \left\langle T, \left|\det d(F^{-1}) \right|\phi\circ F^{-1} \right\rangle.</math>
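
'''Example.''' As a sketch of the transpose formula (assuming <math>n = 1,</math> <math>U = V = \R,</math> and an illustrative affine map <math>F(x) = ax + b</math> with <math>a \neq 0,</math> none of which is fixed by the text above), one has <math>F^{-1}(y) = (y - b)/a</math> and <math>\left|\det d(F^{-1})\right| = 1/|a|.</math> For the Dirac delta distribution <math>\delta</math> this gives
<math display=block>\left\langle F^\sharp \delta, \phi \right\rangle = \left\langle \delta, \frac{1}{|a|}\, \phi \circ F^{-1} \right\rangle = \frac{1}{|a|}\, \phi\left(-\frac{b}{a}\right) \qquad \text{ for all } \phi \in \mathcal{D}(\R),</math>
which is the familiar scaling rule often written informally as <math>\delta(ax + b) = \tfrac{1}{|a|}\, \delta\left(x + \tfrac{b}{a}\right).</math>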

===Convolution===

Under some circumstances, it is possible to define the [[convolution]] of a function with a distribution, or even the convolution of two distributions.
Recall that if <math>f</math> and <math>g</math> are functions on <math>\R^n</math> then we denote by <math>f\ast g</math> {{em|the '''convolution''' of <math>f</math> and <math>g,</math>}} defined at <math>x \in \R^n</math> to be the integral
<math display=block>(f \ast g)(x) := \int_{\R^n} f(x-y) g(y) \,dy = \int_{\R^n} f(y)g(x-y) \,dy</math>
provided that the integral exists. If <math>1 \leq p, q, r \leq \infty</math> are such that <math display=inline>\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1</math> then for any functions <math>f \in L^p(\R^n)</math> and <math>g \in L^q(\R^n)</math> we have <math>f \ast g \in L^r(\R^n)</math> and <math>\|f\ast g\|_{L^r} \leq \|f\|_{L^p} \|g\|_{L^q}.</math>{{sfn|Trèves|2006|pp=278-283}} If <math>f</math> and <math>g</math> are continuous functions on <math>\R^n,</math> at least one of which has compact support, then <math>\operatorname{supp}(f \ast g) \subseteq \operatorname{supp} (f) + \operatorname{supp} (g)</math> and if <math>A\subseteq \R^n</math> then the values of <math>f\ast g</math> on <math>A</math> do {{em|not}} depend on the values of <math>f</math> outside of the [[Minkowski sum]] <math>A -\operatorname{supp} (g) = \{a-s : a\in A, s\in \operatorname{supp}(g)\}.</math>{{sfn|Trèves|2006|pp=278-283}}

Importantly, if <math>g \in L^1(\R^n)</math> has compact support then for any <math>0 \leq k \leq \infty,</math> the convolution map <math>f \mapsto f \ast g</math> is continuous when considered as the map <math>C^k(\R^n) \to C^k(\R^n)</math> or as the map <math>C_c^k(\R^n) \to C_c^k(\R^n).</math>{{sfn|Trèves|2006|pp=278-283}}

====Translation and symmetry====

Given <math>a \in \R^n,</math> the translation operator <math>\tau_a</math> sends <math>f : \R^n \to \Complex</math> to <math>\tau_a f : \R^n \to \Complex,</math> defined by <math>\tau_a f(y) = f(y-a).</math> This can be extended by the transpose to distributions in the following way: given a distribution <math>T,</math> {{em|the '''translation''' of <math>T</math> by <math>a</math>}} is the distribution <math>\tau_a T : \mathcal{D}(\R^n) \to \Complex</math> defined by <math>\tau_a T(\phi) := \left\langle T, \tau_{-a} \phi \right\rangle.</math>{{sfn|Trèves|2006|pp=284-297}}<ref>See for example {{harvnb|Rudin|1991|loc=§6.29}}.</ref>
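
'''Example.''' As a sketch, write <math>\delta_a</math> for the Dirac measure at <math>a \in \R^n,</math> that is, <math>\langle \delta_a, \phi \rangle := \phi(a)</math> (a notation assumed here for illustration). Since <math>\left(\tau_{-a}\phi\right)(y) = \phi(y + a),</math> the definition gives
<math display=block>\left\langle \tau_a \delta, \phi \right\rangle = \left\langle \delta, \tau_{-a} \phi \right\rangle = \left(\tau_{-a}\phi\right)(0) = \phi(a) = \langle \delta_a, \phi \rangle,</math>
so <math>\tau_a \delta = \delta_a,</math> consistent with the translation of functions moving mass from the origin to <math>a.</math>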

Given <math>f : \R^n \to \Complex,</math> define the function <math>\tilde{f} : \R^n \to \Complex</math> by <math>\tilde{f}(x) := f(-x).</math> Given a distribution <math>T,</math> let <math>\tilde{T} : \mathcal{D}(\R^n) \to \Complex</math> be the distribution defined by <math>\tilde{T}(\phi) := T \left(\tilde{\phi}\right).</math> The operator <math>T \mapsto \tilde{T}</math> is called '''{{em|the symmetry with respect to the origin}}'''.{{sfn|Trèves|2006|pp=284-297}}

====Convolution of a test function with a distribution====

Convolution with <math>f \in \mathcal{D}(\R^n)</math> defines a linear map:
<math display=block>\begin{alignat}{4}
C_f : \,& \mathcal{D}(\R^n) && \to    \,&& \mathcal{D}(\R^n) \\
        & g                 && \mapsto\,&& f \ast g \\
\end{alignat}</math>
which is [[continuous function|continuous]] with respect to the canonical [[LF space]] topology on <math>\mathcal{D}(\R^n).</math>

Convolution of <math>f</math> with a distribution <math>T \in \mathcal{D}'(\R^n)</math> can be defined by taking the transpose of <math>C_f</math> relative to the duality pairing of <math>\mathcal{D}(\R^n)</math> with the space <math>\mathcal{D}'(\R^n)</math> of distributions.{{sfn|Trèves|2006|loc=Chapter 27}} If <math>f, g, \phi \in \mathcal{D}(\R^n),</math> then by [[Fubini's theorem]]
<math display=block>\langle C_fg, \phi \rangle = \int_{\R^n}\phi(x)\int_{\R^n}f(x-y) g(y) \,dy \,dx = \left\langle g,C_{\tilde{f}}\phi \right\rangle.</math>

Extending by continuity, the convolution of <math>f</math> with a distribution <math>T</math> is defined by
<math display=block>\langle f \ast T, \phi \rangle = \left\langle T, \tilde{f} \ast \phi \right\rangle, \quad \text{ for all } \phi \in \mathcal{D}(\R^n).</math>

An alternative way to define the convolution of a test function <math>f</math> and a distribution <math>T</math> is to use the translation operator <math>\tau_a.</math> The convolution of the compactly supported function <math>f</math> and the distribution <math>T</math> is then the function defined for each <math>x \in \R^n</math> by
<math display=block>(f \ast T)(x) = \left\langle T, \tau_x \tilde{f} \right\rangle.</math>
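
'''Example.''' As a sketch, the pointwise formula can be evaluated for <math>T = \delta</math> and <math>T = \delta'</math> on <math>\R.</math> Since <math>\left(\tau_x \tilde{f}\right)(y) = \tilde{f}(y - x) = f(x - y),</math>
<math display=block>\begin{align}
(f \ast \delta)(x) &= \left\langle \delta, \tau_x \tilde{f} \right\rangle = f(x - 0) = f(x), \\
(f \ast \delta')(x) &= \left\langle \delta', \tau_x \tilde{f} \right\rangle = -\frac{d}{dy} f(x - y) \Big\vert_{y = 0} = f'(x).
\end{align}</math>
Thus convolution with <math>\delta</math> returns <math>f</math> itself, and convolution with <math>\delta'</math> differentiates <math>f.</math>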

It can be shown that the convolution of a smooth, compactly supported function and a distribution is a smooth function. If the distribution <math>T</math> has compact support, and if <math>f</math> is a polynomial (resp. an exponential function, an analytic function, the restriction of an entire analytic function on <math>\Complex^n</math> to <math>\R^n,</math> the restriction of an entire function of exponential type in <math>\Complex^n</math> to <math>\R^n</math>), then the same is true of <math>T \ast f.</math>{{sfn|Trèves|2006|pp=284-297}} If the distribution <math>T</math> has compact support as well, then <math>f\ast T</math> is a compactly supported function, and the [[Titchmarsh convolution theorem]] {{harvtxt|Hörmander|1983|loc=Theorem 4.3.3}} implies that:
<math display=block>\operatorname{ch}(\operatorname{supp}(f \ast T)) = \operatorname{ch}(\operatorname{supp}(f)) + \operatorname{ch} (\operatorname{supp}(T))</math>
where <math>\operatorname{ch}</math> denotes the [[convex hull]] and <math>\operatorname{supp}</math> denotes the support.

====Convolution of a smooth function with a distribution====

Let <math>f \in C^\infty(\R^n)</math> and <math>T \in \mathcal{D}'(\R^n)</math> and assume that at least one of <math>f</math> and <math>T</math> has compact support. The '''{{em|convolution}}''' of <math>f</math> and <math>T,</math> denoted by <math>f \ast T</math> or by <math>T \ast f,</math> is the smooth function:{{sfn|Trèves|2006|pp=284-297}}
<math display=block>\begin{alignat}{4}
f \ast T : \,& \R^n && \to    \,&& \Complex \\
             & x    && \mapsto\,&& \left\langle T, \tau_x \tilde{f} \right\rangle \\
\end{alignat}</math>
satisfying:
<math display=block>\begin{align}
&\operatorname{supp}(f \ast T) \subseteq \operatorname{supp}(f)+ \operatorname{supp}(T) \\[6pt]
&\text{ for all } p \in \N^n: \quad
\begin{cases}\partial^p \left\langle T, \tau_x \tilde{f} \right\rangle = \left\langle T, \partial^p \tau_x \tilde{f} \right\rangle \\
\partial^p (T \ast f) = (\partial^p T) \ast f = T \ast (\partial^p f).
\end{cases}
\end{align}</math>

Let <math>M</math> be the map <math>f \mapsto T \ast f</math>. If <math>T</math> is a distribution, then <math>M</math> is continuous as a map <math>\mathcal{D}(\R^n) \to C^\infty(\R^n)</math>. If <math>T</math> also has compact support, then <math>M</math> is also continuous as the map <math>C^\infty(\R^n) \to C^\infty(\R^n)</math> and continuous as the map <math>\mathcal{D}(\R^n) \to \mathcal{D}(\R^n).</math>{{sfn|Trèves|2006|pp=284-297}}

If <math>L : \mathcal{D}(\R^n) \to C^\infty(\R^n)</math> is a continuous linear map such that <math>L \partial^\alpha \phi = \partial^\alpha L \phi</math> for all <math>\alpha</math> and all <math>\phi \in \mathcal{D}(\R^n)</math> then there exists a distribution <math>T \in \mathcal{D}'(\R^n)</math> such that <math>L \phi = T \ast \phi</math> for all <math>\phi \in \mathcal{D}(\R^n).</math>{{sfn|Rudin|1991|pp=149-181}}

'''Example.'''{{sfn|Rudin|1991|pp=149-181}} Let <math>H</math> be the [[Heaviside step function|Heaviside function]] on <math>\R.</math> For any <math>\phi \in \mathcal{D}(\R),</math>
<math display=block>(H \ast \phi)(x) = \int_{-\infty}^x \phi(t) \, dt.</math>

Let <math>\delta</math> be the Dirac measure at 0 and let <math>\delta'</math> be its derivative as a distribution. Then <math>\delta' \ast H = \delta</math> and <math>1 \ast \delta' = 0.</math> Importantly, the associative law fails to hold:
<math display=block>1 = 1 \ast \delta = 1 \ast (\delta' \ast H ) \neq (1 \ast \delta') \ast H = 0 \ast H = 0.</math>
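
The ingredients of this example can be checked as follows. This is only a sketch: it assumes that the rule <math>\partial(S \ast T) = (\partial S) \ast T = S \ast (\partial T)</math> holds whenever at least one of the two factors has compact support, and it uses the standard fact <math>H' = \delta.</math> Because <math>H(x - t) = 1</math> exactly when <math>t < x,</math>
<math display=block>\begin{align}
(H \ast \phi)(x) &= \int_\R H(x - t) \phi(t) \,dt = \int_{-\infty}^x \phi(t) \,dt, \\
\delta' \ast H &= (\delta \ast H)' = H' = \delta, \\
1 \ast \delta' &= (1)' \ast \delta = 0 \ast \delta = 0.
\end{align}</math>
The failure of associativity does not contradict these rules: in the triple <math>1, \delta', H</math> only <math>\delta'</math> has compact support.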

====Convolution of distributions====

It is also possible to define the convolution of two distributions <math>S</math> and <math>T</math> on <math>\R^n,</math> provided one of them has compact support. Informally, to define <math>S \ast T</math> where <math>T</math> has compact support, the idea is to extend the definition of the convolution <math>\,\ast\,</math> to a linear operation on distributions so that the associativity formula
<math display=block>S \ast (T \ast \phi) = (S \ast T) \ast \phi</math>
continues to hold for all test functions <math>\phi.</math><ref>{{harvnb|Hörmander|1983|loc=§IV.2}} proves the uniqueness of such an extension.</ref>

It is also possible to provide a more explicit characterization of the convolution of distributions.{{sfn|Trèves|2006|loc=Chapter 27}} Suppose that <math>S</math> and <math>T</math> are distributions and that <math>S</math> has compact support. Then the linear maps
<math display=block>\begin{alignat}{9}
\bullet \ast \tilde{S} : \,& \mathcal{D}(\R^n) && \to    \,&& \mathcal{D}(\R^n) && \quad \text{ and } \quad && \bullet \ast \tilde{T} : \,&& \mathcal{D}(\R^n) && \to    \,&& C^\infty(\R^n) \\
    & f                       && \mapsto\,&& f \ast \tilde{S}    &&       &&     && f                       && \mapsto\,&& f \ast \tilde{T} \\
\end{alignat}</math>
are continuous. The transposes of these maps:
<math display=block>{}^{t}\left(\bullet \ast \tilde{S}\right) : \mathcal{D}'(\R^n) \to \mathcal{D}'(\R^n) \qquad {}^{t}\left(\bullet \ast \tilde{T}\right) : \mathcal{E}'(\R^n) \to \mathcal{D}'(\R^n)</math>
are consequently continuous and it can also be shown that{{sfn|Trèves|2006|pp=284-297}}
<math display=block>{}^{t}\left(\bullet \ast \tilde{S}\right)(T) = {}^{t}\left(\bullet \ast \tilde{T}\right)(S).</math>

This common value is called {{em|the '''convolution''' of <math>S</math> and <math>T</math>}} and it is a distribution that is denoted by <math>S \ast T</math> or <math>T \ast S.</math> It satisfies <math>\operatorname{supp} (S \ast T) \subseteq \operatorname{supp}(S) + \operatorname{supp}(T).</math>{{sfn|Trèves|2006|pp=284-297}} If <math>S</math> and <math>T</math> are two distributions, at least one of which has compact support, then for any <math>a \in \R^n,</math> <math>\tau_a(S \ast T) = \left(\tau_a S\right) \ast T = S \ast \left(\tau_a T\right).</math>{{sfn|Trèves|2006|pp=284-297}} If <math>T</math> is a distribution on <math>\R^n</math> and <math>\delta</math> is the [[Dirac measure]] at the origin, then <math>T \ast \delta = T = \delta \ast T</math>;{{sfn|Trèves|2006|pp=284-297}} thus <math>\delta</math> is the [[identity element]] of the convolution operation. Moreover, if <math>f</math> is a function then <math>f \ast \delta^{\prime} = f^{\prime} = \delta^{\prime} \ast f,</math> and the associativity of convolution then implies that <math>f^{\prime} \ast g = g^{\prime} \ast f</math> for all functions <math>f</math> and <math>g</math> for which these convolutions are defined.
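
For example, assuming the convention under which <math>\tau_a \delta</math> is the Dirac measure <math>\delta_a</math> at the point <math>a</math> (the notation <math>\delta_a</math> is introduced here only for this illustration), combining the translation formula with the identity property of <math>\delta</math> gives
<math display=block>\tau_a T = \tau_a(\delta \ast T) = \left(\tau_a \delta\right) \ast T = \delta_a \ast T,</math>
so that translation by <math>a</math> is itself convolution with a shifted Dirac measure.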

Suppose that it is <math>T</math> that has compact support. For <math>\phi \in \mathcal{D}(\R^n)</math> consider the function
<math display=block>\psi(x) = \langle T, \tau_{-x} \phi \rangle.</math>

It can be readily shown that this defines a smooth function of <math>x,</math> which moreover has compact support. The convolution of <math>S</math> and <math>T</math> is defined by
<math display=block>\langle S \ast T, \phi \rangle = \langle S, \psi \rangle.</math>

This generalizes the classical notion of [[convolution]] of functions and is compatible with differentiation in the following sense: for every multi-index <math>\alpha,</math>
<math display=block>\partial^\alpha(S \ast T) = (\partial^\alpha S) \ast T = S \ast (\partial^\alpha T).</math>
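
For instance (a sketch, writing <math>\delta_a</math> for the Dirac measure at a point <math>a \in \R^n</math> and assuming the convention <math>(\tau_{-x} \phi)(y) = \phi(y + x)</math>), taking <math>S = \delta_a</math> and <math>T = \delta_b</math> in the definition gives
<math display=block>\psi(x) = \langle \delta_b, \tau_{-x} \phi \rangle = \phi(b + x) \qquad \text{ and } \qquad \langle \delta_a \ast \delta_b, \phi \rangle = \langle \delta_a, \psi \rangle = \psi(a) = \phi(a + b),</math>
so that <math>\delta_a \ast \delta_b = \delta_{a + b}.</math> Similarly, taking <math>S = \delta</math> in the differentiation formula shows that every partial derivative is itself a convolution:
<math display=block>\partial^\alpha T = \partial^\alpha(\delta \ast T) = (\partial^\alpha \delta) \ast T.</math>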

The convolution of a finite number of distributions, all of which (except possibly one) have compact support, is [[associative]].{{sfn|Trèves|2006|pp=284-297}}

This definition of convolution remains valid under less restrictive assumptions about <math>S</math> and <math>T.</math><ref>See for instance {{harvnb|Gel'fand|Shilov|1966–1968|loc=v. 1, pp. 103–104}} and {{harvnb|Benedetto|1997|loc=Definition 2.5.8}}.</ref>

The convolution of distributions with compact support induces a continuous bilinear map <math>\mathcal{E}' \times \mathcal{E}' \to \mathcal{E}'</math> defined by <math>(S,T) \mapsto S * T,</math> where <math>\mathcal{E}'</math> denotes the space of distributions with compact support.{{sfn|Trèves|2006|p=423}} However, the convolution map as a function <math>\mathcal{E}' \times \mathcal{D}' \to \mathcal{D}'</math> is {{em|not}} continuous{{sfn|Trèves|2006|p=423}} although it is separately continuous.{{sfn|Trèves|2006|p=294}} The convolution maps <math>\mathcal{D}(\R^n) \times \mathcal{D}' \to \mathcal{D}'</math> and <math>\mathcal{D}(\R^n) \times \mathcal{E}' \to \mathcal{D}(\R^n)</math> given by <math>(f, T) \mapsto f * T</math> both {{em|fail}} to be continuous.{{sfn|Trèves|2006|p=423}} Each of these non-continuous maps is, however, [[separately continuous]] and [[hypocontinuous]].{{sfn|Trèves|2006|p=423}}

====Convolution versus multiplication====

In general, [[Regularization (physics)|regularity]] is required for multiplication products, and [[Principle of locality|locality]] is required for convolution products. It is expressed in the following extension of the [[Convolution theorem|Convolution Theorem]] which guarantees the existence of both convolution and multiplication products. Let <math>F(\alpha) = f \in \mathcal{O}'_C</math> be a rapidly decreasing tempered distribution or, equivalently, <math>F(f) = \alpha \in \mathcal{O}_M</math> be an ordinary (slowly growing, smooth) function within the space of tempered distributions and let <math>F</math> be the normalized (unitary, ordinary frequency) [[Fourier transform]].<ref>{{cite book|last=Folland|first=G.B.|title=Harmonic Analysis in Phase Space|publisher=Princeton University Press|publication-place=Princeton, NJ|year=1989}}</ref> Then, according to {{harvtxt|Schwartz|1951}},
<math display=block>F(f * g) = F(f) \cdot F(g) \qquad \text{ and } \qquad F(\alpha \cdot g) = F(\alpha) * F(g)</math>
hold within the space of tempered distributions.<ref>{{cite book|last=Horváth|first=John|author-link = John Horvath (mathematician)|title=Topological Vector Spaces and Distributions|publisher=Addison-Wesley Publishing Company|publication-place=Reading, MA|year=1966}}</ref><ref>{{cite book|last=Barros-Neto|first=José|title=An Introduction to the Theory of Distributions|publisher=Dekker|publication-place=New York, NY|year=1973}}</ref><ref>{{cite book|last=Petersen|first=Bent E.|title=Introduction to the Fourier Transform and Pseudo-Differential Operators|publisher=Pitman Publishing|publication-place=Boston, MA|year=1983}}</ref> In particular, these equations become the [[Poisson summation formula|Poisson Summation Formula]] if <math>g \equiv \operatorname{\text{Ш}}</math> is the [[Dirac comb|Dirac Comb]].<ref>{{cite book|last=Woodward|first=P.M.|title=Probability and Information Theory with Applications to Radar|publisher=Pergamon Press|publication-place=Oxford, UK|year=1953}}</ref> The space of all rapidly decreasing tempered distributions is also called the space of {{em|convolution operators}} <math>\mathcal{O}'_C</math> and the space of all ordinary functions within the space of tempered distributions is also called the space of {{em|multiplication operators}} <math>\mathcal{O}_M.</math> More generally, <math>F(\mathcal{O}'_C) = \mathcal{O}_M</math> and <math>F(\mathcal{O}_M) = \mathcal{O}'_C.</math>{{sfn|Trèves|2006|pp=318-319}}<ref>{{cite book|last1=Friedlander|first1=F.G.|last2=Joshi|first2=M.S.|title=Introduction to the Theory of Distributions|publisher=Cambridge University Press|publication-place=Cambridge, UK|year=1998}}</ref> A particular case is the [[Paley–Wiener theorem#Schwartz's Paley–Wiener theorem|Paley-Wiener-Schwartz Theorem]] which states that <math>F(\mathcal{E}') = \operatorname{PW}</math> and <math>F(\operatorname{PW} ) = \mathcal{E}'.</math> This is because <math>\mathcal{E}' \subseteq \mathcal{O}'_C</math> and <math>\operatorname{PW} 
\subseteq \mathcal{O}_M.</math> In other words, compactly supported tempered distributions <math>\mathcal{E}'</math> belong to the space of {{em|convolution operators}} <math>\mathcal{O}'_C</math> and Paley–Wiener functions <math>\operatorname{PW},</math> better known as [[Bandlimiting|bandlimited functions]], belong to the space of {{em|multiplication operators}} <math>\mathcal{O}_M.</math>{{sfn|Schwartz|1951}}

For example, let <math>g \equiv \operatorname{\text{Ш}} \in \mathcal{S}'</math> be the Dirac comb and <math>f \equiv \delta \in \mathcal{E}'</math> be the [[Dirac delta function|Dirac delta]]; then <math>\alpha \equiv 1 \in \operatorname{PW}</math> is the function that is constantly one and both equations yield the [[Dirac comb#Dirac-comb identity|Dirac-comb identity]]. Another example is to let <math>g</math> be the Dirac comb and <math>f \equiv \operatorname{rect} \in \mathcal{E}'</math> be the [[rectangular function]]; then <math>\alpha \equiv \operatorname{sinc} \in \operatorname{PW}</math> is the [[sinc function]] and both equations yield the [[Nyquist–Shannon sampling theorem|Classical Sampling Theorem]] for suitable <math>\operatorname{rect}</math> functions. More generally, if <math>g</math> is the Dirac comb and <math>f \in \mathcal{S} \subseteq \mathcal{O}'_C \cap \mathcal{O}_M</math> is a [[Smoothness|smooth]] [[window function]] ([[Schwartz space|Schwartz function]]), for example, the [[Gaussian function|Gaussian]], then <math>\alpha \in \mathcal{S}</math> is another smooth window function (Schwartz function). Such functions are known as [[mollifier]]s, especially in the theory of [[partial differential equation]]s, or as [[Regularization (mathematics)|regularizers]] in [[Regularization (physics)|physics]], because they allow turning [[generalized function]]s into [[Function (mathematics)|regular functions]].
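
The first identity can be checked by hand on a Gaussian. The following sketch assumes the ordinary-frequency convention <math>(F f)(\xi) = \int_{\R} f(x) e^{-2 \pi i x \xi} \, dx,</math> under which <math>F\left(e^{-\pi a x^2}\right)(\xi) = a^{-1/2} e^{-\pi \xi^2 / a}</math> for <math>a > 0.</math> For <math>f(x) = g(x) = e^{-\pi x^2}</math> one has <math>F(f) = f</math> and
<math display=block>(f \ast f)(x) = \tfrac{1}{\sqrt{2}} e^{-\pi x^2 / 2}, \qquad F(f \ast f)(\xi) = \tfrac{1}{\sqrt{2}} \cdot \sqrt{2} \, e^{-2 \pi \xi^2} = \left(e^{-\pi \xi^2}\right)^2 = F(f)(\xi) \cdot F(f)(\xi).</math>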

===Tensor products of distributions{{anchor|Tensor product of distributions}}===

Let <math>U \subseteq \R^m</math> and <math>V \subseteq \R^n</math> be open sets. Assume all vector spaces to be over the field <math>\mathbb{F},</math> where <math>\mathbb{F}=\R</math> or <math>\Complex.</math> For <math>f \in \mathcal{D}(U \times V)</math> define for every <math>u \in U</math> and every <math>v \in V</math> the following functions:
<math display=block>\begin{alignat}{9}
f_u : \,& V && \to    \,&& \mathbb{F} && \quad \text{ and } \quad && f^v : \,&& U && \to    \,&& \mathbb{F} \\
        & y && \mapsto\,&& f(u, y)    &&                          &&         && x && \mapsto\,&& f(x, v) \\
\end{alignat}</math>

Given <math>S \in \mathcal{D}^{\prime}(U)</math> and <math>T \in \mathcal{D}^{\prime}(V),</math> define the following functions:
<math display=block>\begin{alignat}{9}
\langle S, f^{\bullet}\rangle : \,& V && \to    \,&& \mathbb{F} && \quad \text{ and } \quad && \langle T, f_{\bullet}\rangle : \,&& U && \to    \,&& \mathbb{F} \\
                                  & v && \mapsto\,&& \langle S, f^v \rangle &&              &&                                   && u && \mapsto\,&& \langle T, f_u \rangle \\
\end{alignat}</math>
where <math>\langle T, f_{\bullet}\rangle \in \mathcal{D}(U)</math> and <math>\langle S, f^{\bullet}\rangle \in \mathcal{D}(V).</math> 
These definitions associate every <math>S \in \mathcal{D}'(U)</math> and <math>T \in \mathcal{D}'(V)</math> with the (respective) continuous linear map:
<math display=block>\begin{alignat}{9}
  \,&& \mathcal{D}(U \times V) & \to    \,&& \mathcal{D}(V) && \quad \text{ and } \quad &&   \,& \mathcal{D}(U \times V) && \to    \,&& \mathcal{D}(U) \\
    && f                   \   & \mapsto\,&& \langle S, f^{\bullet} \rangle    &&       &&     & f                   \   && \mapsto\,&& \langle T, f_{\bullet} \rangle \\
\end{alignat}</math>

Moreover, if <math>S</math> (resp. <math>T</math>) has compact support then it also induces a continuous linear map <math>C^\infty(U \times V) \to C^\infty(V)</math> (resp. {{nowrap|<math>C^\infty(U \times V) \to C^\infty(U)</math>).}}{{sfn|Trèves|2006|pp=416-419}}

{{Math theorem|name={{visible anchor|Fubini's theorem for distributions|text=[[Fubini's theorem]] for distributions}}{{sfn|Trèves|2006|pp=416-419}}|math_statement=
Let <math>S \in \mathcal{D}'(U)</math> and <math>T \in \mathcal{D}'(V).</math> If <math>f \in \mathcal{D}(U \times V)</math> then 
<math display=block>\langle S, \langle T, f_{\bullet} \rangle \rangle = \langle T, \langle S, f^{\bullet} \rangle \rangle.</math>
}}

{{em|The [[Tensor product|'''{{visible anchor|tensor product}}''']] of <math>S \in \mathcal{D}'(U)</math> and <math>T \in \mathcal{D}'(V),</math>}} denoted by <math>S \otimes T</math> or <math>T \otimes S,</math> is the distribution in <math>U \times V</math> defined by:{{sfn|Trèves|2006|pp=416-419}}
<math display=block>(S \otimes T)(f) := \langle S, \langle T, f_{\bullet} \rangle \rangle = \langle T, \langle S, f^{\bullet}\rangle \rangle.</math>
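
As a minimal illustration (the points <math>a \in U</math> and <math>b \in V</math> and the notation <math>\delta_a, \delta_b</math> for the Dirac measures at these points are introduced here only for the example), let <math>S = \delta_a</math> and <math>T = \delta_b.</math> Then <math>\langle T, f_u \rangle = f(u, b)</math> and <math>\langle S, f^v \rangle = f(a, v),</math> so that
<math display=block>(\delta_a \otimes \delta_b)(f) = \langle \delta_a, u \mapsto f(u, b) \rangle = f(a, b) = \langle \delta_b, v \mapsto f(a, v) \rangle.</math>
That is, <math>\delta_a \otimes \delta_b</math> is the Dirac measure at the point <math>(a, b) \in U \times V,</math> and the two iterated pairings agree, as Fubini's theorem for distributions predicts.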