
{{Short description|Concept in probability theory and statistics}}
In [[probability theory]] and [[statistics]], the '''moment-generating function''' of a real-valued [[random variable]] is an alternative specification of its [[probability distribution]]. Thus, it provides the basis of an alternative route to analytical results compared with working directly with [[probability density function]]s or [[cumulative distribution function]]s. There are particularly simple results for the moment-generating functions of distributions defined by the weighted sums of random variables. However, not all random variables have moment-generating functions.

As its name implies, the moment-[[generating function]] can be used to compute a distribution’s [[Moment (mathematics)|moments]]: the {{mvar|n}}-th moment about 0 is the {{mvar|n}}-th derivative of the moment-generating function, evaluated at 0.

In addition to univariate real-valued distributions, moment-generating functions can also be defined for vector- or matrix-valued random variables, and can even be extended to more general cases.

The moment-generating function of a real-valued distribution does not always exist, unlike the [[Characteristic function (probability theory)|characteristic function]]. There are relations between the behavior of the moment-generating function of a distribution and properties of the distribution, such as the existence of moments.

==Definition==
Let <math> X </math> be a [[random variable]] with [[Cumulative distribution function|CDF]] <math>F_X</math>. The moment generating function (mgf) of <math>X</math> (or <math>F_X</math>), denoted by <math>M_X(t)</math>, is

<math display="block"> M_X(t) = \operatorname E \left[e^{tX}\right] </math>

provided this [[expected value|expectation]] exists for <math>t</math> in some open [[Neighborhood (mathematics)|neighborhood]] of 0. That is, there is an <math>h > 0</math> such that <math>\operatorname E \left[e^{tX}\right] </math> exists for all <math>t</math> with <math>-h < t < h</math>. If the expectation does not exist in an open neighborhood of 0, we say that the moment-generating function does not exist.<ref>{{cite book |last1=Casella |first1=George|last2= Berger|first2= Roger L. |title=Statistical Inference |publisher=Wadsworth & Brooks/Cole|year=1990 |page=61 |isbn=0-534-11958-1 }}</ref>

In other words, the moment-generating function of {{mvar|X}} is the [[expected value|expectation]] of the random variable <math> e^{tX}</math>. More generally, when <math>\mathbf X = ( X_1, \ldots, X_n)^{\mathrm{T}}</math>, an <math>n</math>-dimensional [[random vector]], and <math>\mathbf t</math> is a fixed vector, one uses <math>\mathbf t \cdot \mathbf X = \mathbf t^\mathrm T\mathbf X</math> instead of&nbsp;{{nowrap|<math>tX</math>:}}
<math display="block"> M_{\mathbf X}(\mathbf t) := \operatorname E \left[e^{\mathbf t^\mathrm T\mathbf X}\right].</math>

<math> M_X(0) </math> always exists and is equal to&nbsp;1. However, a key problem with moment-generating functions is that moments and the moment-generating function may not exist, as the integrals need not converge absolutely. By contrast, the [[Characteristic function (probability theory)|characteristic function]] or Fourier transform always exists (because it is the integral of a bounded function on a space of finite [[measure (mathematics)|measure]]), and for some purposes may be used instead.

The moment-generating function is so named because it can be used to find the moments of the distribution.<ref>{{cite book |last=Bulmer |first=M. G. |title=Principles of Statistics |publisher=Dover |year=1979 |pages=75–79 |isbn=0-486-63760-3 }}</ref>  The series expansion of <math>e^{tX}</math> is

<math display="block">
e^{t X} = 1 + t X + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots + \frac{t^n X^n}{n!} + \cdots.
</math>

Hence,
<math display="block">\begin{align}
M_X(t) &= \operatorname E [e^{t X}] \\[1ex]
&= 1 + t \operatorname E[X] + \frac{t^2 \operatorname E[X^2]}{2!} + \frac{t^3 \operatorname E[X^3]}{3!} + \cdots + \frac{t^n\operatorname E [X^n]}{n!}+\cdots \\[1ex]
& = 1 + t m_1 + \frac{t^2 m_2}{2!} + \frac{t^3 m_3}{3!} + \cdots + \frac{t^n m_n}{n!} + \cdots,
\end{align}</math>

where <math>m_n</math> is the {{nowrap|<math>n</math>-th}} [[moment (mathematics)|moment]]. Differentiating <math>M_X(t)</math> <math>i</math> times with respect to <math>t</math> and setting <math>t = 0</math>, we obtain the <math>i</math>-th moment about the origin, <math>m_i</math>; see {{slink|#Calculations of moments}} below.

If <math>X</math> is a continuous random variable, the following relation between its moment-generating function <math>M_X(t)</math> and the [[two-sided Laplace transform]] of its probability density function <math>f_X(x)</math> holds:

<math display="block">M_X(t) = \mathcal{L}\{f_X\}(-t),</math>

since the PDF's two-sided Laplace transform is given as

<math display="block">\mathcal{L}\{f_X\}(s) = \int_{-\infty}^\infty e^{-sx} f_X(x)\, dx,</math>

and the moment-generating function's definition expands (by the [[law of the unconscious statistician]]) to
<math display="block">M_X(t) = \operatorname E \left[e^{tX}\right] = \int_{-\infty}^\infty e^{tx} f_X(x)\, dx.</math>

This is consistent with the characteristic function of <math>X</math> being a [[Wick rotation]] of <math>M_X(t)</math> when the moment generating function exists, as the characteristic function of a continuous random variable <math>X</math> is the [[Fourier transform]] of its probability density function <math>f_X(x)</math>, and in general when a function <math>f(x)</math> is of [[exponential order]], the Fourier transform of <math>f</math> is a Wick rotation of its two-sided Laplace transform in the region of convergence. See [[Fourier transform#Laplace transform|the relation of the Fourier and Laplace transforms]] for further information.

==Examples==
Here are some examples of the moment-generating function and the characteristic function for comparison. It can be seen that the characteristic function is a [[Wick rotation]] of the moment-generating function <math>M_X(t)</math> when the latter exists.
{|class="wikitable" style="padding-left:1.5em;"
|-
! Distribution
! Moment-generating function <math>M_X(t)</math>
! Characteristic function <math>\varphi (t)</math>
|-
|[[Degenerate distribution|Degenerate]] <math>\delta_a</math>
|<math>e^{ta}</math>
|<math>e^{ita}</math>
|-
| [[Bernoulli distribution|Bernoulli]] <math>P(X = 1) = p</math> 
| <math>1 - p + pe^t</math>
| <math>1 - p + pe^{it}</math>
|-
| [[Binomial distribution|Binomial]] <math>B(n, p)</math>
| <math>\left(1 - p + pe^t\right)^n</math>
| <math>\left(1 - p + pe^{it}\right)^n</math>
|-
| [[Geometric distribution|Geometric]]  <math>(1 - p)^{k}\,p</math>
| <math>\frac{p}{1 - (1 - p) e^t}, ~ t < -\ln(1 - p)</math>
| <math>\frac{p}{1 - (1 - p)\,e^{it}}</math>
|-
|[[Negative binomial distribution|Negative binomial]] <math>\operatorname{NB}(r, p)</math>
|<math>\left(\frac{p}{1 - e^t + pe^t}\right)^r, ~ t<-\ln(1-p)</math>
|<math>\left(\frac{p}{1 - e^{it} + pe^{it}}\right)^r</math>
|-
| [[Poisson distribution|Poisson]] <math>\operatorname{Pois}(\lambda)</math>
| <math>e^{\lambda(e^t - 1)}</math> 
| <math>e^{\lambda(e^{it} - 1)}</math> 
|- 
| [[Uniform distribution (continuous)|Uniform (continuous)]] <math>\operatorname U(a, b)</math>
| <math>\frac{e^{tb} - e^{ta}}{t(b - a)}</math>
| <math>\frac{e^{itb} - e^{ita}}{it(b - a)}</math>
|- 
| [[Discrete uniform distribution|Uniform (discrete)]] <math>\operatorname{DU}(a, b)</math>
| <math>\frac{e^{at} - e^{(b + 1)t}}{(b - a + 1)(1 - e^t)}</math>
| <math>\frac{e^{ait} - e^{(b + 1)it}}{(b - a + 1)(1 - e^{it})}</math>
|-
|[[Laplace distribution|Laplace]] <math>L(\mu, b)</math>
|<math>\frac{e^{t\mu}}{1 - b^2t^2}, ~ |t| < 1/b</math>
|<math>\frac{e^{it\mu}}{1 + b^2t^2}</math>
|-
| [[Normal distribution|Normal]] <math>N(\mu, \sigma^2)</math>
| <math>e^{t\mu + \sigma^2 t^2 / 2}</math>
| <math>e^{it\mu - \sigma^2 t^2 / 2}</math>
|-
| [[Chi-squared distribution|Chi-squared]] <math>\chi^2_k</math>
| <math>{\left(1 - 2t\right)}^{-k/2}, ~ t < 1/2</math>
| <math>{\left(1 - 2it\right)}^{-{k}/{2}}</math>
|-
|[[Noncentral chi-squared distribution|Noncentral chi-squared]] <math>\chi^2_k(\lambda)</math>
| <math>e^{\lambda t/(1-2t)} {\left(1 - 2t\right)}^{-k/2}</math>
| <math>e^{i\lambda t/(1-2it)} {\left(1 - 2it\right)}^{-k/2}</math>
|-
| [[Gamma distribution|Gamma]] <math>\Gamma(k, \tfrac{1}{\theta})</math>
|<math>{\left(1 - t\theta\right)}^{-k}, ~ t < \tfrac{1}{\theta}</math>
| <math>{\left(1 - it\theta\right)}^{-k}</math>
|-
| [[Exponential distribution|Exponential]] <math>\operatorname{Exp}(\lambda)</math>
| <math>\left(1 - t\lambda^{-1}\right)^{-1}, ~ t < \lambda</math>
| <math>\left(1 - it\lambda^{-1}\right)^{-1}</math>
|-
|[[Beta distribution|Beta]]
|<math>1  +\sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}</math>
|<math>{}_1F_1(\alpha; \alpha+\beta; i\,t)\! </math> (see [[Confluent hypergeometric function]])
|-
| [[Multivariate normal distribution|Multivariate normal]] <math>N(\boldsymbol{\mu}, \boldsymbol{\Sigma})</math>
|<math>\exp\left[\mathbf{t}^\mathrm{T} \left(  \boldsymbol{\mu} + \tfrac{1}{2} \boldsymbol{\Sigma} \mathbf{t}\right)\right]</math>
|<math>\exp\left[\mathbf{t}^\mathrm{T} \left(i \boldsymbol{\mu} - \tfrac{1}{2} \boldsymbol{\Sigma} \mathbf{t}\right)\right]</math>
|-
| [[Cauchy distribution|Cauchy]] <math>\operatorname{Cauchy}(\mu, \theta)</math>
|[[Indeterminate form|Does not exist]]
| <math>e^{it\mu - \theta|t|}</math>
|-
|[[Multivariate Cauchy distribution|Multivariate Cauchy]] 
<math>\operatorname{MultiCauchy}(\mu, \Sigma)</math><ref>Kotz et al.{{full citation needed|date=December 2019}} p. 37, using 1 as the number of degrees of freedom to recover the Cauchy distribution</ref>
|Does not exist
|<math>\exp\left(i\mathbf{t}^{\mathrm{T}}\boldsymbol\mu - \sqrt{\mathbf{t}^{\mathrm{T}}\boldsymbol{\Sigma} \mathbf{t}}\right)</math>
|}
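
As a rough numerical check on one row of the table, the sketch below sums the Poisson probability mass function directly and compares the result with the closed forms listed above, once for a real argument (the moment-generating function) and once for an imaginary argument (the characteristic function). The names <code>poisson_transform</code> and <code>poisson_closed_form</code>, the truncation at 200 terms, and the parameter values are illustrative assumptions rather than part of any library.

```python
import cmath

def poisson_transform(lam, s, terms=200):
    """Approximate E[exp(s*X)] for X ~ Pois(lam) by a truncated sum over the pmf.

    s may be real (moment-generating function) or imaginary (characteristic function).
    """
    total = 0j
    term = cmath.exp(-lam)  # k = 0 term: P(X = 0) * exp(s * 0)
    for k in range(terms):
        total += term
        # Step from the k-th term to the (k+1)-th: multiply by lam * e^s / (k + 1).
        term *= lam * cmath.exp(s) / (k + 1)
    return total

def poisson_closed_form(lam, s):
    """Table entry exp(lam * (e^s - 1)), evaluated at a possibly complex s."""
    return cmath.exp(lam * (cmath.exp(s) - 1))

lam, t = 2.5, 0.4
mgf_gap = abs(poisson_transform(lam, t) - poisson_closed_form(lam, t))
cf_gap = abs(poisson_transform(lam, 1j * t) - poisson_closed_form(lam, 1j * t))
print(mgf_gap, cf_gap)  # both gaps should be at rounding-error level
```

Passing <code>1j * t</code> in place of <code>t</code> is the only change needed to move from the second column of the table to the third, which is the substitution the text describes as a Wick rotation.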

==Calculation==
Because the moment-generating function is the expectation of a function of the random variable, it can be written as:

* For a discrete [[probability mass function]], <math>M_X(t)=\sum_{i=0}^\infty e^{tx_i}\, p_i</math>
* For a continuous [[probability density function]], <math> M_X(t)  = \int_{-\infty}^\infty e^{tx} f(x)\,dx </math>
* In the general case: <math>M_X(t) = \int_{-\infty}^\infty e^{tx}\,dF(x)</math>, using the [[Riemann&ndash;Stieltjes integral]], where <math>F</math> is the [[cumulative distribution function]]. This is simply the [[Laplace-Stieltjes transform]] of <math>F</math>, but with the sign of the argument reversed.
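
The discrete formula can be sketched numerically as follows, assuming a fair six-sided die as the example distribution. The direct sum over the probability mass function is compared with the closed form for the discrete uniform distribution given in the examples table; the function names are hypothetical.

```python
import math

def mgf_discrete(values, probs, t):
    """M_X(t) = sum_i exp(t * x_i) * p_i for a finitely supported X."""
    return sum(math.exp(t * x) * p for x, p in zip(values, probs))

def mgf_discrete_uniform(a, b, t):
    """Closed form for the uniform distribution on the integers a..b (requires t != 0)."""
    n = b - a + 1
    return (math.exp(a * t) - math.exp((b + 1) * t)) / (n * (1 - math.exp(t)))

faces = range(1, 7)      # outcomes of a fair die (illustrative choice)
probs = [1 / 6] * 6
t = 0.3
direct = mgf_discrete(faces, probs, t)
closed = mgf_discrete_uniform(1, 6, t)
print(direct, closed)    # the two values should agree up to rounding
```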

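Note that for the case where <math>X</math> has a continuous [[probability density function]] <math>f(x)</math>, <math>M_X(-t)</math> is the [[two-sided Laplace transform]] of <math>f(x)</math>.

In that case, expanding <math>e^{tx}</math> as a power series and integrating term by term (when the interchange is justified) gives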

<math display="block">\begin{align}
M_X(t) & = \int_{-\infty}^\infty e^{tx} f(x)\,dx \\[1ex]
& = \int_{-\infty}^\infty \left( 1+ tx + \frac{t^2 x^2}{2!} + \cdots + \frac{t^n x^n}{n!} + \cdots\right) f(x)\,dx \\[1ex]
& = 1 + tm_1 + \frac{t^2 m_2}{2!} + \cdots + \frac{t^n m_n}{n!} +\cdots,
\end{align}</math>

where <math>m_n</math> is the <math>n</math>th [[moment (mathematics)|moment]].

===Linear transformations of random variables ===
If a random variable <math>X</math> has moment-generating function <math>M_X(t)</math>, then <math>\alpha X + \beta</math> has moment-generating function <math>M_{\alpha X + \beta}(t) = e^{\beta t}M_X(\alpha t)</math>, since

<math display="block">
M_{\alpha X + \beta}(t) = \operatorname{E}\left[e^{(\alpha X + \beta) t}\right] = e^{\beta t} \operatorname{E}\left[e^{\alpha Xt}\right] = e^{\beta t} M_X(\alpha t).
</math>
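
As a worked check against the [[#Examples|examples table]], suppose <math>Z</math> is a standard normal variable (a symbol introduced here for illustration), so that <math>M_Z(t) = e^{t^2/2}</math>. Taking <math>\alpha = \sigma</math> and <math>\beta = \mu</math> gives

<math display="block">
M_{\sigma Z + \mu}(t) = e^{\mu t} M_Z(\sigma t) = e^{\mu t}\, e^{\sigma^2 t^2 / 2} = e^{t\mu + \sigma^2 t^2 / 2},
</math>

which agrees with the table entry for <math>N(\mu, \sigma^2)</math>.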

===Linear combination of independent random variables===
If <math display="inline">S_n = \sum_{i=1}^n a_i X_i</math>, where the {{math|''X''<sub>''i''</sub>}} are independent random variables and the {{math|''a''<sub>''i''</sub>}} are constants, then the probability density function for {{math|''S''<sub>''n''</sub>}} is the [[convolution]] of the probability density functions of the scaled variables {{math|''a''<sub>''i''</sub>''X''<sub>''i''</sub>}} (when these densities exist), and the moment-generating function for {{math|''S''<sub>''n''</sub>}} is given by

<math display="block">
M_{S_n}(t) = M_{X_1}(a_1t) M_{X_2}(a_2t) \cdots M_{X_n}(a_nt) \, .
</math>
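
The product formula can be checked on a small case. The sketch below assumes two independent, finitely supported variables with made-up distributions and weights; it computes the moment-generating function of the weighted sum by enumerating joint outcomes and compares it with the product of the individual moment-generating functions evaluated at the rescaled arguments.

```python
import math
from itertools import product

def mgf(dist, t):
    """M_X(t) for a finite distribution given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in dist.items())

# Hypothetical distributions and weights, chosen only for illustration.
x1 = {0: 0.2, 1: 0.5, 3: 0.3}
x2 = {-1: 0.6, 2: 0.4}
a1, a2, t = 2.0, -0.5, 0.4

# Direct computation of E[exp(t * (a1*X1 + a2*X2))]:
# independence means each joint probability is the product p1 * p2.
direct = sum(
    p1 * p2 * math.exp(t * (a1 * v1 + a2 * v2))
    for (v1, p1), (v2, p2) in product(x1.items(), x2.items())
)

# Product formula: M_{X1}(a1 * t) * M_{X2}(a2 * t).
factored = mgf(x1, a1 * t) * mgf(x2, a2 * t)
print(direct, factored)  # the two values should agree up to rounding
```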
<!----------
Below was lifted from [[generating function]] ... there should be an 
analog for the moment-generating functionbuted with common probability-generating function ''G''<sub>X</sub>, then
<math display="block">G_{S_N}(z) = G_N(G_X(z)).</math>
-------->

===Vector-valued random variables===
For [[random vector|vector-valued random variables]] <math>\mathbf X</math> with [[real number|real]] components, the moment-generating function is given by

<math display="block"> M_{\mathbf X}(\mathbf t) = \operatorname{E}\left[e^{\langle \mathbf t, \mathbf X \rangle}\right] </math>

where <math>\mathbf t</math> is a vector and <math>\langle \cdot, \cdot \rangle</math> is the [[dot product]].

==Important properties==

Moment generating functions are positive and [[Logarithmically convex function|log-convex]],{{Citation needed|reason=log-convexity|date=June 2023}} with ''M''(0) = 1.

An important property of the moment-generating function is that it uniquely determines the distribution. In other words, if <math>X</math> and <math>Y</math> are two random variables and for all values of&nbsp;{{mvar|t}},

<math display="block">M_X(t) = M_Y(t), </math>
then
<math display="block">F_X(x) = F_Y(x) </math>

for all values of {{mvar|x}} (or equivalently {{mvar|X}} and {{mvar|Y}} have the same distribution). This statement is not equivalent to the statement "if two distributions have the same moments, then they are identical at all points." This is because in some cases, the moments exist and yet the moment-generating function does not, because the limit

<math display="block">\lim_{n \to \infty} \sum_{i=0}^n \frac{t^i m_i}{i!}</math>

may not exist. The [[log-normal distribution]] is an example of when this occurs.
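
A small numerical sketch of the log-normal case, assuming the standard log-normal distribution (the exponential of a standard normal variable), whose <math>n</math>-th moment is <math>e^{n^2/2}</math>: every moment is finite, yet the terms <math>t^n m_n / n!</math> of the series grow without bound for any fixed <math>t > 0</math>, so the partial sums cannot converge. The helper name <code>log_series_term</code> is hypothetical, and the terms are computed on a log scale to avoid overflow.

```python
import math

def log_series_term(n, t):
    """Natural log of t**n * m_n / n!, with m_n = exp(n**2 / 2) (standard log-normal)."""
    return n * math.log(t) + n * n / 2 - math.lgamma(n + 1)

t = 0.01  # even a very small positive t does not rescue convergence
for n in (1, 10, 20, 50):
    print(n, log_series_term(n, t))
# The logged terms dip at first, then increase without bound as n grows,
# because n**2 / 2 eventually dominates both n*log(t) and log(n!).
```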
<!--
If the moment generating function is defined on such an interval, then it uniquely determines a probability distribution. -->

===Calculations of moments===
The moment-generating function is so called because if it exists on an open interval around {{math|1=''t'' = 0}}, then it is the [[exponential generating function]] of the [[moment (mathematics)|moments]] of the [[probability distribution]]:

<math display="block">m_n = \operatorname{E}\left[ X^n \right] = M_X^{(n)}(0) = \left. \frac{d^n M_X}{dt^n}\right|_{t=0}.</math>

That is, with {{mvar|n}} being a nonnegative integer, the {{mvar|n}}-th moment about 0 is the {{mvar|n}}-th derivative of the moment generating function, evaluated at {{math|1=''t'' = 0}}.
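
As an illustration, the derivatives at 0 can be approximated numerically. The sketch below assumes an exponential distribution with rate <code>lam</code>, whose moment-generating function appears in the examples table, and uses central finite differences; this is a numerical approximation chosen for simplicity, not an exact differentiation, and the function names are hypothetical.

```python
def mgf_exponential(lam, t):
    """M_X(t) = lam / (lam - t), valid for t < lam (same as (1 - t/lam)^(-1))."""
    return lam / (lam - t)

def first_two_moments(mgf, h=1e-3):
    """Approximate M'(0) and M''(0) with central finite differences of step h."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

lam = 4.0
m1, m2 = first_two_moments(lambda t: mgf_exponential(lam, t))
print(m1, m2)  # close to E[X] = 1/lam = 0.25 and E[X^2] = 2/lam**2 = 0.125
```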

==Other properties==
[[Jensen's inequality]] provides a simple lower bound on the moment-generating function:
<math display="block"> M_X(t) \geq e^{\mu t}, </math>
where <math>\mu</math> is the mean of {{mvar|X}}.
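
Written out, the step behind this bound is that <math>x \mapsto e^{tx}</math> is convex for every real <math>t</math>, so Jensen's inequality gives

<math display="block"> M_X(t) = \operatorname{E}\left[e^{tX}\right] \ge e^{t \operatorname{E}[X]} = e^{\mu t}. </math>

For example, for <math>X \sim N(\mu, \sigma^2)</math> the moment-generating function <math>e^{t\mu + \sigma^2 t^2/2}</math> exceeds <math>e^{\mu t}</math> by the factor <math>e^{\sigma^2 t^2/2} \ge 1</math>.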

The moment-generating function can be used in conjunction with [[Markov's inequality]] to bound the upper tail of a real random variable {{mvar|X}}. This statement is also called the [[Chernoff bound]]. Since <math>x \mapsto e^{xt}</math> is monotonically increasing for <math>t>0</math>, we have
<math display="block"> \Pr(X \ge a) = \Pr(e^{tX} \ge e^{ta}) \le e^{-at} \operatorname{E}\left[e^{tX}\right] = e^{-at}M_X(t)</math>
for any <math>t>0</math> and any {{mvar|a}}, provided <math>M_X(t)</math> exists. For example, when {{mvar|X}} has a standard normal distribution and <math>a > 0</math>, we can choose <math>t=a</math> and recall that <math>M_X(t)=e^{t^2/2}</math>. This gives <math>\Pr(X\ge a)\le e^{-a^2/2}</math>, which exceeds the exact value by a factor of roughly <math>\sqrt{2\pi}\,a</math> for large {{mvar|a}}.
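
The standard normal case can be tabulated with a short sketch. It assumes the exact tail is computed from the complementary error function and uses the choice <math>t = a</math> from the text; the function names are illustrative.

```python
import math

def normal_upper_tail(a):
    """Exact Pr(X >= a) for a standard normal X, via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

def chernoff_bound(a, t):
    """e^{-a t} * M_X(t) with M_X(t) = e^{t^2 / 2} (standard normal)."""
    return math.exp(-a * t + t * t / 2)

for a in (0.5, 1.0, 2.0, 3.0):
    exact = normal_upper_tail(a)
    bound = chernoff_bound(a, a)  # t = a minimises -a*t + t^2/2 over t > 0
    print(a, exact, bound, bound / exact)  # last column: ratio of bound to exact tail
```

The choice <math>t = a</math> is the minimiser of the exponent <math>-at + t^2/2</math>, so no other <math>t > 0</math> gives a smaller bound from this argument.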

Various lemmas, such as [[Hoeffding's lemma]] or [[Bennett's inequality]], provide bounds on the moment-generating function in the case of a zero-mean, bounded random variable.

When <math>X</math> is non-negative, the moment-generating function gives a simple, useful bound on the moments:
<math display="block">\operatorname{E}[X^m] \le \left(\frac{m}{te}\right)^m M_X(t),</math>
for any <math>X,m\ge 0</math> and <math>t>0</math>.

This follows from the inequality <math>1+x\le e^x</math>: substituting <math>x'=tx/m-1</math> for <math>x</math> gives <math>tx/m\le e^{tx/m-1}</math> for any real <math>x</math> and <math>t</math> and any <math>m \ne 0</math>.
Now, if <math>t > 0</math>, <math>x\ge 0</math> and <math>m > 0</math>, raising both sides to the power <math>m</math> and rearranging gives <math>x^m \le (m/(te))^m e^{tx}</math>.
Taking the expectation on both sides gives the bound on <math>\operatorname{E}[X^m]</math> in terms of <math>\operatorname{E}[e^{tX}]</math>.

As an example, consider <math>X\sim\chi^2_k</math>, a chi-squared random variable with <math>k</math> degrees of freedom. Then from the [[Moment-generating function#Examples|examples]], <math>M_X(t) = (1-2t)^{-k/2}</math>.
Picking <math>t=m/(2m+k)</math> and substituting into the bound gives
<math display="block">\operatorname{E}[X^m] \le {\left(1 + 2m/k\right)}^{k/2} e^{-m} {\left(k + 2m\right)}^m.</math>
[[Chi-square distribution#Noncentral moments|In this case]] the exact value is known: <math>\operatorname{E}[X^m] = 2^m \Gamma(m+k/2)/\Gamma(k/2)</math>.
To compare the two, we can consider the asymptotics for large <math>k</math>.
Here the moment-generating function bound is <math>k^m(1+m^2/k + O(1/k^2))</math>,
whereas the exact value is <math>k^m(1+(m^2-m)/k + O(1/k^2))</math>.
The moment-generating function bound is thus nearly tight in this case.
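
The comparison can also be made numerically. The sketch below assumes the choice <math>t = m/(2m+k)</math> from the text and evaluates the gamma-function expression on a log scale for numerical stability; the function names and the particular values of <math>k</math> and <math>m</math> are illustrative.

```python
import math

def mgf_chi_squared(k, t):
    """M_X(t) = (1 - 2t)^(-k/2), valid for t < 1/2."""
    return (1 - 2 * t) ** (-k / 2)

def moment_bound(k, m):
    """(m / (t e))^m * M_X(t), evaluated at the choice t = m / (2m + k)."""
    t = m / (2 * m + k)
    return (m / (t * math.e)) ** m * mgf_chi_squared(k, t)

def exact_moment(k, m):
    """2^m * Gamma(m + k/2) / Gamma(k/2), computed on a log scale."""
    return math.exp(m * math.log(2) + math.lgamma(m + k / 2) - math.lgamma(k / 2))

m = 2
for k in (1, 10, 100, 1000):
    print(k, exact_moment(k, m), moment_bound(k, m))
# The gap between the two columns shrinks in relative terms as k grows.
```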

==Relation to other functions==
Related to the moment-generating function are a number of other [[integral transform|transforms]] that are common in probability theory:

;[[Characteristic function (probability theory)|Characteristic function]]: The [[characteristic function (probability theory)|characteristic function]] <math>\varphi_X(t)</math> is related to the moment-generating function via <math>\varphi_X(t) = M_{iX}(t) = M_X(it):</math> the characteristic function is the moment-generating function of ''iX'' or the moment generating function of ''X'' evaluated on the imaginary axis.  This function can also be viewed as the [[Fourier transform]] of the [[probability density function]], which can therefore be deduced from it by inverse Fourier transform.
;[[Cumulant-generating function]]: The [[cumulant-generating function]] is defined as the logarithm of the moment-generating function; some instead define the cumulant-generating function as the logarithm of the [[Characteristic function (probability theory)|characteristic function]], while others call this latter the ''second'' cumulant-generating function.
;[[Probability-generating function]]: The [[probability-generating function]] is defined as <math>G(z) = \operatorname{E}\left[z^X\right].</math> This immediately implies that <math>G(e^t) = \operatorname{E}\left[e^{tX}\right] = M_X(t).</math>

==See also==
* [[Characteristic function (probability theory)]]
* [[Factorial moment generating function]]
* [[Rate function]]
* [[Hamburger moment problem]]

{{More footnotes|date=February 2010}}

==References==
===Citations===
{{Reflist}}

===Sources===
{{Refbegin}}
* {{cite book |last1=Casella |first1=George |last2=Berger |first2=Roger |title=Statistical Inference |year=2002 |edition=2nd |isbn = 978-0-534-24312-8 |pages=59–68 |publisher=Thomson Learning }}
{{Refend}}

{{Clear}}
{{Theory of probability distributions}}
{{Authority control}}

{{DEFAULTSORT:Moment-Generating Function}}
[[Category:Moments (mathematics)]]
[[Category:Generating functions]]