==Function of random variables and change of variables in the probability density function==
If the probability density function of a random variable (or vector) {{math|''X''}} is given as {{math|''f<sub>X</sub>''(''x'')}}, it is possible (but often not necessary; see below) to calculate the probability density function of some variable {{math|1=''Y'' = ''g''(''X'')}}. This is also called a "change of variable" and is in practice used to generate a random variable of arbitrary shape {{math|1=''f''<sub>''g''(''X'')</sub> = ''f<sub>Y</sub>''}} using a known (for instance, uniform) random number generator.

It is tempting to think that in order to find the expected value {{math|E(''g''(''X''))}}, one must first find the probability density {{math|''f''<sub>''g''(''X'')</sub>}} of the new random variable {{math|1=''Y'' = ''g''(''X'')}}. However, rather than computing
<math display="block">\operatorname E\big(g(X)\big) = \int_{-\infty}^\infty y f_{g(X)}(y)\,dy, </math>
one may find instead
<math display="block">\operatorname E\big(g(X)\big) = \int_{-\infty}^\infty g(x) f_X(x)\,dx.</math>

The values of the two integrals are the same in all cases in which both {{math|''X''}} and {{math|''g''(''X'')}} actually have probability density functions. It is not necessary that {{math|''g''}} be a [[one-to-one function]]. In some cases the latter integral is computed much more easily than the former. See [[Law of the unconscious statistician]].

===Scalar to scalar===
Let <math> g: \Reals \to \Reals</math> be a [[monotonic function]]. Then the resulting density function is<ref>{{cite web |last1=Siegrist |first1=Kyle |title=Transformations of Random Variables |date=5 May 2020 |url=https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_%28Siegrist%29/03%3A_Distributions/3.07%3A_Transformations_of_Random_Variables#The_Change_of_Variables_Formula |publisher=LibreTexts Statistics |access-date=22 December 2023}}</ref>
<math display="block">f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} \big(g^{-1}(y)\big) \right|.</math>
Here {{math|''g''<sup>−1</sup>}} denotes the [[inverse function]].

This follows from the fact that the probability contained in a differential area must be invariant under change of variables. That is,
<math display="block">\left| f_Y(y)\, dy \right| = \left| f_X(x)\, dx \right|,</math>
or
<math display="block">f_Y(y) = \left| \frac{dx}{dy} \right| f_X(x) = \left| \frac{d}{dy} (x) \right| f_X(x) = \left| \frac{d}{dy} \big(g^{-1}(y)\big) \right| f_X\big(g^{-1}(y)\big) = {\left|\left(g^{-1}\right)'(y)\right|} \cdot f_X\big(g^{-1}(y)\big) .</math>

For functions that are not monotonic, the probability density function for {{mvar|y}} is
<math display="block">\sum_{k=1}^{n(y)} \left| \frac{d}{dy} g^{-1}_{k}(y) \right| \cdot f_X\big(g^{-1}_{k}(y)\big),</math>
where {{math|''n''(''y'')}} is the number of solutions in {{mvar|x}} for the equation <math>g(x) = y</math>, and <math>g_k^{-1}(y)</math> are these solutions.
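As a brief numerical illustration of the monotonic case (a sketch, not drawn from the cited sources: it assumes {{math|''X''}} standard normal, {{math|1=''g''(''x'') = ''e''<sup>''x''</sup>}}, and the NumPy/SciPy routines named below), the formula reproduces the standard log-normal density:

<syntaxhighlight lang="python">
# Check of f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)| for g(x) = exp(x):
# here g^{-1}(y) = ln(y) and d/dy g^{-1}(y) = 1/y, and with X ~ N(0, 1)
# the resulting Y = exp(X) should have the standard log-normal density.
import numpy as np
from scipy.stats import norm, lognorm

y = np.linspace(0.05, 5.0, 200)                 # grid of y > 0
f_Y = norm.pdf(np.log(y)) * np.abs(1.0 / y)     # change-of-variables formula

# Reference density: log-normal with sigma = 1 and mu = 0 (scale = e^0 = 1)
assert np.allclose(f_Y, lognorm.pdf(y, s=1.0, scale=1.0))
</syntaxhighlight>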
===Vector to vector===
Suppose {{math|'''x'''}} is an {{mvar|n}}-dimensional random variable with joint density {{math|''f''}}. If {{math|1='''''y''''' = ''G''('''''x''''')}}, where {{math|''G''}} is a [[bijective]], [[differentiable function]], then {{math|'''''y'''''}} has density {{math|''p''<sub>'''''Y'''''</sub>}}:
<math display="block"> p_{Y}(\mathbf{y}) = f\Bigl(G^{-1}(\mathbf{y})\Bigr) \left| \det\left[\left.\frac{dG^{-1}(\mathbf{z})}{d\mathbf{z}}\right|_{\mathbf{z}=\mathbf{y}}\right] \right|</math>
with the differential regarded as the [[Jacobian matrix and determinant|Jacobian]] of the inverse of {{math|''G''(⋅)}}, evaluated at {{math|'''''y'''''}}.<ref>{{cite book |first1=Jay L. |last1=Devore |first2=Kenneth N. |last2=Berk |title=Modern Mathematical Statistics with Applications |publisher=Cengage |year=2007 |isbn=978-0-534-40473-4 |page=263 |url=https://books.google.com/books?id=3X7Qca6CcfkC&pg=PA263 }}</ref>

For example, in the 2-dimensional case {{math|1='''x''' = (''x''<sub>1</sub>, ''x''<sub>2</sub>)}}, suppose the transform {{math|''G''}} is given as {{math|1=''y''<sub>1</sub> = ''G''<sub>1</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>)}}, {{math|1=''y''<sub>2</sub> = ''G''<sub>2</sub>(''x''<sub>1</sub>, ''x''<sub>2</sub>)}} with inverses {{math|1=''x''<sub>1</sub> = ''G''<sub>1</sub><sup>−1</sup>(''y''<sub>1</sub>, ''y''<sub>2</sub>)}}, {{math|1=''x''<sub>2</sub> = ''G''<sub>2</sub><sup>−1</sup>(''y''<sub>1</sub>, ''y''<sub>2</sub>)}}. The joint distribution for {{math|1='''y''' = (''y''<sub>1</sub>, ''y''<sub>2</sub>)}} has density<ref>{{Cite book |title=Elementary Probability |last=Stirzaker |first=David |date=2007-01-01 |publisher=Cambridge University Press |isbn=978-0521534284 |oclc=851313783}}</ref>
<math display="block">p_{Y_1, Y_2}(y_1,y_2) = f_{X_1,X_2}\big(G_1^{-1}(y_1,y_2), G_2^{-1}(y_1,y_2)\big) \left\vert \frac{\partial G_1^{-1}}{\partial y_1} \frac{\partial G_2^{-1}}{\partial y_2} - \frac{\partial G_1^{-1}}{\partial y_2} \frac{\partial G_2^{-1}}{\partial y_1} \right\vert.</math>
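The multivariate formula can likewise be checked numerically. The following sketch (an assumed example using NumPy/SciPy, not taken from the references) applies it to a linear map {{math|1='''y''' = ''A'''''x'''}} of a standard normal vector, whose image is known to be Gaussian with covariance {{math|''AA''<sup>T</sup>}}:

<syntaxhighlight lang="python">
# Check of p_Y(y) = f_X(G^{-1}(y)) |det J_{G^{-1}}(y)| for the linear map
# G(x) = A x: here G^{-1}(y) = A^{-1} y and the Jacobian determinant is
# det(A^{-1}) = 1/det(A).  With X ~ N(0, I_2), Y should be N(0, A A^T).
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
A_inv = np.linalg.inv(A)
f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

y = np.array([0.7, -1.2])                       # an arbitrary test point
p_Y = f_X.pdf(A_inv @ y) * abs(np.linalg.det(A_inv))

# Reference: a linear image of a Gaussian is Gaussian with covariance A A^T
ref = multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf(y)
assert np.isclose(p_Y, ref)
</syntaxhighlight>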
===Vector to scalar===
Let <math> V: \R^n \to \R </math> be a differentiable function and <math> X </math> be a random vector taking values in <math> \R^n </math>, let <math> f_X </math> be the probability density function of <math> X </math>, and let <math> \delta(\cdot) </math> be the [[Dirac delta]] function. It is possible to use the formulas above to determine <math> f_Y </math>, the probability density function of <math> Y = V(X) </math>, which will be given by
<math display="block">f_Y(y) = \int_{\R^n} f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \,d \mathbf{x}.</math>

This result leads to the [[law of the unconscious statistician]]:
<math display="block">\begin{align} \operatorname{E}_Y[Y] &=\int_{\R} y f_Y(y) \, dy \\ &= \int_{\R} y \int_{\R^n} f_X(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \,d \mathbf{x} \,dy \\ &= \int_{{\mathbb R}^n} \int_{\mathbb R} y f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \, dy \, d \mathbf{x} \\ &= \int_{\mathbb R^n} V(\mathbf{x}) f_X(\mathbf{x}) \, d \mathbf{x}=\operatorname{E}_X[V(X)]. \end{align}</math>

''Proof:'' Let <math>Z</math> be a collapsed random variable with probability density function <math>p_Z(z) = \delta(z)</math> (i.e., <math>Z</math> is equal to zero with probability one). Let the random vector <math>\tilde{X}</math> and the transform <math>H</math> be defined as
<math display="block">H(Z,X)=\begin{bmatrix} Z+V(X)\\ X\end{bmatrix}=\begin{bmatrix} Y\\ \tilde{X}\end{bmatrix}.</math>
It is clear that <math>H</math> is a bijective mapping, and the Jacobian of <math>H^{-1}</math> is given by
<math display="block">\frac{dH^{-1}(y,\tilde{\mathbf{x}})}{dy\,d\tilde{\mathbf{x}}}=\begin{bmatrix} 1 & -\frac{dV(\tilde{\mathbf{x}})}{d\tilde{\mathbf{x}}}\\ \mathbf{0}_{n\times1} & \mathbf{I}_{n\times n} \end{bmatrix},</math>
which is an upper triangular matrix with ones on the main diagonal, so its determinant is 1. Applying the change of variable theorem from the previous section we obtain
<math display="block">f_{Y,X}(y,\mathbf{x}) = f_X(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big),</math>
which, when marginalized over <math>\mathbf{x}</math>, leads to the desired probability density function.
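As a closing illustration of the last identity, {{math|E<sub>''X''</sub>[''V''(''X'')]}} can be estimated by Monte Carlo directly under {{math|''f<sub>X</sub>''}}, without ever constructing {{math|''f<sub>Y</sub>''}}. The sketch below uses assumed choices of {{math|''V''}} and {{math|''f<sub>X</sub>''}} (not from the cited sources) and NumPy:

<syntaxhighlight lang="python">
# Law of the unconscious statistician, checked by sampling: with
# X ~ N(0, I_2) and V(x) = x_1^2 + x_2^2, Y = V(X) is chi-squared with
# 2 degrees of freedom, so E_Y[Y] = 2.  Averaging V over samples of X
# estimates E_X[V(X)] without computing f_Y at all.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1_000_000, 2))   # samples drawn from f_X
v = (x ** 2).sum(axis=1)                  # V(X) evaluated on each sample

print(v.mean())   # approximately 2.0, the mean of the chi-squared(2) law
</syntaxhighlight>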