==Joint density function or mass function==

===Discrete case===
The joint [[probability mass function]] of two [[discrete random variable]]s <math>X, Y</math> is:
{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>p_{X,Y}(x,y) = \mathrm{P}(X=x\ \mathrm{and}\ Y=y)</math>|{{EquationRef|Eq.3}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

or written in terms of conditional distributions
:<math>p_{X,Y}(x,y) = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y)</math>
where <math>\mathrm{P}(Y=y \mid X=x)</math> is the [[conditional probability|probability]] of <math>Y = y</math> given that <math>X = x</math>.

The generalization of the preceding two-variable case is the joint probability distribution of <math>n</math> discrete random variables <math>X_1, X_2, \dots, X_n</math>, which is:
{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \mathrm{P}(X_1=x_1\text{ and }\dots\text{ and }X_n=x_n)</math>|{{EquationRef|Eq.4}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

or equivalently
:<math>
\begin{align}
p_{X_1,\ldots,X_n}(x_1,\ldots,x_n) & = \mathrm{P}(X_1=x_1) \cdot \mathrm{P}(X_2=x_2 \mid X_1=x_1) \\
& \cdot \mathrm{P}(X_3=x_3 \mid X_1=x_1, X_2=x_2) \\
& \dots \\
& \cdot \mathrm{P}(X_n=x_n \mid X_1=x_1, X_2=x_2, \dots, X_{n-1}=x_{n-1}).
\end{align}
</math>
This identity is known as the [[Chain rule (probability)|chain rule of probability]].

Since these are probabilities, in the two-variable case
:<math>\sum_i \sum_j \mathrm{P}(X=x_i\ \mathrm{and}\ Y=y_j) = 1,</math>
which generalizes for <math>n</math> discrete random variables <math>X_1, X_2, \dots, X_n</math> to
:<math>\sum_{i} \sum_{j} \dots \sum_{k} \mathrm{P}(X_1=x_{1i}, X_2=x_{2j}, \dots, X_n=x_{nk}) = 1.</math>
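The factorization above lends itself to direct computation. The following is a minimal sketch in Python; the joint table and all numbers are illustrative assumptions, not taken from any source. It builds <math>p_{X,Y}</math> from an assumed marginal and conditional, checks that the entries sum to 1, and recovers the other marginal by summing out <math>X</math>.

<syntaxhighlight lang="python">
# Sketch: build a joint probability mass function p_{X,Y}(x, y) from
# P(X = x) and P(Y = y | X = x), as in the factorization above.
# All numerical values below are illustrative assumptions.

p_X = {0: 0.4, 1: 0.6}          # marginal P(X = x)
p_Y_given_X = {                 # conditional P(Y = y | X = x)
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.2, 1: 0.8},
}

# Joint mass function: p_{X,Y}(x, y) = P(Y = y | X = x) * P(X = x)
p_XY = {(x, y): p_Y_given_X[x][y] * p_X[x]
        for x in p_X for y in p_Y_given_X[x]}

# Since these are probabilities, they must sum to 1.
assert abs(sum(p_XY.values()) - 1.0) < 1e-12

# The marginal of Y, recovered by summing out X:
p_Y = {y: sum(p_XY[(x, y)] for x in p_X) for y in (0, 1)}
print(p_XY)  # {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.12, (1, 1): 0.48}
print(p_Y)   # {0: 0.32, 1: 0.68}
</syntaxhighlight>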
===Continuous case===
The '''joint [[probability density function]]''' <math>f_{X,Y}(x,y)</math> for two [[continuous random variable]]s is defined as the derivative of the joint cumulative distribution function (see {{EquationNote|Eq.1}}):
{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x \, \partial y}</math>|{{EquationRef|Eq.5}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

This is equal to:
:<math>f_{X,Y}(x,y) = f_{Y\mid X}(y\mid x) f_X(x) = f_{X\mid Y}(x\mid y) f_Y(y)</math>
where <math>f_{Y\mid X}(y\mid x)</math> and <math>f_{X\mid Y}(x\mid y)</math> are the [[conditional distribution]]s of <math>Y</math> given <math>X=x</math> and of <math>X</math> given <math>Y=y</math> respectively, and <math>f_X(x)</math> and <math>f_Y(y)</math> are the [[marginal distribution]]s for <math>X</math> and <math>Y</math> respectively.

The definition extends naturally to more than two random variables:
{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \frac{\partial^n F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\partial x_1 \ldots \partial x_n}</math>|{{EquationRef|Eq.6}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

Again, since these are probability distributions, one has
:<math>\int_x \int_y f_{X,Y}(x,y) \; dy \; dx = 1</math>
and, correspondingly,
:<math>\int_{x_1} \ldots \int_{x_n} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \; dx_n \ldots dx_1 = 1.</math>

===Mixed case===
The "mixed joint density" may be defined where one or more random variables are continuous and the other random variables are discrete. With one variable of each type,
:<math>f_{X,Y}(x,y) = f_{X \mid Y}(x \mid y)\,\mathrm{P}(Y=y) = \mathrm{P}(Y=y \mid X=x)\, f_X(x).</math>

One example of a situation in which one may wish to find the cumulative distribution of one random variable which is continuous and another random variable which is discrete arises when one wishes to use a [[logistic regression]] to predict the probability of a binary outcome <math>Y</math> conditional on the value of a continuously distributed variable <math>X</math>. One ''must'' use the "mixed" joint density when finding the cumulative distribution of this binary outcome because the input variables <math>(X,Y)</math> were initially defined in such a way that one could not collectively assign them either a probability density function or a probability mass function. Formally, <math>f_{X,Y}(x,y)</math> is the probability density function of <math>(X,Y)</math> with respect to the [[product measure]] on the respective [[support (measure theory)|supports]] of <math>X</math> and <math>Y</math>. Either of these two decompositions can then be used to recover the joint cumulative distribution function:
:<math>F_{X,Y}(x,y) = \sum_{t\le y} \int_{s=-\infty}^x f_{X,Y}(s,t) \; ds.</math>
The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
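{{EquationNote|Eq.5}} can be checked numerically for a concrete density. The sketch below is illustrative only: it assumes two independent exponential variables, a standard textbook pair for which <math>F_{X,Y}(x,y) = (1-e^{-x})(1-e^{-y})</math> and <math>f_{X,Y}(x,y) = e^{-x-y}</math>, and compares a finite-difference mixed partial derivative of <math>F_{X,Y}</math> with <math>f_{X,Y}</math>.

<syntaxhighlight lang="python">
# Sketch: verify f = d^2 F / (dx dy) for an assumed example, two
# independent unit-rate exponential variables (illustrative choice).
import math

def F(x, y):
    # joint CDF of two independent Exp(1) variables
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

def f(x, y):
    # their joint density
    return math.exp(-x) * math.exp(-y)

# Central finite difference for the mixed partial of F at an
# arbitrary test point; step size h is an illustrative choice.
h = 1e-4
x, y = 0.7, 1.3
approx = (F(x + h, y + h) - F(x + h, y - h)
          - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)

print(approx, f(x, y))  # agree to several decimal places
</syntaxhighlight>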
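The logistic-regression situation described above can likewise be sketched numerically. In the following illustration, <math>X</math> is assumed standard normal and <math>\mathrm{P}(Y=1 \mid X=x)</math> follows a logistic model whose coefficients are made up for the example; the joint cumulative distribution function is then recovered by summing over the discrete variable and integrating over the continuous one, as in the displayed formula.

<syntaxhighlight lang="python">
# Sketch of the mixed case: X continuous (standard normal, an assumed
# choice), Y binary with P(Y = 1 | X = x) from a logistic model with
# illustrative coefficients. The mixed joint density is
#     f_{X,Y}(x, y) = P(Y = y | X = x) * f_X(x),
# and F_{X,Y}(x, y) = sum_{t <= y} integral_{-inf}^{x} f_{X,Y}(s, t) ds.
import math

def f_X(x):
    # standard normal density (assumed marginal for X)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_Y_given_X(y, x):
    # logistic model with made-up coefficients 0.5 and 2.0
    p1 = 1 / (1 + math.exp(-(0.5 + 2.0 * x)))
    return p1 if y == 1 else 1 - p1

def f_XY(x, y):
    # mixed joint density: P(Y = y | X = x) * f_X(x)
    return p_Y_given_X(y, x) * f_X(x)

def F_XY(x, y, lo=-10.0, n=20_000):
    # Joint CDF: sum over discrete values t <= y, integrate over s <= x;
    # the integral is approximated with a simple trapezoidal rule,
    # truncating the lower limit at an assumed cutoff `lo`.
    h = (x - lo) / n
    total = 0.0
    for t in (0, 1):
        if t > y:
            continue
        vals = [f_XY(lo + i * h, t) for i in range(n + 1)]
        total += h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return total

print(F_XY(0.0, 0))   # P(X <= 0 and Y = 0)
print(F_XY(10.0, 1))  # ~ 1.0: the total probability
</syntaxhighlight>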