
== Non-independent variables ==
It may be the case that variables are correlated, or more generally, not independent. Two random variables <math>y_1</math> and <math>y_2</math> are independent if and only if their joint probability density function is the product of the individual probability density functions, i.e.

<math display="block">f(y_1,y_2) = f(y_1) f(y_2)\,</math>

Suppose one constructs an order-''n'' Gaussian vector out of random variables <math>(y_1,\ldots,y_n)</math>, where the variables have means <math>(\mu_1, \ldots, \mu_n)</math>. Furthermore, let the [[covariance matrix]] be denoted by <math>\mathit\Sigma</math>, assumed to be invertible so that a density exists. These ''n'' random variables then follow a [[multivariate normal distribution]], whose joint probability density function is given by:

<math display="block">f(y_1,\ldots,y_n) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\mathit\Sigma)}} \exp\left( -\frac{1}{2} \left[y_1-\mu_1,\ldots,y_n-\mu_n\right]\mathit\Sigma^{-1}     \left[y_1-\mu_1,\ldots,y_n-\mu_n\right]^\mathrm{T} \right)</math>
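As an illustration, the logarithm of this density can be evaluated numerically. The following is a minimal sketch, not a reference implementation: the function names <code>cholesky</code> and <code>mvn_logpdf</code> are hypothetical, and the code assumes that <math>\mathit\Sigma</math> is symmetric and positive definite, so that a [[Cholesky decomposition]] can be used in place of an explicit inverse and determinant.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma.

    Assumes sigma is a symmetric positive-definite matrix (list of lists).
    """
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def mvn_logpdf(y, mu, sigma):
    """Log of the multivariate normal density at y."""
    n = len(y)
    L = cholesky(sigma)
    # Solve L z = (y - mu) by forward substitution.
    # Then z . z equals (y - mu)^T Sigma^{-1} (y - mu).
    z = []
    for i in range(n):
        r = y[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    quad = sum(v * v for v in z)
    # det(Sigma) = prod(L_ii)^2, so log det = 2 * sum(log L_ii).
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Working with the log-density avoids underflow for large ''n'', and for a diagonal <math>\mathit\Sigma</math> the result reduces to a sum of univariate normal log-densities.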

In the [[Bivariate analysis|bivariate]] case, the joint probability density function is given by:

<math display="block"> f(y_1,y_2) = \frac{1}{2\pi \sigma_{1} \sigma_2 \sqrt{1-\rho^2}} \exp\left[ -\frac{1}{2(1-\rho^2)} \left(\frac{(y_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2}\right) \right] </math>
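The link between the correlation <math>\rho</math> and independence can be checked numerically. The sketch below uses hypothetical function names and assumes <math>\sigma_1, \sigma_2 > 0</math> and <math>-1 < \rho < 1</math>. It evaluates the bivariate density and compares it with the product of the two marginal normal densities.

```python
import math


def normal_pdf(y, mu, s):
    """Univariate normal density with mean mu and standard deviation s."""
    return math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def bivariate_normal_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density with correlation rho, where -1 < rho < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    u = (y1 - mu1) / s1
    v = (y2 - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


# With rho = 0 the joint density factorizes into the marginals.
joint_uncorrelated = bivariate_normal_pdf(1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0)
product = normal_pdf(1.0, 0.0, 1.0) * normal_pdf(2.0, 0.0, 2.0)

# With rho != 0 the joint density no longer equals that product.
joint_correlated = bivariate_normal_pdf(1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.5)
```

Here <code>joint_uncorrelated</code> agrees with <code>product</code> up to rounding, while <code>joint_correlated</code> does not, so the likelihood of correlated observations cannot be written as a product of marginal densities.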

In this and other cases where a joint density function exists, the likelihood function is defined as above, in the section "[[Maximum likelihood#Principles|principles]]," using this density.

=== Example ===
<math>X_1,\ X_2,\ldots,\ X_m</math> are counts in cells or boxes 1 up to ''m''; each box has a different probability (think of the boxes being bigger or smaller), and the number of balls that fall is fixed to be <math>n</math>: <math>x_1+x_2+\cdots+x_m=n</math>. The probability of each box is <math>p_i</math>, with the constraint <math>p_1+p_2+\cdots+p_m=1</math>. In this case the <math>X_i</math> are not independent; the joint probability of a vector <math>(x_1,\ x_2,\ldots,x_m)</math> is called the multinomial and has the form:

<math display="block">f(x_1,x_2,\ldots,x_m\mid p_1,p_2,\ldots,p_m)=\frac{n!}{\prod x_i!}\prod p_i^{x_i}= \binom{n}{x_1,x_2,\ldots,x_m} p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m}</math>

Each box taken separately against all the other boxes is a binomial, and the multinomial is an extension thereof.

The log-likelihood of this is:

<math display="block">\ell(p_1,p_2,\ldots,p_m)=\log n!-\sum_{i=1}^m \log x_i!+\sum_{i=1}^m x_i\log p_i</math>
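The log-likelihood above can be evaluated directly. The following sketch uses a hypothetical function name, <code>multinomial_loglik</code>. It computes <math>\log k!</math> as <code>math.lgamma(k + 1)</code> and adopts the convention <math>0 \log 0 = 0</math> for empty boxes, which is an assumption of this example.

```python
import math


def multinomial_loglik(x, p):
    """Multinomial log-likelihood for counts x and box probabilities p."""
    n = sum(x)
    # log n! - sum_i log x_i!, using lgamma(k + 1) = log k!
    ll = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    for xi, pi in zip(x, p):
        if xi > 0:
            if pi <= 0.0:
                return -math.inf  # an observed box cannot have probability 0
            ll += xi * math.log(pi)
    return ll
```

For example, with counts <code>[2, 1, 0]</code> and probabilities <code>[0.5, 0.25, 0.25]</code> the probability is <math>\tfrac{3!}{2!\,1!\,0!}\cdot 0.5^2 \cdot 0.25 = 0.1875</math>, and the function returns its logarithm.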

The constraint has to be taken into account, which can be done using [[Lagrange multiplier]]s:

<math display="block">L(p_1,p_2,\ldots,p_m,\lambda)=\ell(p_1,p_2,\ldots,p_m)+\lambda\left(1-\sum_{i=1}^m p_i\right)</math>

Setting all the partial derivatives to zero yields the most natural estimate:

<math display="block">\hat{p}_i=\frac{x_i}{n}</math>
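The intermediate steps can be sketched as follows. Differentiating the Lagrangian with respect to each <math>p_i</math> and setting the result to zero gives

<math display="block">\frac{\partial L}{\partial p_i}=\frac{x_i}{p_i}-\lambda=0 \quad\Longrightarrow\quad p_i=\frac{x_i}{\lambda}</math>

Substituting this into the constraint <math>p_1+p_2+\cdots+p_m=1</math> determines the multiplier:

<math display="block">\sum_{i=1}^m \frac{x_i}{\lambda}=1 \quad\Longrightarrow\quad \lambda=\sum_{i=1}^m x_i=n</math>

so that <math>\hat{p}_i = x_i/n</math>.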

Maximizing the log-likelihood, with or without constraints, may have no closed-form solution, in which case iterative procedures must be used.
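To illustrate what such an iterative procedure can look like, the sketch below maximizes the multinomial log-likelihood by plain gradient ascent. A closed form exists for this problem, so the example serves only as a case where the numerical answer can be checked against <math>x_i/n</math>. The function names, the softmax reparameterization <math>p_i = e^{\theta_i}/\sum_j e^{\theta_j}</math> (which enforces the constraint automatically), the step size, and the stopping rule are all choices made for this example, and every count is assumed to be positive.

```python
import math


def softmax(theta):
    """Map unconstrained parameters to probabilities that sum to 1."""
    m = max(theta)  # subtract the maximum for numerical stability
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [v / s for v in e]


def fit_multinomial_gradient_ascent(x, step=1.0, tol=1e-12, max_iter=10_000):
    """Estimate box probabilities from positive counts x by gradient ascent."""
    n = sum(x)
    theta = [0.0] * len(x)  # start from the uniform distribution
    for _ in range(max_iter):
        p = softmax(theta)
        # Gradient of (1/n) * sum_i x_i log p_i with respect to theta_i
        grad = [xi / n - pi for xi, pi in zip(x, p)]
        if max(abs(g) for g in grad) < tol:
            break
        theta = [t + step * g for t, g in zip(theta, grad)]
    return softmax(theta)


p_hat = fit_multinomial_gradient_ascent([5, 3, 2])
```

The gradient vanishes exactly when <math>p_i = x_i/n</math>, so the iteration stops at the same estimate as the Lagrange multiplier argument. In problems without a closed form, the same loop structure applies, although faster-converging methods are often preferred in practice.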