==Properties==

===Relation to the autocorrelation matrix===
The auto-covariance matrix <math>\operatorname{K}_{\mathbf{X}\mathbf{X}}</math> is related to the [[autocorrelation matrix]] <math>\operatorname{R}_{\mathbf{X}\mathbf{X}}</math> by
<math display="block">\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{X} - \operatorname{E}[\mathbf{X}])^\mathsf{T}] = \operatorname{R}_{\mathbf{X}\mathbf{X}} - \operatorname{E}[\mathbf{X}] \operatorname{E}[\mathbf{X}]^\mathsf{T}</math>
where the autocorrelation matrix is defined as <math>\operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X} \mathbf{X}^\mathsf{T}]</math>.

===Relation to the correlation matrix===
{{further|Correlation matrix}}
An entity closely related to the covariance matrix is the matrix of [[Pearson product-moment correlation coefficient]]s between each of the random variables in the random vector <math>\mathbf{X}</math>, which can be written as
<math display="block">\operatorname{corr}(\mathbf{X}) = \big(\operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}})\big)^{-\frac{1}{2}} \, \operatorname{K}_{\mathbf{X}\mathbf{X}} \, \big(\operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}})\big)^{-\frac{1}{2}},</math>
where <math>\operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}})</math> is the matrix of the diagonal elements of <math>\operatorname{K}_{\mathbf{X}\mathbf{X}}</math> (i.e., a [[diagonal matrix]] of the variances of <math>X_i</math> for <math>i = 1, \dots, n</math>).

Equivalently, the correlation matrix can be seen as the covariance matrix of the [[standardized variable|standardized random variables]] <math>X_i/\sigma(X_i)</math> for <math>i = 1, \dots, n</math>.
<math display="block"> \operatorname{corr}(\mathbf{X}) = \begin{bmatrix} 1 & \frac{\operatorname{E}[(X_1 - \mu_1)(X_2 - \mu_2)]}{\sigma(X_1)\sigma(X_2)} & \cdots & \frac{\operatorname{E}[(X_1 - \mu_1)(X_n - \mu_n)]}{\sigma(X_1)\sigma(X_n)} \\ \\ \frac{\operatorname{E}[(X_2 - \mu_2)(X_1 - \mu_1)]}{\sigma(X_2)\sigma(X_1)} & 1 & \cdots & \frac{\operatorname{E}[(X_2 - \mu_2)(X_n - \mu_n)]}{\sigma(X_2)\sigma(X_n)} \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \frac{\operatorname{E}[(X_n - \mu_n)(X_1 - \mu_1)]}{\sigma(X_n)\sigma(X_1)} & \frac{\operatorname{E}[(X_n - \mu_n)(X_2 - \mu_2)]}{\sigma(X_n)\sigma(X_2)} & \cdots & 1 \end{bmatrix}. </math>

Each element on the principal diagonal of a correlation matrix is the correlation of a random variable with itself, which always equals 1. Each [[off-diagonal element]] is between −1 and +1 inclusive.
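
As a concrete numerical illustration of the rescaling above, the following minimal [[NumPy]] sketch computes <math>\operatorname{corr}(\mathbf{X})</math> from a covariance matrix; the matrix <code>K</code> is an arbitrary positive-definite example chosen only for illustration, not data from the article.

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary positive-definite covariance matrix, chosen only for illustration.
K = np.array([[4.0, 2.0, 0.6],
              [2.0, 9.0, 1.5],
              [0.6, 1.5, 1.0]])

# diag(K)^(-1/2): diagonal matrix of reciprocal standard deviations.
D_inv = np.diag(1.0 / np.sqrt(np.diag(K)))

# corr(X) = diag(K)^(-1/2) K diag(K)^(-1/2)
corr = D_inv @ K @ D_inv

print(corr)
assert np.allclose(np.diag(corr), 1.0)          # unit principal diagonal
assert np.all(np.abs(corr) <= 1.0 + 1e-12)      # all entries lie in [-1, 1]
</syntaxhighlight>
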
===Inverse of the covariance matrix===
The [[invertible matrix|inverse of this matrix]], <math>\operatorname{K}_{\mathbf{X}\mathbf{X}}^{-1}</math>, if it exists, is the inverse covariance matrix (or inverse concentration matrix{{dubious|reason=An inverse concentration gets smaller the more concentrated something is, not larger as it reasonably should be. Besides, we already call it "concentration matrix" later on in this sentence, which is kind of contradictory.|date=September 2024}}), also known as the ''[[precision matrix]]'' (or ''concentration matrix'').<ref>{{cite book |title=All of Statistics: A Concise Course in Statistical Inference |url=https://archive.org/details/springer_10.1007-978-0-387-21736-9 |first=Larry |last=Wasserman |year=2004 |publisher=Springer |isbn=0-387-40272-1}}</ref>

Just as the covariance matrix can be written as the rescaling of a correlation matrix by the marginal standard deviations:
<math display="block">\operatorname{cov}(\mathbf{X}) = \begin{bmatrix} \sigma_{x_1} & & & 0\\ & \sigma_{x_2}\\ & & \ddots\\ 0 & & & \sigma_{x_n} \end{bmatrix} \begin{bmatrix} 1 & \rho_{x_1, x_2} & \cdots & \rho_{x_1, x_n}\\ \rho_{x_2, x_1} & 1 & \cdots & \rho_{x_2, x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{x_n, x_1} & \rho_{x_n, x_2} & \cdots & 1\\ \end{bmatrix} \begin{bmatrix} \sigma_{x_1} & & & 0\\ & \sigma_{x_2}\\ & & \ddots\\ 0 & & & \sigma_{x_n} \end{bmatrix},</math>
so, using the ideas of [[partial correlation]] and partial variance, the inverse covariance matrix can be expressed analogously:
<math display="block">\operatorname{cov}(\mathbf{X})^{-1} = \begin{bmatrix} \frac{1}{\sigma_{x_1|x_2...}} & & & 0\\ & \frac{1}{\sigma_{x_2|x_1,x_3...}}\\ & & \ddots\\ 0 & & & \frac{1}{\sigma_{x_n|x_1...x_{n-1}}} \end{bmatrix} \begin{bmatrix} 1 & -\rho_{x_1, x_2\mid x_3...} & \cdots & -\rho_{x_1, x_n\mid x_2...x_{n-1}}\\ -\rho_{x_2, x_1\mid x_3...} & 1 & \cdots & -\rho_{x_2, x_n\mid x_1,x_3...x_{n-1}}\\ \vdots & \vdots & \ddots & \vdots\\ -\rho_{x_n, x_1\mid x_2...x_{n-1}} & -\rho_{x_n, x_2\mid x_1,x_3...x_{n-1}} & \cdots & 1\\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_{x_1|x_2...}} & & & 0\\ & \frac{1}{\sigma_{x_2|x_1,x_3...}}\\ & & \ddots\\ 0 & & & \frac{1}{\sigma_{x_n|x_1...x_{n-1}}} \end{bmatrix}.</math>
This duality motivates a number of other dualities between marginalizing and conditioning for Gaussian random variables.
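
The decomposition above can be checked numerically. The sketch below (a minimal illustration reusing the same arbitrary covariance matrix as before, not real data) writes <math>P = \operatorname{cov}(\mathbf{X})^{-1}</math>, reads off the partial standard deviations <math>\sigma_{x_i \mid \text{rest}} = 1/\sqrt{P_{ii}}</math> and partial correlations <math>\rho_{x_i, x_j \mid \text{rest}} = -P_{ij}/\sqrt{P_{ii} P_{jj}}</math>, and reassembles the precision matrix from the two factors.

<syntaxhighlight lang="python">
import numpy as np

# Same arbitrary illustrative covariance matrix as above.
K = np.array([[4.0, 2.0, 0.6],
              [2.0, 9.0, 1.5],
              [0.6, 1.5, 1.0]])

P = np.linalg.inv(K)                            # precision (concentration) matrix

# Partial standard deviations: sigma_{x_i | rest} = 1 / sqrt(P_ii)
partial_sd = 1.0 / np.sqrt(np.diag(P))

# Partial correlations: rho_{x_i, x_j | rest} = -P_ij / sqrt(P_ii P_jj)
partial_corr = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
np.fill_diagonal(partial_corr, 1.0)

# Reassemble P from the two factors of the decomposition shown above.
M = -partial_corr.copy()                        # off-diagonal entries: -rho
np.fill_diagonal(M, 1.0)                        # unit diagonal
D_inv = np.diag(1.0 / partial_sd)
assert np.allclose(D_inv @ M @ D_inv, P)
</syntaxhighlight>
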
===Basic properties===
For <math>\operatorname{K}_{\mathbf{X}\mathbf{X}}=\operatorname{var}(\mathbf{X}) = \operatorname{E} \left[ \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right) \left( \mathbf{X} - \operatorname{E}[\mathbf{X}] \right)^\mathsf{T} \right]</math> and <math> \boldsymbol{\mu}_\mathbf{X} = \operatorname{E}[\textbf{X}]</math>, where <math>\mathbf{X} = (X_1,\ldots,X_n)^\mathsf{T}</math> is an <math>n</math>-dimensional random vector, the following basic properties apply:<ref name=taboga>{{cite web |last1=Taboga |first1=Marco |url=http://www.statlect.com/varian2.htm |title=Lectures on probability theory and mathematical statistics |year=2010}}</ref>
# <math> \operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}(\mathbf{X X^\mathsf{T}}) - \boldsymbol{\mu}_\mathbf{X}\boldsymbol{\mu}_\mathbf{X}^\mathsf{T} </math>
# <math> \operatorname{K}_{\mathbf{X}\mathbf{X}} \,</math> is [[Positive-semidefinite matrix|positive-semidefinite]], i.e. <math>\mathbf{a}^\mathsf{T} \operatorname{K}_{\mathbf{X}\mathbf{X}} \mathbf{a} \ge 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^n</math> {{Hidden| title = ''Proof'' | content = Indeed, from property 4 below it follows that under a linear transformation of the random vector <math>\mathbf{X}</math>, with covariance matrix <math>\mathbf{\Sigma_{X}} = \mathrm{cov}(\mathbf{X})</math>, by a linear operator <math>\mathbf{A}</math> such that <math>\mathbf{Y} = \mathbf{A}\mathbf{X}</math>, the covariance matrix is transformed as
: <math>\mathbf{\Sigma_{Y}} = \mathrm{cov}\left(\mathbf{Y}\right) = \mathbf{A\, \Sigma_{X}\,A}^{\top}</math>.
Since, by property 3, the matrix <math>\mathbf{\Sigma_{X}}</math> is symmetric, it can be diagonalized by an orthogonal transformation, i.e. there exists an orthogonal matrix <math>\mathbf{A}</math> (so that <math>\mathbf{A}^{\top} = \mathbf{A}^{-1}</math>) such that
: <math>\mathbf{A\, \Sigma_{X}\,A}^{\top} = \mathbf{A\, \Sigma_{X}\,A}^{-1} = \mbox{diag}(\sigma_1,\ldots,\sigma_n),</math>
where <math>\sigma_1,\ldots,\sigma_n</math> are the eigenvalues of <math>\mathbf{\Sigma_{X}}</math>. But this diagonal matrix is the covariance matrix of the random vector <math>\mathbf{Y} = \mathbf{A}\mathbf{X}</math>, and the main diagonal of <math>\mathbf{\Sigma_{Y}} = \mathrm{cov}\left(\mathbf{Y}\right)</math> consists of the variances of the elements of <math>\mathbf{Y}</math>. Since a variance is always non-negative, we conclude that <math>\sigma_i \geq 0</math> for every <math>i</math>, which means that <math>\mathbf{\Sigma_{X}}</math> is positive-semidefinite. }}
# <math> \operatorname{K}_{\mathbf{X}\mathbf{X}} \,</math> is [[symmetric matrix|symmetric]], i.e. <math>\operatorname{K}_{\mathbf{X}\mathbf{X}}^\mathsf{T} = \operatorname{K}_{\mathbf{X}\mathbf{X}}</math>
# For any constant (i.e. non-random) <math>m \times n</math> matrix <math>\mathbf{A}</math> and constant <math>m \times 1</math> vector <math>\mathbf{a}</math>, one has <math> \operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{var}(\mathbf{X})\, \mathbf{A}^\mathsf{T} </math>
# If <math>\mathbf{Y}</math> is another random vector with the same dimension as <math>\mathbf{X}</math>, then <math>\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y}) </math> where <math>\operatorname{cov}(\mathbf{X}, \mathbf{Y})</math> is the [[cross-covariance matrix]] of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math>.
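
Properties 1, 2 and 4 can be verified numerically on sample data. The sketch below draws samples of a hypothetical random vector (the samples and the matrix <code>A</code> are arbitrary, chosen only for illustration) and checks the identities using the biased sample covariance, for which they hold exactly.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Samples of a hypothetical 3-dimensional random vector; each column is one draw.
X = rng.normal(size=(3, 1000))
mu = X.mean(axis=1, keepdims=True)

# Property 1: K_XX = E[X X^T] - mu mu^T (here with sample moments, 1/N normalisation).
K = (X @ X.T) / X.shape[1] - mu @ mu.T
assert np.allclose(K, np.cov(X, bias=True))

# Property 2: K_XX is positive-semidefinite, so all eigenvalues are >= 0.
assert np.all(np.linalg.eigvalsh(K) >= -1e-12)

# Property 4: var(A X + a) = A var(X) A^T for a constant matrix A and vector a.
A = rng.normal(size=(2, 3))
a = rng.normal(size=(2, 1))
Y = A @ X + a
assert np.allclose(np.cov(Y, bias=True), A @ K @ A.T)
</syntaxhighlight>
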
===Block matrices===
The joint mean <math>\boldsymbol\mu</math> and [[cross-covariance matrix|joint covariance matrix]] <math>\boldsymbol\Sigma</math> of <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> can be written in block form
<math display="block"> \boldsymbol\mu = \begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix}, \qquad \boldsymbol\Sigma = \begin{bmatrix} \operatorname{K}_\mathbf{XX} & \operatorname{K}_\mathbf{XY} \\ \operatorname{K}_\mathbf{YX} & \operatorname{K}_\mathbf{YY} \end{bmatrix} </math>
where <math> \operatorname{K}_\mathbf{XX} = \operatorname{var}(\mathbf{X}) </math>, <math> \operatorname{K}_\mathbf{YY} = \operatorname{var}(\mathbf{Y}) </math> and <math> \operatorname{K}_\mathbf{XY} = \operatorname{K}^\mathsf{T}_\mathbf{YX} = \operatorname{cov}(\mathbf{X}, \mathbf{Y}) </math>.

<math> \operatorname{K}_\mathbf{XX} </math> and <math> \operatorname{K}_\mathbf{YY} </math> can be identified as the variance matrices of the [[marginal distribution]]s for <math> \mathbf{X} </math> and <math> \mathbf{Y} </math> respectively.

If <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> are [[Multivariate normal distribution|jointly normally distributed]],
<math display="block"> \mathbf{X}, \mathbf{Y} \sim\ \mathcal{N}(\boldsymbol\mu, \operatorname{\boldsymbol\Sigma}), </math>
then the [[conditional distribution]] for <math>\mathbf{Y}</math> given <math>\mathbf{X}</math> is given by<ref name=eaton>{{cite book|last=Eaton|first=Morris L.|title=Multivariate Statistics: a Vector Space Approach|year=1983|publisher=John Wiley and Sons|isbn=0-471-02776-6|pages=116–117}}</ref>
<math display="block"> \mathbf{Y} \mid \mathbf{X} \sim\ \mathcal{N}(\boldsymbol{\mu}_\mathbf{Y|X}, \operatorname{K}_\mathbf{Y|X}), </math>
defined by [[conditional mean]]
<math display="block"> \boldsymbol{\mu}_{\mathbf{Y}|\mathbf{X}} = \boldsymbol{\mu}_\mathbf{Y} + \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} \left( \mathbf{X} - \boldsymbol{\mu}_\mathbf{X} \right) </math>
and [[conditional variance]]
<math display="block"> \operatorname{K}_\mathbf{Y|X} = \operatorname{K}_\mathbf{YY} - \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} \operatorname{K}_\mathbf{XY}. </math>

The matrix <math> \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} </math> is known as the matrix of [[regression analysis|regression]] coefficients, while in linear algebra <math> \operatorname{K}_\mathbf{Y|X} </math> is the [[Schur complement]] of <math> \operatorname{K}_\mathbf{XX} </math> in <math> \boldsymbol\Sigma </math>.

The matrix of regression coefficients may often be given in transpose form, <math> \operatorname{K}_\mathbf{XX}^{-1} \operatorname{K}_\mathbf{XY} </math>, suitable for post-multiplying a row vector of explanatory variables <math> \mathbf{X}^\mathsf{T} </math> rather than pre-multiplying a column vector <math> \mathbf{X} </math>. In this form they correspond to the coefficients obtained by inverting the matrix of the [[normal equations]] of [[ordinary least squares]] (OLS).
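
A minimal numerical sketch of the conditioning formulas above, using an arbitrary partitioned covariance matrix (the block values, means and observed vector below are illustrative assumptions, not taken from the article):

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary partitioned covariance blocks and means, for illustration only.
K_XX = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
K_YY = np.array([[1.5]])
K_XY = np.array([[0.4],
                 [0.2]])
K_YX = K_XY.T

mu_X = np.array([0.0, 1.0])
mu_Y = np.array([2.0])

x_obs = np.array([1.0, 0.5])     # a hypothetical observed value of X

# Conditional mean: mu_Y + K_YX K_XX^{-1} (x - mu_X)
mu_cond = mu_Y + K_YX @ np.linalg.solve(K_XX, x_obs - mu_X)

# Conditional covariance: the Schur complement K_YY - K_YX K_XX^{-1} K_XY
K_cond = K_YY - K_YX @ np.linalg.solve(K_XX, K_XY)

print(mu_cond, K_cond)
</syntaxhighlight>
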