{{short description|Probability distribution}}
{{Probability distribution
 | name =Matrix normal
 | type =density
 | box_width =195px
 | pdf_image =
 | cdf_image =
 | notation =<math>\mathcal{MN}_{n,p}(\mathbf{M}, \mathbf{U}, \mathbf{V})</math>
 | parameters =<math>\mathbf{M}</math> [[location parameter|location]] ([[real number|real]] <math>n\times p</math> [[matrix (mathematics)|matrix]])<br/> <math>\mathbf{U}</math> [[scale matrix|scale]] ([[positive-definite matrix|positive-definite]] [[real number|real]] <math>n\times n</math> [[matrix (mathematics)|matrix]])<br/> <math>\mathbf{V}</math> [[scale matrix|scale]] ([[positive-definite matrix|positive-definite]] [[real number|real]] <math>p\times p</math> [[matrix (mathematics)|matrix]])
 | support =<math>\mathbf{X} \in \mathbb{R}^{n \times p}</math>
 | pdf =<math>\frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}}</math>
 | cdf =
 | mean =<math>\mathbf{M}</math>
 | median =
 | mode =
 | variance =<math>\mathbf{U}</math> (among-row) and <math>\mathbf{V}</math> (among-column)
 | skewness =
 | kurtosis =
 | entropy =
 | mgf =
 | char =
}}
In [[statistics]], the '''matrix normal distribution''' or '''matrix Gaussian distribution''' is a [[probability distribution]] that is a generalization of the [[multivariate normal distribution]] to matrix-valued random variables.
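The equivalence between the matrix normal distribution and the multivariate normal distribution of <math>\operatorname{vec}(\mathbf{X})</math> (made precise in the definition below) can be checked numerically. The following is a minimal sketch, not part of the original article, using SciPy's <code>matrix_normal</code> and <code>multivariate_normal</code> with arbitrary illustrative parameter values:

```python
# Sketch: the matrix normal density of X equals the multivariate normal
# density of vec(X) with covariance V kron U.  All parameter values below
# are arbitrary illustrations.
import numpy as np
from scipy.stats import matrix_normal, multivariate_normal

rng = np.random.default_rng(0)
n, p = 3, 2
M = rng.standard_normal((n, p))   # location matrix
A = rng.standard_normal((n, n))
U = A @ A.T + n * np.eye(n)       # among-row covariance (positive-definite)
B = rng.standard_normal((p, p))
V = B @ B.T + p * np.eye(p)       # among-column covariance (positive-definite)
X = rng.standard_normal((n, p))   # an arbitrary evaluation point

# Density of X under MN(M, U, V); SciPy uses rowcov=U, colcov=V.
density_matrix = matrix_normal(mean=M, rowcov=U, colcov=V).pdf(X)

# Density of vec(X) under N(vec(M), V kron U); vec() stacks columns,
# i.e. Fortran-order flattening in NumPy.
vec = lambda Z: Z.flatten(order="F")
density_vec = multivariate_normal(mean=vec(M), cov=np.kron(V, U)).pdf(vec(X))
```

The two computed densities agree up to floating-point error, illustrating that the matrix normal is the multivariate normal in vectorized form.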
== Definition ==
The [[probability density function]] for the random matrix '''X''' (''n'' × ''p'') that follows the matrix normal distribution <math>\mathcal{MN}_{n,p}(\mathbf{M}, \mathbf{U}, \mathbf{V})</math> has the form:

:<math>
p(\mathbf{X}\mid\mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}}
</math>

where <math>\mathrm{tr}</math> denotes [[Trace (linear algebra)|trace]]; '''M''' is ''n'' × ''p'', '''U''' is ''n'' × ''n'' and '''V''' is ''p'' × ''p''; and the density is understood as the probability density function with respect to the standard Lebesgue measure in <math>\mathbb{R}^{n\times p}</math>, i.e. the measure corresponding to integration with respect to <math>dx_{11} dx_{21}\dots dx_{n1} dx_{12}\dots dx_{n2}\dots dx_{np}</math>.

The matrix normal is related to the [[multivariate normal distribution]] in the following way:

:<math>\mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V}),</math>

if and only if

:<math>\mathrm{vec}(\mathbf{X}) \sim \mathcal{N}_{np}(\mathrm{vec}(\mathbf{M}), \mathbf{V} \otimes \mathbf{U})</math>

where <math>\otimes</math> denotes the [[Kronecker product]] and <math>\mathrm{vec}(\mathbf{M})</math> denotes the [[vectorization (mathematics)|vectorization]] of <math>\mathbf{M}</math>.

===Proof===
The equivalence between the above ''matrix normal'' and ''multivariate normal'' density functions can be shown using several properties of the [[Trace (linear algebra)|trace]] and [[Kronecker product]], as follows.
We start with the argument of the exponent of the matrix normal PDF:

:<math>\begin{align}
&\;\;\;\;-\frac12\text{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right]\\
&= -\frac12\text{vec}\left(\mathbf{X} - \mathbf{M}\right)^T \text{vec}\left(\mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \mathbf{V}^{-1}\right) \\
&= -\frac12\text{vec}\left(\mathbf{X} - \mathbf{M}\right)^T \left(\mathbf{V}^{-1}\otimes\mathbf{U}^{-1}\right)\text{vec}\left(\mathbf{X} - \mathbf{M}\right) \\
&= -\frac12\left[\text{vec}(\mathbf{X}) - \text{vec}(\mathbf{M})\right]^T \left(\mathbf{V}\otimes\mathbf{U}\right)^{-1}\left[\text{vec}(\mathbf{X}) - \text{vec}(\mathbf{M})\right]
\end{align}</math>

which is the argument of the exponent of the multivariate normal PDF with respect to Lebesgue measure in <math>\mathbb{R}^{n p}</math>. The proof is completed by using the determinant property: <math> |\mathbf{V}\otimes \mathbf{U}| = |\mathbf{V}|^n |\mathbf{U}|^p.</math>

==Properties==
If <math>\mathbf{X} \sim \mathcal{MN}_{n\times p}(\mathbf{M}, \mathbf{U}, \mathbf{V})</math>, then we have the following properties:<ref name="GuptaNagar1999">{{cite book|author1=A K Gupta|author2=D K Nagar|title=Matrix Variate Distributions|url=https://books.google.com/books?id=PQOYnT7P1loC|access-date=23 May 2014|date=22 October 1999|publisher=CRC Press|isbn=978-1-58488-046-2|chapter=Chapter 2: MATRIX VARIATE NORMAL DISTRIBUTION}}</ref><ref>{{cite journal|last=Ding|first=Shanshan|author2=R.
Dennis Cook|title=Dimension folding PCA and PFC for matrix-valued predictors|journal=Statistica Sinica|date=2014|volume=24|issue=1|pages=463–492|jstor=26432553}}</ref>

===Expected values===
The mean, or [[expected value]], is:

:<math>E[\mathbf{X}] = \mathbf{M}</math>

and we have the following second-order expectations:

:<math>E[(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})^{T}] = \mathbf{U}\operatorname{tr}(\mathbf{V}) </math>

:<math>E[(\mathbf{X} - \mathbf{M})^{T} (\mathbf{X} - \mathbf{M})] = \mathbf{V}\operatorname{tr}(\mathbf{U}) </math>

where <math>\operatorname{tr}</math> denotes [[Trace (linear algebra)|trace]]. More generally, for appropriately dimensioned matrices '''A''', '''B''', '''C''':

:<math>\begin{align}
E[\mathbf{X}\mathbf{A}\mathbf{X}^{T}] &= \mathbf{U}\operatorname{tr}(\mathbf{A}^T\mathbf{V}) + \mathbf{MAM}^T\\
E[\mathbf{X}^T\mathbf{B}\mathbf{X}] &= \mathbf{V}\operatorname{tr}(\mathbf{U}\mathbf{B}^T) + \mathbf{M}^T\mathbf{BM}\\
E[\mathbf{X}\mathbf{C}\mathbf{X}] &= \mathbf{U}\mathbf{C}^T\mathbf{V} + \mathbf{MCM}
\end{align}</math>

where '''A''' and '''B''' are ''p'' × ''p'' and ''n'' × ''n'' respectively, and '''C''' must be ''p'' × ''n'' for the product <math>\mathbf{XCX}</math> to be defined.

===Transformation===
[[Transpose]] transform:

:<math>\mathbf{X}^T \sim \mathcal{MN}_{p\times n}(\mathbf{M}^T, \mathbf{V}, \mathbf{U}) </math>

Linear transform: let '''D''' (''r''-by-''n'') be of full [[Rank (linear algebra)|rank]] ''r'' ≤ ''n'' and '''C''' (''p''-by-''s'') be of full rank ''s'' ≤ ''p''; then:

:<math>\mathbf{DXC}\sim \mathcal{MN}_{r\times s}(\mathbf{DMC}, \mathbf{DUD}^T, \mathbf{C}^T\mathbf{VC}) </math>

===Composition===
The product of the densities of two matrix normal distributions,

:<math> \mathcal{MN}(\mathbf{M_1}, \mathbf{U_1}, \mathbf{V_1})\cdot \mathcal{MN}(\mathbf{M_2}, \mathbf{U_2}, \mathbf{V_2}) \propto \mathcal{N}(\mu_c, \Sigma_c), </math>

is proportional to a multivariate normal density in <math>\operatorname{vec}(\mathbf{X})</math> with parameters:

:<math> \Sigma_c = (V_1^{-1} \otimes U_1^{-1} + V_2^{-1} \otimes U_2^{-1})^{-1}, </math>

:<math> \mu_c = \Sigma_c \big((V_1^{-1} \otimes U_1^{-1}) \operatorname{vec}(M_1) + (V_2^{-1} \otimes
U_2^{-1})\operatorname{vec}(M_2)\big). </math>

==Example==
Consider a sample of ''n'' independent ''p''-dimensional random vectors identically distributed according to a [[multivariate normal distribution]]:

:<math>\mathbf{Y}_i \sim \mathcal{N}_p({\boldsymbol \mu}, {\boldsymbol \Sigma}) \text{ with } i \in \{1,\ldots,n\}.</math>

Defining the ''n'' × ''p'' matrix <math>\mathbf{X}</math> whose ''i''th row is <math>\mathbf{Y}_i^T</math>, we obtain:

:<math>\mathbf{X} \sim \mathcal{MN}_{n \times p}(\mathbf{M}, \mathbf{U}, \mathbf{V})</math>

where each row of <math>\mathbf{M}</math> is equal to <math>{\boldsymbol \mu}^T</math>, that is <math>\mathbf{M}=\mathbf{1}_n {\boldsymbol \mu}^T</math>; <math>\mathbf{U}</math> is the ''n'' × ''n'' identity matrix, reflecting the independence of the rows; and <math>\mathbf{V} = {\boldsymbol \Sigma}</math>.

==Maximum likelihood parameter estimation==
Given ''k'' matrices, each of size ''n'' × ''p'', denoted <math>\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k</math>, which we assume have been sampled [[Iid|i.i.d.]] from a matrix normal distribution, the [[maximum likelihood estimate]] of the parameters can be obtained by maximizing:

:<math> \prod_{i=1}^k \mathcal{MN}_{n\times p}(\mathbf{X}_i\mid\mathbf{M},\mathbf{U},\mathbf{V}). </math>

The solution for the mean has a closed form, namely

:<math> \mathbf{M} = \frac{1}{k} \sum_{i=1}^k\mathbf{X}_i, </math>

but the covariance parameters do not.
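The closed-form mean estimate can be checked numerically. The following is a minimal sketch, not part of the original article, using SciPy's <code>matrix_normal</code> with arbitrary illustrative parameter values:

```python
# Sketch: the MLE of the location M is the elementwise sample mean of the
# k sampled matrices.  The parameter values below are arbitrary.
import numpy as np
from scipy.stats import matrix_normal

n, p, k = 4, 3, 5000
M = np.arange(n * p, dtype=float).reshape(n, p)   # true location
U = np.eye(n)                                     # among-row covariance
V = np.eye(p)                                     # among-column covariance

# k i.i.d. draws; the result has shape (k, n, p)
samples = matrix_normal(mean=M, rowcov=U, colcov=V).rvs(size=k, random_state=1)

M_hat = samples.mean(axis=0)      # closed-form maximum likelihood estimate of M
max_err = np.abs(M_hat - M).max()
```

With k = 5000 draws, the entrywise standard error is about <math>1/\sqrt{k} \approx 0.014</math>, so the estimate recovers M to within a few hundredths.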
The covariance parameters, however, can be estimated by iterating the fixed-point equations obtained by setting their gradients to zero:

:<math> \mathbf{U} = \frac{1}{kp} \sum_{i=1}^k(\mathbf{X}_i-\mathbf{M})\mathbf{V}^{-1}(\mathbf{X}_i-\mathbf{M})^T </math>

and

:<math> \mathbf{V} = \frac{1}{kn} \sum_{i=1}^k(\mathbf{X}_i-\mathbf{M})^T\mathbf{U}^{-1}(\mathbf{X}_i-\mathbf{M}); </math>

see, for example,<ref>{{cite arXiv| last1=Glanz|first1=Hunter |last2=Carvalho|first2=Luis |title=An Expectation-Maximization Algorithm for the Matrix Normal Distribution |year=2013 |class=stat.ME |eprint=1309.6609}}</ref> and the references therein. The covariance parameters are non-identifiable in the sense that for any scale factor ''s'' > 0:

:<math> \mathcal{MN}_{n\times p}(\mathbf{X}\mid\mathbf{M},\mathbf{U},\mathbf{V}) = \mathcal{MN}_{n\times p}(\mathbf{X}\mid\mathbf{M},s\mathbf{U},\tfrac{1}{s}\mathbf{V}) . </math>

==Drawing values from the distribution==
Sampling from the matrix normal distribution is a special case of the sampling procedure for the [[multivariate normal distribution]]. Let <math>\mathbf{X}</math> be an ''n'' by ''p'' matrix of ''np'' independent samples from the standard normal distribution, so that

:<math> \mathbf{X}\sim\mathcal{MN}_{n\times p}(\mathbf{0},\mathbf{I},\mathbf{I}). </math>

Then let

:<math> \mathbf{Y}=\mathbf{M}+\mathbf{A}\mathbf{X}\mathbf{B}, </math>

so that

:<math> \mathbf{Y}\sim\mathcal{MN}_{n\times p}(\mathbf{M},\mathbf{AA}^T,\mathbf{B}^T\mathbf{B}), </math>

where '''A''' and '''B''' can be chosen by [[Cholesky decomposition]] or a similar matrix square root operation.

==Relation to other distributions==
Dawid (1981) provides a discussion of the relation of the matrix-valued normal distribution to other distributions, including the [[Wishart distribution]], [[inverse-Wishart distribution]] and [[matrix t-distribution]], but uses different notation from that employed here.

== See also ==
* [[Multivariate normal distribution]]

==References==
{{reflist}}
* {{cite journal |last=Dawid |first=A.P.
|author-link=Philip Dawid |year=1981 |title=Some matrix-variate distribution theory: Notational considerations and a Bayesian application |journal=[[Biometrika]] |volume=68 |issue=1 |pages=265–274 |doi=10.1093/biomet/68.1.265 |mr=614963 |jstor=2335827 }}
* {{cite journal |last=Dutilleul |first=P. |year=1999 |title=The MLE algorithm for the matrix normal distribution |journal=[[Journal of Statistical Computation and Simulation]] |volume=64 |issue=2 |pages=105–123 |doi=10.1080/00949659908811970 }}
* {{Citation |last=Arnold |first=S.F. |title=The theory of linear models and multivariate analysis |publisher=[[John Wiley & Sons]] |place=New York |year=1981 |isbn=0471050652 }}

{{ProbDistributions|multivariate}}

[[Category:Random matrices]]
[[Category:Continuous distributions]]
[[Category:Multivariate continuous distributions]]