==Definition==
Given a probability distribution <math>Q</math> on <math>\R^N</math>, with mean <math>\vec{\mu} = (\mu_1, \mu_2, \mu_3, \dots , \mu_N)^\mathsf{T}</math> and positive semi-definite [[covariance matrix]] <math>\mathbf{\Sigma}</math>, the Mahalanobis distance of a point <math>\vec{x} = (x_1, x_2, x_3, \dots, x_N )^\mathsf{T}</math> from <math>Q</math> is<ref>{{Cite journal |last1=De Maesschalck |first1=R. |last2=Jouan-Rimbaud |first2=D. |last3=Massart |first3=D. L. |title=The Mahalanobis distance |journal=Chemometrics and Intelligent Laboratory Systems |year=2000 |volume=50 |issue=1 |pages=1–18 |doi=10.1016/s0169-7439(99)00047-7}}</ref>
<math display="block">d_M(\vec{x}, Q) = \sqrt{(\vec{x} - \vec{\mu})^\mathsf{T} \mathbf{\Sigma}^{-1} (\vec{x} - \vec{\mu})}.</math>
Given two points <math>\vec{x}</math> and <math>\vec{y}</math> in <math>\R^N</math>, the Mahalanobis distance between them with respect to <math>Q</math> is
<math display="block">d_M(\vec{x}, \vec{y}; Q) = \sqrt{(\vec{x} - \vec{y})^\mathsf{T} \mathbf{\Sigma}^{-1} (\vec{x} - \vec{y})},</math>
which means that <math>d_M(\vec{x}, Q) = d_M(\vec{x}, \vec{\mu}; Q)</math>. When <math>\mathbf{\Sigma}</math> is positive-definite, its inverse exists and is also positive-definite, so the quantities under the square roots are non-negative and the square roots are always defined. (The case of a singular <math>\mathbf{\Sigma}</math> is discussed below.)

The squared Mahalanobis distance admits useful decompositions that help to explain why a multivariate observation is outlying and that provide a graphical tool for identifying outliers.<ref>{{Cite journal |last=Kim |first=M. G. |year=2000 |title=Multivariate outliers and decompositions of Mahalanobis distance |journal=Communications in Statistics – Theory and Methods |volume=29 |issue=7 |pages=1511–1526 |doi=10.1080/03610920008832559 |s2cid=218567835}}</ref>

By the [[spectral theorem]], <math>\mathbf{\Sigma}</math> can be decomposed as <math>\mathbf{\Sigma} = \mathbf{S}^\mathsf{T} \mathbf{S}</math> for some real <math>N \times N</math> matrix <math>\mathbf{S}</math>. One choice for <math>\mathbf{S}</math> is the symmetric square root of <math>\mathbf{\Sigma}</math>, which is the [[Standard deviation#Standard deviation matrix|standard deviation matrix]].<ref name="Das">{{cite arXiv |eprint=2012.14331 |last1=Das |first1=Abhranil |last2=Geisler |first2=Wilson S. |title=Methods to integrate multinormals and compute classification measures |date=2020 |class=stat.ML}}</ref> This gives the equivalent definition
<math display="block">d_M(\vec{x}, \vec{y}; Q) = \|\mathbf{S}^{-1}(\vec{x} - \vec{y})\|,</math>
where <math>\|\cdot\|</math> is the Euclidean norm. That is, the Mahalanobis distance is the Euclidean distance after a [[whitening transformation]]. The existence of <math>\mathbf{S}</math> is guaranteed by the spectral theorem, but <math>\mathbf{S}</math> is not unique.
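As a numerical illustration, the following minimal sketch (assuming [[NumPy]] and [[SciPy]]; the mean, covariance, and query point are made up for the example) computes the distance both from the direct definition and as the Euclidean norm after whitening with the symmetric square root, and checks that the two agree:

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import sqrtm

# Illustrative distribution Q: mean vector and positive-definite covariance.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 2.0])  # illustrative query point

# Direct definition: d_M(x, Q) = sqrt((x - mu)^T Sigma^{-1} (x - mu)).
diff = x - mu
d_direct = np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)

# Equivalent definition: Euclidean norm after whitening, with S the
# symmetric square root of Sigma (so Sigma = S^T S).
S = sqrtm(Sigma)
d_whitened = np.linalg.norm(np.linalg.solve(S, diff))

assert np.isclose(d_direct, d_whitened)
</syntaxhighlight>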
Different choices of <math>\mathbf{S}</math> have different theoretical and practical advantages.<ref>{{Cite journal |last1=Kessy |first1=Agnan |last2=Lewin |first2=Alex |last3=Strimmer |first3=Korbinian |date=2018-10-02 |title=Optimal Whitening and Decorrelation |url=https://doi.org/10.1080/00031305.2016.1277159 |journal=The American Statistician |volume=72 |issue=4 |pages=309–314 |doi=10.1080/00031305.2016.1277159 |arxiv=1512.00809 |s2cid=55075085 |issn=0003-1305}}</ref>

In practice, the distribution <math>Q</math> is usually the [[sample distribution]] of a set of [[Independent and identically distributed random variables|IID]] samples from an underlying unknown distribution, so <math>\vec{\mu}</math> is the sample mean and <math>\mathbf{\Sigma}</math> is the sample covariance matrix. When the [[affine span]] of the samples is not the entire <math>\R^N</math>, the covariance matrix is singular rather than positive-definite, and the above definition does not apply. However, the Mahalanobis distance is preserved under any full-rank affine transformation of the affine span of the samples. So when the affine span is not the entire <math>\R^N</math>, the samples can first be orthogonally projected onto <math>\R^n</math>, where <math>n</math> is the dimension of the affine span, and the Mahalanobis distance can then be computed as usual in the projected coordinates.
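A hedged sketch of this projection step, assuming NumPy (the samples, query point, and rank tolerance below are illustrative, not prescribed by the definition):

<syntaxhighlight lang="python">
import numpy as np

# Illustrative samples whose affine span is a 1-dimensional subspace of R^3:
# every sample lies on one line, so the 3x3 sample covariance is singular.
samples = np.array([[0.0, 0.0, 0.0],
                    [1.0, 2.0, 3.0],
                    [2.0, 4.0, 6.0],
                    [3.0, 6.0, 9.0]])

mean = samples.mean(axis=0)
centered = samples - mean

# Orthonormal basis of the affine span via SVD; keep the directions whose
# singular values exceed an (illustrative) tolerance.
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
basis = Vt[s > 1e-10]  # shape (n, N); rows span the affine span

# Orthogonally project the samples and a query point onto the n-dim span.
proj = centered @ basis.T
x = np.array([2.5, 5.0, 7.5])  # illustrative point inside the affine span
x_proj = (x - mean) @ basis.T

# Mahalanobis distance computed as usual in the projected coordinates.
cov = np.atleast_2d(np.cov(proj, rowvar=False))  # n x n, positive-definite
d = np.sqrt(x_proj @ np.linalg.inv(cov) @ x_proj)
print(d)
</syntaxhighlight>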