==Practical use==
In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the [[maximum likelihood estimation|maximum likelihood estimate]] or the [[maximum a posteriori]] estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e., the dimensionality of each data vector) exceeds the number of samples in each class.<ref name="Martinez:2001" /> In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a [[pseudo inverse]] instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by <math>\Sigma_b</math>.<ref>{{cite journal | last1 = Yu | first1 = H. | last2 = Yang | first2 = J. | year = 2001 | title = A direct LDA algorithm for high-dimensional data – with application to face recognition | journal = Pattern Recognition | volume = 34 | issue = 10 | pages = 2067–2069 | doi = 10.1016/s0031-3203(00)00162-x | bibcode = 2001PatRe..34.2067Y | citeseerx = 10.1.1.70.3507 }}</ref> Another strategy to deal with small sample size is to use a [[shrinkage estimator]] of the covariance matrix, which can be expressed mathematically as

:<math> \Sigma = (1-\lambda) \Sigma + \lambda I\,</math>

where <math>I</math> is the identity matrix, and <math>\lambda</math> is the ''shrinkage intensity'' or ''regularisation parameter''. This leads to the framework of regularized discriminant analysis<ref name="Friedman:2001">{{cite journal |last=Friedman |first=J. H. |title=Regularized Discriminant Analysis |journal=[[Journal of the American Statistical Association]] |volume=84 |issue=405 |pages=165–175 |year=1989 |url=http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-4389.pdf |doi=10.2307/2289860 |jstor=2289860 |mr=0999675 |citeseerx=10.1.1.382.2682 }}</ref> or shrinkage discriminant analysis.<ref>{{cite journal | last1 = Ahdesmäki | first1 = M. | last2 = Strimmer | first2 = K. | year = 2010 | title = Feature selection in omics prediction problems using cat scores and false nondiscovery rate control | journal = Annals of Applied Statistics | volume = 4 | issue = 1 | pages = 503–519 | doi = 10.1214/09-aoas277 | arxiv = 0903.2003 | s2cid = 2508935 }}</ref>
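For illustration only, a minimal NumPy sketch of the approach described above (estimating a pooled covariance, applying the shrinkage formula, and using a pseudo inverse so the discriminant remains computable when the estimate is singular) might look as follows; the helper name <code>lda_direction</code> and its arguments <code>X0</code>, <code>X1</code>, <code>lam</code> are hypothetical rather than part of any cited method:

<syntaxhighlight lang="python">
import numpy as np

def lda_direction(X0, X1, lam=0.1):
    # Hypothetical helper: two-class LDA direction with a shrinkage
    # covariance estimate; an illustrative sketch, not a cited algorithm.
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled (shared) covariance estimated from both classes.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (len(centered) - 2)

    # Shrinkage estimator: (1 - lambda) * Sigma + lambda * I.
    sigma_shrunk = (1 - lam) * sigma + lam * np.eye(sigma.shape[0])

    # A pseudo inverse keeps the formula usable even if the estimate
    # does not have full rank.
    return np.linalg.pinv(sigma_shrunk) @ (mu1 - mu0)
</syntaxhighlight>

Library implementations of the shrinkage approach also exist; for example, scikit-learn's <code>LinearDiscriminantAnalysis</code> exposes a <code>shrinkage</code> parameter for its <code>lsqr</code> and <code>eigen</code> solvers.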
Also, in many practical cases linear discriminants are not suitable. LDA and Fisher's discriminant can be extended for use in non-linear classification via the [[kernel trick]]. Here, the original observations are effectively mapped into a higher-dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space. The most commonly used example of this is the [[Kernel Fisher discriminant analysis|kernel Fisher discriminant]].

LDA can be generalized to [[multiple discriminant analysis]], where ''c'' becomes a [[categorical variable]] with ''N'' possible states, instead of only two. Analogously, if the class-conditional densities <math>p(\vec x\mid c=i)</math> are normal with shared covariances, the [[sufficient statistic]] for <math>P(c\mid\vec x)</math> consists of the values of ''N'' projections onto the [[Linear subspace|subspace]] spanned by the ''N'' means, [[affine transformation|affine projected]] by the inverse covariance matrix. These projections can be found by solving a [[Eigendecomposition of a matrix#Generalized eigenvalue problem|generalized eigenvalue problem]], where the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix. See "[[#Multiclass LDA|Multiclass LDA]]" above for details.
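A minimal sketch of that generalized eigenvalue formulation, assuming NumPy and SciPy, an integer label array <code>y</code>, and a nonsingular within-class scatter matrix (the names <code>multiclass_lda</code>, <code>Sb</code> and <code>Sw</code> are illustrative, not from a cited source):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def multiclass_lda(X, y, n_components):
    # Illustrative multiclass LDA projection via the generalized
    # eigenvalue problem  Sb v = lambda Sw v.
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # shared (within-class) scatter
    Sb = np.zeros((d, d))  # scatter of the class means (between-class)
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized symmetric eigenproblem; Sw is assumed positive definite.
    # eigh returns eigenvalues in ascending order, so reverse the vectors.
    eigvals, eigvecs = eigh(Sb, Sw)
    W = eigvecs[:, ::-1][:, :n_components]
    return X @ W  # data projected onto the leading discriminant directions
</syntaxhighlight>

Because the between-class matrix has rank at most ''N'' − 1, only that many eigenvalues can be non-zero, so at most ''N'' − 1 informative projection directions are obtained.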