==Practical use==
In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the [[maximum likelihood estimation|maximum likelihood estimate]] or the [[maximum a posteriori]] estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e., the dimensionality of each data vector) exceeds the number of samples in each class.<ref name="Martinez:2001" /> In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a [[pseudo inverse]] instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by <math>\Sigma_b</math>.<ref>{{cite journal | last1 = Yu | first1 = H. | last2 = Yang | first2 = J. | year = 2001 | title = A direct LDA algorithm for high-dimensional data – with application to face recognition | journal = Pattern Recognition | volume = 34 | issue = 10 | pages = 2067–2069 | doi = 10.1016/s0031-3203(00)00162-x | bibcode = 2001PatRe..34.2067Y | citeseerx = 10.1.1.70.3507 }}</ref> Another strategy to deal with small sample size is to use a [[shrinkage estimator]] of the covariance matrix, which can be expressed mathematically as

:<math> \Sigma = (1-\lambda) \Sigma + \lambda I\,</math>

where <math>I</math> is the identity matrix, and <math>\lambda</math> is the ''shrinkage intensity'' or ''regularisation parameter''. This leads to the framework of regularized discriminant analysis<ref name="Friedman:2001">{{cite journal |last=Friedman |first=J. H. |title=Regularized Discriminant Analysis |journal=[[Journal of the American Statistical Association]] |volume=84 |issue=405 |pages=165–175 |year=1989 |url=http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-4389.pdf |doi=10.2307/2289860 |jstor=2289860 |mr=0999675 |citeseerx=10.1.1.382.2682 }}</ref> or shrinkage discriminant analysis.<ref>{{cite journal | last1 = Ahdesmäki | first1 = M. | last2 = Strimmer | first2 = K. | year = 2010 | title = Feature selection in omics prediction problems using cat scores and false nondiscovery rate control | journal = Annals of Applied Statistics | volume = 4 | issue = 1 | pages = 503–519 | doi = 10.1214/09-aoas277 | arxiv = 0903.2003 | s2cid = 2508935 }}</ref>
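For illustration only, a minimal NumPy sketch of the approach described above (estimating a pooled covariance, applying the shrinkage formula, and using a pseudo inverse so the discriminant remains computable when the estimate is singular) might look as follows; the helper name <code>lda_direction</code> and its arguments <code>X0</code>, <code>X1</code>, <code>lam</code> are hypothetical rather than part of any cited method:

<syntaxhighlight lang="python">
import numpy as np

def lda_direction(X0, X1, lam=0.1):
    # Hypothetical helper: two-class LDA direction with a shrinkage
    # covariance estimate; an illustrative sketch, not a cited algorithm.
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled (shared) covariance estimated from both classes.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (len(centered) - 2)

    # Shrinkage estimator: (1 - lambda) * Sigma + lambda * I.
    sigma_shrunk = (1 - lam) * sigma + lam * np.eye(sigma.shape[0])

    # A pseudo inverse keeps the formula usable even if the estimate
    # does not have full rank.
    return np.linalg.pinv(sigma_shrunk) @ (mu1 - mu0)
</syntaxhighlight>

Library implementations of the shrinkage approach also exist; for example, scikit-learn's <code>LinearDiscriminantAnalysis</code> exposes a <code>shrinkage</code> parameter for its <code>lsqr</code> and <code>eigen</code> solvers.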
Also, in many practical cases linear discriminants are not suitable. LDA and Fisher's discriminant can be extended for use in non-linear classification via the [[kernel trick]]. Here, the original observations are effectively mapped into a higher-dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space. The most commonly used example of this is the [[Kernel Fisher discriminant analysis|kernel Fisher discriminant]].

LDA can be generalized to [[multiple discriminant analysis]], where ''c'' becomes a [[categorical variable]] with ''N'' possible states, instead of only two. Analogously, if the class-conditional densities <math>p(\vec x\mid c=i)</math> are normal with shared covariances, the [[sufficient statistic]] for <math>P(c\mid\vec x)</math> consists of the values of ''N'' projections onto the [[Linear subspace|subspace]] spanned by the ''N'' means, [[affine transformation|affine projected]] by the inverse covariance matrix. These projections can be found by solving a [[Eigendecomposition of a matrix#Generalized eigenvalue problem|generalized eigenvalue problem]], where the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix. See "[[#Multiclass LDA|Multiclass LDA]]" above for details.
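A minimal sketch of that generalized eigenvalue formulation, assuming NumPy and SciPy, an integer label array <code>y</code>, and a nonsingular within-class scatter matrix (the names <code>multiclass_lda</code>, <code>Sb</code> and <code>Sw</code> are illustrative, not from a cited source):

<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import eigh

def multiclass_lda(X, y, n_components):
    # Illustrative multiclass LDA projection via the generalized
    # eigenvalue problem  Sb v = lambda Sw v.
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # shared (within-class) scatter
    Sb = np.zeros((d, d))  # scatter of the class means (between-class)
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        diff = (mu_c - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Generalized symmetric eigenproblem; Sw is assumed positive definite.
    # eigh returns eigenvalues in ascending order, so reverse the vectors.
    eigvals, eigvecs = eigh(Sb, Sw)
    W = eigvecs[:, ::-1][:, :n_components]
    return X @ W  # data projected onto the leading discriminant directions
</syntaxhighlight>

Because the between-class matrix has rank at most ''N'' − 1, only that many eigenvalues can be non-zero, so at most ''N'' − 1 informative projection directions are obtained.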